ai-forever / Kandinsky-2

Kandinsky 2 — multilingual text2image latent diffusion model
Apache License 2.0

How to release GPU memory after catching OutOfMemory error? #88

Open · WatanoK10 opened this issue 1 year ago

WatanoK10 commented 1 year ago

Hi everyone. I have a question regarding the use of this model.

I ran text2img with the following code:

from kandinsky2 import get_kandinsky2

# Load the Kandinsky 2.1 text2img model onto the GPU
model = get_kandinsky2('cuda', task_type='text2img', model_version='2.1', use_flash_attention=False)
images = model.generate_text2img(
    "cat 4k",            # prompt
    num_steps=100,       # decoder diffusion steps
    batch_size=2,
    guidance_scale=4,
    h=400, w=600,        # output height/width
    sampler='p_sampler',
    prior_cf_scale=4,
    prior_steps="5"      # passed as a string, as in the repo's examples
)

and got the following error:

OutOfMemoryError                          Traceback (most recent call last)
Cell In[4], line 1
----> 1 images = model.generate_text2img(
      2     "cat 4k", 
      3     num_steps=100,
      4     batch_size=2, 
      5     guidance_scale=4,
      6     h=400, w=600,
      7     sampler='p_sampler', 
      8     prior_cf_scale=4,
      9     prior_steps="5"
     10 )

File ~/src/kandinsky/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/src/kandinsky/lib/python3.10/site-packages/kandinsky2/kandinsky2_1_model.py:341, in Kandinsky2_1.generate_text2img(self, prompt, num_steps, batch_size, guidance_scale, h, w, sampler, prior_cf_scale, prior_steps, negative_prior_prompt, negative_decoder_prompt)
    338     config["diffusion_config"]["timestep_respacing"] = str(num_steps)
    339 diffusion = create_gaussian_diffusion(**config["diffusion_config"])
--> 341 return self.generate_img(
    342     prompt=prompt,
...
     66 norm_f = self.norm_layer(f)
---> 67 new_f = norm_f * self.conv_y(zq) + self.conv_b(zq)
     68 return new_f

OutOfMemoryError: CUDA out of memory. Tried to allocate 280.00 MiB (GPU 0; 7.79 GiB total capacity; 7.31 GiB already allocated; 206.69 MiB free; 7.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
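
As an aside, the error message suggests setting max_split_size_mb. My understanding (an assumption based on the PyTorch memory-management docs, not something I have verified here) is that it must be set before the first CUDA allocation in the process, e.g.:

import os

# Must be set before torch makes its first CUDA allocation
# (i.e. before the model is loaded); 128 is just an example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"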

At that point, nvidia-smi reported GPU memory usage of 6890 MiB / 8192 MiB:

$ nvidia-smi
Wed Aug  2 23:11:21 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     Off | 00000000:07:00.0 Off |                  N/A |
|  0%   35C    P8              10W / 200W |   6890MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1435481      C                            ~/bin/python3     6884MiB |
+---------------------------------------------------------------------------------------+

After deleting the model and images variables, I get the same error when I redefine the model, and nvidia-smi shows no change in GPU memory usage. It seems the GPU memory is not released even after the variables are deleted, so I have to restart the Jupyter kernel to run again. If you know how to release and re-allocate GPU memory without restarting the kernel, please let me know.
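
For reference, here is a sketch of the kind of cleanup I have tried (del plus the usual gc.collect() / torch.cuda.empty_cache(); the sys.last_traceback line is only my guess at what else might still hold references in an interactive session):

import gc
import sys
import torch

# Drop the Python references to the model and its outputs.
del model, images

# Guess: after an uncaught exception, the interpreter keeps the traceback
# (and every CUDA tensor in its frame locals) alive via sys.last_traceback.
sys.last_traceback = None

gc.collect()               # collect the now-unreferenced objects
torch.cuda.empty_cache()   # return PyTorch's cached blocks to the driver

# Both values should drop towards zero if the memory was actually freed.
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

Even after this, I understand nvidia-smi would still show a few hundred MiB for the CUDA context itself, which is only released when the process exits.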

Finally, I want to express my gratitude to the development team and everyone else for their hard work and dedication.

Kind regards.