Open HetagKoroev opened 2 years ago
I have the same error
I get the same error when pytorch is installed with pip, but it does work with anaconda.
It's not exactly the same issue as this, but seems related. https://github.com/pytorch/pytorch/issues/56747#issuecomment-825559343
@HetagKoroev you can try the examples with inference on a GPU with 3.5 GB of VRAM: https://github.com/sberbank-ai/ru-dalle/pull/51
or the Jupyter version:
Let me know if that helps, @HetagKoroev.
Same error
NVidia GTX 1060ti 4GB, Zorin OS 16 (Ubuntu 20.04), Driver Version: 470.82.00, CUDA Version: 11.4
ruDALL-E batch size: 1
Total GPU RAM: 3.82 Gb
CPU: 8
RAM GB: 7.6
PyTorch version: 1.10.0+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.9162
◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
0%| | 1/1024 [00:04<1:23:09, 4.88s/it]/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
row_ids = torch.arange(past_length, input_shape[-1] + past_length,
3%|▎ | 27/1024 [00:07<04:46, 3.48it/s]
Traceback (most recent call last):
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 113, in <module>
codebooks += generate_codebooks(text, tokenizer, dalle, top_k=top_k, images_num=images_num, top_p=top_p,
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 87, in generate_codebooks
logits, has_cache = dalle(out, attention_mask,
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 51, in forward
return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py", line 122, in forward
logits = self.to_logits(transformer_output)
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/mnt/d_drive/Projects/AI/ru-dalle/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
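The traceback above ends in a half-precision `F.linear` call, and the error message names `cublasGemmEx` with `CUDA_R_16F` operands. One way to check whether a plain fp16 matrix multiply already fails on the same machine, independent of ruDALL-E, is a sketch like the one below. The function name `try_half_matmul` and the matrix size are illustrative assumptions, and the check is skipped when PyTorch or a CUDA device is not present.

```python
import importlib.util


def try_half_matmul(size=1024):
    """Run one fp16 matmul on the GPU and report what happened."""
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"
    try:
        a = torch.randn(size, size, device="cuda", dtype=torch.float16)
        b = torch.randn(size, size, device="cuda", dtype=torch.float16)
        c = a @ b
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok: result shape {tuple(c.shape)}"


print(try_half_matmul())
```

If this small test also fails, the problem probably lies in the PyTorch/CUDA installation rather than in the model code. If it passes, memory pressure during generation is a more likely suspect.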
@muhammadyusuf-kurbonov Could you try to reinstall torch with version 1.7.1 cuda 10.2?
# CUDA 10.2
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
That didn't help! :disappointed:
Install torch with Anaconda. That solved the problem for me.
That didn't help either :disappointed:
@muhammadyusuf-kurbonov Do you get the same error message CUBLAS_STATUS_EXECUTION_FAILED with Anaconda? Or a different "out of memory" message?
/mnt/d_drive/Projects/AI/Anaconda/bin/python /mnt/d_drive/Projects/AI/ru-dalle/main.py
ruDALL-E batch size: 1
Total GPU RAM: 3.82 Gb
CPU: 8
RAM GB: 7.6
PyTorch version: 1.10.0+cu102
CUDA version: 10.2
cuDNN version: 7605
Allowed GPU RAM: 3.5 Gb
GPU part 0.9162
◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
0%| | 1/1024 [00:03<1:05:06, 3.82s/it]/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
row_ids = torch.arange(past_length, input_shape[-1] + past_length,
3%|▎ | 27/1024 [00:06<04:08, 4.01it/s]
Traceback (most recent call last):
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 113, in <module>
codebooks += generate_codebooks(text, tokenizer, dalle, top_k=top_k, images_num=images_num, top_p=top_p,
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 87, in generate_codebooks
logits, has_cache = dalle(out, attention_mask,
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 51, in forward
return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py", line 122, in forward
logits = self.to_logits(transformer_output)
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
With Anaconda I tried both cudatoolkit versions (10.2 and 11.3).
@muhammadyusuf-kurbonov
Use anaconda to set up your environment like this.
conda create --name rudalle
conda activate rudalle
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge transformers youtokentome omegaconf einops matplotlib psutil
pip install taming-transformers more_itertools PyWavelets
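After setting up a new environment, it can help to confirm which PyTorch build Python actually imports. In the earlier Anaconda run in this thread, the traceback paths still point to `/home/muhammadyusuf/.local/lib/python3.8/site-packages/torch`, which suggests the pip-installed copy was being picked up. A minimal sketch of such a check follows; `describe_torch_build` is a hypothetical helper name, and it returns a stub result when PyTorch is not installed.

```python
import importlib.util


def describe_torch_build():
    """Collect the PyTorch version facts that also appear in the logs here."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    return {
        "torch": torch.__version__,
        "location": torch.__file__,        # which installation was imported
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_build().items():
    print(f"{key}: {value}")
```

Run it with the same interpreter that runs `main.py`. If `location` is outside the conda environment, the environment's own PyTorch is not the one being used.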
(rudalle) muhammadyusuf@muhammadyusuf-IdeaPad-Gaming-3-15IMH05:/mnt/d_drive/Projects/AI/ru-dalle$ python main.py
ruDALL-E batch size: 1
Total GPU RAM: 3.82 Gb
CPU: 8
RAM GB: 7.6
PyTorch version: 1.10.0
CUDA version: 11.3
cuDNN version: 8200
Allowed GPU RAM: 3.5 Gb
GPU part 0.9162
◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
0%|▏ | 1/1024 [00:11<3:21:52, 11.84s/it]/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
row_ids = torch.arange(past_length, input_shape[-1] + past_length,
4%|████████▏ | 43/1024 [00:20<07:41, 2.13it/s]
Traceback (most recent call last):
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 113, in <module>
The error has changed :smile:
@muhammadyusuf-kurbonov You can try setting use_cache=False in the generate_codebooks step, as suggested here: https://github.com/sberbank-ai/ru-dalle/issues/18#issuecomment-967176880
However, you should also try running it with fp16=False, use_cache=True on device='cpu'. For me, that generates images 4-5x faster than device=cuda without the cache.
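The two option combinations from this comment can be written side by side. This is only a sketch: `settings_for` is a hypothetical helper, not part of ruDALL-E, and `fp16=True` for the GPU case is an assumption based on the `fp16.py` frames in the tracebacks. The option names `device`, `fp16`, and `use_cache` are the ones mentioned in the comment.

```python
def settings_for(device):
    """Return the option combination suggested in this thread for a device."""
    if device == "cuda":
        # GPU run: cache turned off, as suggested for the low-VRAM case.
        return {"device": "cuda", "fp16": True, "use_cache": False}
    if device == "cpu":
        # CPU run: full precision, with the cache kept on.
        return {"device": "cpu", "fp16": False, "use_cache": True}
    raise ValueError(f"unknown device: {device!r}")


print(settings_for("cpu"))
```

How these values are passed on depends on the local `main.py`, which is not shown in this thread.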
Without cache it runs until 41% :clap: :clap: :clap:
(rudalle) muhammadyusuf@muhammadyusuf-IdeaPad-Gaming-3-15IMH05:/mnt/d_drive/Projects/AI/ru-dalle$ python main.py
ruDALL-E batch size: 1
super-resolution: False
Total GPU RAM: 3.82 Gb
CPU: 8
RAM GB: 7.6
PyTorch version: 1.10.0
CUDA version: 11.3
cuDNN version: 8200
Allowed GPU RAM: 3.5 Gb
GPU part 0.9162
◼️ Malevich is 1.3 billion params model from the family GPT3-like, that uses Russian language and text+image multi-modality.
tokenizer --> ready
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
vae --> ready
ruclip --> ready
0%| | 1/1024 [00:12<3:26:18, 12.10s/it]/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/model.py:77: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
row_ids = torch.arange(past_length, input_shape[-1] + past_length,
41%|████████████████▌ | 423/1024 [17:46<25:14, 2.52s/it]
Traceback (most recent call last):
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 115, in <module>
codebooks += generate_codebooks(text, tokenizer, dalle, top_k=top_k, images_num=images_num, top_p=top_p, bs=DALLE_BS)
File "/mnt/d_drive/Projects/AI/ru-dalle/main.py", line 90, in generate_codebooks
logits, has_cache = dalle(out, attention_mask,
File "/mnt/d_drive/Projects/AI/Anaconda/envs/rudalle/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 51, in forward
return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 42, in fp16_to_fp32
return conversion_helper(val, float_conversion)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 15, in conversion_helper
rtn = [conversion_helper(v, conversion) for v in val]
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 15, in <listcomp>
rtn = [conversion_helper(v, conversion) for v in val]
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 14, in conversion_helper
return conversion(val)
File "/mnt/d_drive/Projects/AI/ru-dalle/rudalle/dalle/fp16.py", line 40, in float_conversion
val = val.float()
RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 3.82 GiB total capacity; 2.53 GiB already allocated; 40.12 MiB free; 3.50 GiB allowed; 2.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory
It works on CPU, but the difference in performance is not that big (the CPU is an Intel i5-10300H).
I guess your GPU memory may not be enough. You can try running your code on a GPU with more memory, or use a smaller model.
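The numbers in the out-of-memory message above can be checked with a little arithmetic. The sketch below copies the message text from the log (without its truncated tail) and pulls out the sizes with regular expressions. `parse_oom` is a hypothetical helper written for this message format only, not a PyTorch API.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 3.82 GiB total "
    "capacity; 2.53 GiB already allocated; 40.12 MiB free; 3.50 GiB allowed; "
    "2.61 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the sizes from the OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(MESSAGE)
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, free {sizes['free']:.2f} MiB")
print(f"short by {shortfall:.2f} MiB")
```

The request of 54.00 MiB exceeded the 40.12 MiB reported as free by about 13.88 MiB, so the run at 41% failed by a small margin rather than a large one.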
I have a GTX 1660 SUPER with 6 GB of VRAM, Ubuntu 20.04, Python 3.9, Driver Version: 460.91.03, CUDA Version: 11.2
At this stage: 3%|███▏ | 27/1024 I get an error:
The video memory used at the time of the error: 4893MiB / 5936MiB
Also, at the very beginning of generation, I get a warning:
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
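The warning is about two rounding rules that only differ for negative results. A small illustration in plain Python follows; `div_trunc` and `div_floor` are stand-in names for this example and are not the torch functions themselves.

```python
import math


def div_trunc(a, b):
    """Divide and round toward zero (the 'trunc' behavior in the warning)."""
    return math.trunc(a / b)


def div_floor(a, b):
    """Divide and round toward negative infinity (true floor division)."""
    return math.floor(a / b)


for a, b in [(7, 2), (-7, 2)]:
    print(f"{a} / {b}: trunc={div_trunc(a, b)}, floor={div_floor(a, b)}")
```

For 7 / 2 both rules give 3, while for -7 / 2 truncation gives -3 and floor gives -4. If the values divided at `model.py:77` are non-negative position indices, which seems likely but is an assumption here, the two rules agree and the warning is probably unrelated to the crash.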