Mael-zys / T2M-GPT

(CVPR 2023) PyTorch implementation of “T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations”
https://mael-zys.github.io/T2M-GPT/
Apache License 2.0

Can this really run with just 32GB VRAM? #45

Open YOUNG-WLI opened 1 year ago

YOUNG-WLI commented 1 year ago

I tried VQ-VAE training with the parameters suggested in the documentation (however, since we are still building the HumanML3D dataset, we used the '--dataname kit' option instead). Training failed with 'RuntimeError: Unable to find a valid cuDNN algorithm to run convolution', which is commonly reported when the GPU runs out of VRAM. The error occurred even though the GPU used for training was an RTX A6000 with 48GB of VRAM, more than the suggested 32GB. What is even more puzzling is that the same error occurs even when the batch size is lowered drastically, all the way down to 1. The detailed error message is as follows:

Traceback (most recent call last):
  File "train_vq.py", line 116, in <module>
    loss.backward()
  File "/root/anaconda3/envs/T2M-GPT/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/anaconda3/envs/T2M-GPT/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
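
A quick way to narrow down whether this is really a memory problem or a CUDA/cuDNN build mismatch is to check which CUDA toolkit the installed PyTorch wheel was built against and whether it can run a small convolution on the GPU at all. A minimal diagnostic sketch (the tensor sizes are arbitrary and just for illustration):

import torch

# CUDA toolkit this PyTorch wheel was built against (e.g. '10.1' vs '11.1') and cuDNN build
print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
# Compute capability of the installed GPU (an RTX A6000 reports (8, 6), i.e. Ampere)
print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))

# Try a tiny convolution on the GPU; if this already fails, the problem is the
# CUDA build, not the batch size or available VRAM.
x = torch.randn(1, 3, 64, 64, device="cuda")
conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
print(conv(x).shape)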
xu0166 commented 1 month ago

I encountered the same issue: the default installation pulls in PyTorch built against CUDA 10.1 ("cu101"), which does not support Ampere GPUs such as the RTX A6000, so cuDNN cannot find a usable convolution kernel no matter how small the batch size is. Reinstall with a CUDA 11.1 ("cu111") build using the following command:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
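
After reinstalling, a short sanity check (same assumptions as the diagnostic sketch above) is to confirm that PyTorch now reports a cu111 build and that a GPU convolution succeeds:

import torch

# Should now report 1.8.1+cu111 / 11.1 and a cuDNN 8.x build
print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())

# The convolution that previously triggered the cuDNN error should now run
x = torch.randn(1, 3, 64, 64, device="cuda")
print(torch.nn.Conv2d(3, 8, kernel_size=3).cuda()(x).shape)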