Open YOUNG-WLI opened 1 year ago
I ran into the same issue. The default installation pulled in a PyTorch build for CUDA 10.1 ("cu101"), which was incompatible. You can reinstall with the CUDA 11.1 build ("cu111") using the following command:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
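To confirm which build ended up installed, something like the sketch below may help. It reads the CUDA tag from the PyTorch version string and checks whether the build ships kernels that can run on the GPU. The helper names `cuda_tag` and `has_kernels_for` are made up for this example, and the check only applies the usual CUDA binary-compatibility rule (same major compute capability, compiled minor not higher than the device's). It ignores PTX fallbacks, so treat a negative result as a hint rather than proof. For reference, the RTX A6000 reports compute capability 8.6.

```python
import re


def cuda_tag(torch_version):
    """Extract the CUDA tag from a version string, e.g. '1.8.1+cu111' -> 'cu111'."""
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def has_kernels_for(arch_list, capability):
    """Heuristic: True if any 'sm_XY' entry can run on a device of this capability.

    A binary compiled for sm_XY runs on a device with the same major version X
    and a minor version >= Y. PTX ('compute_XY') entries are ignored here.
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if not match:
            continue
        number = int(match.group(1))
        major, minor = number // 10, number % 10
        if major == dev_major and minor <= dev_minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch version:", torch.__version__, "| CUDA tag:", cuda_tag(torch.__version__))
        print("built against CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
            arch_list = torch.cuda.get_arch_list()
            print("device:", torch.cuda.get_device_name(0), "| capability:", capability)
            print("compiled architectures:", arch_list)
            print("build has kernels for this GPU:", has_kernels_for(arch_list, capability))
        else:
            print("CUDA is not available to PyTorch.")
```

If the last line prints False, the installed wheel most likely predates the GPU architecture, and reinstalling a newer CUDA build as in the command above is worth trying.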
I tried VQ-VAE training with the parameters suggested in the documentation (since we are still in the process of building the HumanML3D dataset, we used the '--dataname kit' option). However, the run failed with 'RuntimeError: Unable to find a valid cuDNN algorithm to run convolution'. This error is commonly reported when the GPU runs out of VRAM. In our case, though, the GPU used for training was an RTX A6000 with 48GB of VRAM, which is more than the suggested 32GB. Even more puzzling, the same error occurs when the batch size is lowered all the way to 1. The detailed error message is as follows: