Sara-Ahmed / SiT

Self-supervised vIsion Transformer (SiT)

OSError: ... symbol free_gemm_select version libcublasLt.so.11 not defined #12

Closed mattroos closed 3 years ago

mattroos commented 3 years ago

I followed the demo instructions for training an SSL model on STL-10 verbatim. However, I get the error below immediately after starting the training process.

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10

results in this error:

OSError: /home/mroos/miniconda3/envs/SiT/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

mattroos commented 3 years ago

I resolved this by installing CUDA 11.1 rather than 11.0.

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
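
For anyone hitting the same error, it may help to confirm which CUDA version the installed PyTorch build targets before and after reinstalling. The snippet below is a minimal sketch, not part of the SiT repository: the helper name `describe_cuda_build` and the dictionary keys are made up for illustration, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`. It assumes that a shared-library problem like the one above may surface as an `OSError` when `torch` is imported, so it catches that case and reports the message instead of crashing.

```python
import importlib.util


def describe_cuda_build():
    """Summarise the installed PyTorch build (hypothetical helper)."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "built_for_cuda": None,   # e.g. "11.1"; None for CPU-only builds
        "cuda_available": False,
        "import_error": None,
    }

    # Skip everything if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    try:
        import torch
    except (ImportError, OSError) as exc:
        # Assumption: a broken shared-library link may fail at import time.
        info["import_error"] = str(exc)
        return info

    info["torch_installed"] = True
    info["torch_version"] = str(torch.__version__)
    # CUDA version the binary was compiled against, not the driver version.
    info["built_for_cuda"] = torch.version.cuda
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_build().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` does not match the `cudatoolkit` version installed in the conda environment, that mismatch is a plausible thing to check first, though it is not guaranteed to be the cause of this particular error.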