chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com

Running into Runtime Error (cutlassF: no kernel found to launch) -> likely your GPU doesn't support bfloat16 #26

Open YidongSong opened 1 week ago

YidongSong commented 1 week ago

[screenshot of the error attached]

arogozhnikov commented 1 week ago

Hi Yidong, I haven't encountered this problem before.

This thread on HF suggests doing this:

torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

Also, can you describe your setup? Python version, torch version, and GPU model.
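For reference, a slightly fuller sketch of that workaround (assuming PyTorch 2.x, where these backend toggles exist): disabling the fused kernels forces scaled-dot-product attention to fall back to the plain math implementation, which does not depend on newer-GPU kernels.

```python
import torch

# Disable the fused SDP attention kernels that require newer hardware.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
# Keep the portable math fallback enabled (it is on by default).
torch.backends.cuda.enable_math_sdp(True)
```

These flags are process-wide, so setting them once before running inference is enough.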

zdf1122 commented 1 week ago

I have encountered this problem. Your CUDA version may be older than 12.1 (check the CUDA version reported by your NVIDIA driver, which is the maximum version you can use).

Note that the required PyTorch version is 2.3.1.

YidongSong commented 1 week ago

Thanks!

DieHof commented 1 week ago

I also have the same problem. I am using CUDA 12.4.0. Interestingly, I only encounter the issue when running on an RTX 2080 Ti and a V100; on an A30 it runs without any errors.

I also found the proposed fixes, but they did not work for me.


arogozhnikov commented 1 week ago

We use the bfloat16 format for many operations, and AFAIR the 2080 Ti and V100 don't support it.
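For context, bfloat16 support is tied to the GPU's compute capability (assumption: Ampere, capability 8.0, is the first generation with native bfloat16, which mirrors the check `torch.cuda.is_bf16_supported()` performs). A minimal sketch covering the GPUs mentioned in this thread:

```python
# Sketch: native bfloat16 support starts at compute capability 8.0 (Ampere).
def supports_bfloat16(capability: tuple) -> bool:
    """Return True if the given (major, minor) compute capability has bfloat16."""
    return capability >= (8, 0)

# Compute capabilities of the GPUs mentioned in this thread.
GPUS = {
    "V100": (7, 0),         # Volta  -> no bfloat16
    "RTX 2080 Ti": (7, 5),  # Turing -> no bfloat16
    "Tesla T4": (7, 5),     # Turing -> no bfloat16
    "A30": (8, 0),          # Ampere -> bfloat16 supported
}

for name, cap in GPUS.items():
    verdict = "supported" if supports_bfloat16(cap) else "unsupported"
    print(f"{name}: bfloat16 {verdict}")
```

This also explains the pattern DieHof observed: the A30 works while the 2080 Ti and V100 fail.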

stianale commented 1 week ago

This is a problem on Kaggle, too. It does not seem to be related to the installed CUDA version. Has anyone found a solution?

YidongSong commented 4 days ago

Can I run the code on Tesla T4, or do I need to make any changes?

stianale commented 4 days ago


ESM can't fit, and there is no bfloat16 support, according to arogozhnikov.

I would think getting the code to run on CPU would be easier (though slow, of course) than finding a suboptimal GPU that sort of works. However, I have had no success running on CPU so far. Simply substituting torch.device("cuda:0") with torch.device("cpu") did not help; it still tries to load the model and tensors onto the GPU. I suspect that because the model was trained on GPU, the checkpoint must somehow be remapped to a CPU-compatible form when it is loaded.
I would think getting the code to run on CPU would be easier (of course, it would run slowly) than to find a suboptimal GPU that kind of works. However, I had no success thus far running on CPU. Simple substitutions of torch.device("cuda:0") with torch.device("cpu") did not help, it still tries to load the model and tensors on GPU. I suspect that because the model was trained on GPU, it must be saved/converted to CPU-compatible format somehow.