Closed Davido111200 closed 3 months ago
Hi,
Thank you for sharing this amazing work! I have a question regarding the GPUs used to run this code. In your paper, you mentioned that V100 GPUs were utilized. However, I encountered an error while using V100 GPUs myself.
ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
I tried setting the precision to fp16, but that results in another error. Is there a way to get around this? Thank you
Hi,
Thank you for reporting the issue. Yes, the V100 GPU does not support bf16 training. This likely slipped in because I cleaned up the code and tested it only on an A6000 GPU before uploading it to GitHub.
To resolve this, please try commenting out line 49 in tic/training.py, where it says "bf16=True". That should resolve the issue.
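For reference, a minimal sketch of how the precision flag could be chosen from the GPU's CUDA compute capability instead of being hard-coded (the function `pick_precision` is illustrative, not part of the repo):

```python
def pick_precision(compute_capability):
    """Pick a mixed-precision flag from a GPU's CUDA compute capability.

    bf16 requires an Ampere-or-newer GPU, i.e. compute capability >= (8, 0).
    """
    major, _minor = compute_capability
    if major >= 8:
        # Ampere (A100, A6000) or newer: bf16 is supported
        return "bf16"
    # Volta (V100, compute capability (7, 0)) and older: fall back to
    # plain fp32, i.e. pass neither bf16=True nor fp16=True
    return "fp32"
```

With torch installed, the current device's capability can be queried via `torch.cuda.get_device_capability()`, so `pick_precision(torch.cuda.get_device_capability())` returns "fp32" on a V100 and "bf16" on an A100 or A6000.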
Thank you for your response. I decided to go for A100 GPUs to avoid the precision problem.
Also, could you please specify the torch version that you use? With my current torch==2.1.0, there seem to be a lot of issues with torch.dynamo, so ric training cannot be run.
I recently used torch==2.0.1 on an A6000 and torch==2.1.2 on an A100. In addition, the V100 can train in fp32 by default if neither "bf16=True" nor "fp16=True" is specified.
It also seems that your CUDA and torch versions may be mismatched. You can reinstall torch and test with torch.cuda.is_available().
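A quick sanity check along those lines, using only standard torch calls (no repo-specific code):

```python
import torch

# Report the installed torch build and whether it can see a CUDA device.
# If is_available() prints False on a machine that has a GPU, the installed
# torch wheel and the system CUDA driver are likely mismatched.
print(torch.__version__)          # installed torch version, e.g. 2.1.2
print(torch.version.cuda)         # CUDA version this torch was built against
print(torch.cuda.is_available())  # should be True on a working GPU setup
```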
I double-checked my CUDA compatibility, and it doesn't seem to be the problem. However, I found a way to work around the bug. Thanks for taking a look!