hhaAndroid closed this issue 3 years ago.
I haven't encountered this error before; I trained all models on a 2080Ti with CUDA 10.2. This bug seems to be related to the CUDA version.
BTW, have you tried debugging with `CUDA_LAUNCH_BLOCKING=1`? Does it give the same CUDA error?
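A minimal sketch of the suggestion above, assuming the training job is started as a separate command. `CUDA_LAUNCH_BLOCKING=1` makes every kernel launch synchronous, so a device-side error is reported at the Python line that launched the failing kernel rather than at a later, unrelated call. Putting the variable in the child process's environment ensures it is set before PyTorch initializes CUDA. The helper names `debug_env` and `run_with_blocking` are hypothetical.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of an environment with synchronous CUDA kernel launches."""
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(cmd):
    """Run a (hypothetical) training command with CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.run(cmd, env=debug_env(), check=False).returncode


if __name__ == "__main__":
    # Placeholder command; replace train.py/config.py with your own entry point.
    sys.exit(run_with_blocking([sys.executable, "train.py", "config.py"]))
```

The shell equivalent is simply prefixing the training command with `CUDA_LAUNCH_BLOCKING=1`. Expect training to run noticeably slower while it is set, so use it only for debugging.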
Thank you, I will try it!
When I train on a cluster machine with a 1080 Ti or XP (CUDA 9.0, PyTorch 1.5), the above error appears, but it does not appear on a V100 (CUDA 10.1, PyTorch 1.6) or on my local machine (CUDA 10.2, PyTorch 1.5). Do you know why?
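When comparing machines like this, it can help to record exactly which GPU, CUDA build and PyTorch version each run used. Below is a minimal sketch, assuming PyTorch may or may not be installed in the environment. The helper name `env_report` is hypothetical. PyTorch also ships a fuller report via `python -m torch.utils.collect_env`.

```python
import platform


def env_report():
    """Collect GPU, CUDA build and PyTorch version details for comparison."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None for CPU-only builds)
    report["torch_cuda"] = torch.version.cuda
    report["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        # Compute capability, e.g. (6, 1) for a GTX 1080 Ti
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

Running this on both the failing cluster nodes and the working V100/local machines makes the differences (GPU architecture, CUDA toolkit, PyTorch build) explicit side by side.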
I double-checked; the error is caused by the following code:
When I change it to the following code, the error goes away:
But I don't know why. Looking forward to your reply.