Closed: wang-zerui closed this issue 1 year ago.
I haven't seen this, but googling suggests it's caused by a mismatch between the CUDA version the binary was compiled for and the CUDA version on the device. The solution may be to uninstall the fftconv extension, then reinstall it against the right CUDA version.
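For anyone who hits this, here is a rough sequence for checking the versions and rebuilding. This is a sketch only: the csrc/fftconv path and the fftconv package name are assumptions based on the usual repo layout, so adjust them to your checkout.

# All three CUDA versions below should be mutually compatible: the one
# the PyTorch wheel was built with, the toolkit nvcc will compile the
# extension with, and the newest version the driver supports.
python -c "import torch; print(torch.version.cuda)"
nvcc --version
nvidia-smi

# Rebuild the extension against the matching toolkit
# ("fftconv" as the package name is an assumption; check with pip list).
pip uninstall fftconv
cd csrc/fftconv && pip install .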
Solved after I switched to another cluster with a newer driver.
Current output of nvidia-smi:
Every 1.0s: nvidia-smi Mon Aug 7 19:15:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
Previous output:
Every 1.0s: nvidia-smi Mon Aug 7 19:19:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 |
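The CUDA version in the nvidia-smi header is the newest runtime that driver supports, so under driver 470.129.06 anything built for CUDA newer than 11.4 can fail to load or, as here, silently produce wrong results. A quick way to see which CUDA version the installed PyTorch wheel was built with, for comparison against the driver:

# Prints the CUDA version PyTorch was compiled against and whether
# the GPU is usable with it; compare against the driver's 11.4 above.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"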
I ran into this error when training the model with use_fast_fftconv. The error doesn't actually stop the training process, but the result of the conv op is wrong. I also ran

PYTHONPATH=$(pwd) pytest tests/

I have installed the fftconv extension by running
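For context on the wrong-results symptom: the point of those tests is to compare the fused CUDA kernel against a pure-PyTorch reference. Below is a minimal self-contained sketch of that kind of check, assuming the usual depthwise (batch, channels, length) layout; it is not the repo's actual test and does not import the extension:

python - <<'EOF'
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, L = 2, 4, 1024
u = torch.randn(B, H, L)   # input sequences
k = torch.randn(H, L)      # one causal filter per channel

# FFT-based causal convolution (what a fast fftconv path computes):
# zero-pad to 2L so circular convolution equals linear convolution.
n = 2 * L
y_fft = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(k, n=n), n=n)[..., :L]

# Direct depthwise causal convolution as the reference.
y_ref = F.conv1d(F.pad(u, (L - 1, 0)), k.flip(-1).unsqueeze(1), groups=H)

# A large gap here is the "training runs but results are wrong" failure mode.
print((y_fft - y_ref).abs().max().item())
EOF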