The loss becomes NaN from the 4th step of the 1st epoch. The same training script works fine on another computer with a different graphics card. The requirements list is identical on both machines, and both run Windows.
Works here:
NVIDIA Quadro M2200
Driver version: 471.11
CUDA Version: 11.4
Does not work here (loss goes to NaN):
NVIDIA RTX A4000
Driver version: 512.59
CUDA Version: 11.6 (CUDA 11.2 is also installed but 11.6 is active)
Also, before the loss goes to NaN, training takes a long time to start (around 30 minutes), which is not the case on the other PC.
Requirements:
I suspect this is an NVIDIA CUDA version issue. What could be going wrong?
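
To confirm exactly which step first produces a non-finite loss on each machine, a check along these lines could be added to the training loop. This is only a minimal sketch in plain Python: the helper name `first_nonfinite_step` and the sample loss values are hypothetical and not taken from my actual script.

```python
import math


def first_nonfinite_step(losses):
    """Return the 1-based step of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses, start=1):
        if not math.isfinite(loss):
            return step
    return None


# Made-up values for illustration only; NaN first appears at step 4, as in the report.
example_losses = [2.31, 2.27, 2.19, float("nan"), float("nan")]
print(first_nonfinite_step(example_losses))  # prints 4
```

Running the same check on both PCs would show whether the losses already diverge before step 4 or only become NaN at that point.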