Open Gabo181 opened 7 months ago
Hi,
I am also using an RTX 4090 and have been struggling with this NaN loss problem. How has the DirectML plugin fared for you? Is the speed good, and would you recommend it, or have you switched to a different GPU?
I implemented the DirectML plugin and the speeds have been nice, though I am not sure if they would have been better with CUDA.
As for your question, I believe the low GPU memory use is due to the original batch size of 3. I was able to get higher memory use with a batch size of 16 (32 would not run). However, I do not want to change the original researcher's settings too much, so I am currently training with a batch size of 4.
Hi!
All versions below 2.10 result in NaN values for the loss etc. (as mentioned by another user). Therefore I am using v2.10-cpu with the DirectML plugin, since native Windows isn't supported anymore.
I noticed that my GPU (RTX 4090) is only running at 10% capacity and using only 4 GB of memory. Is there a way to
Thanks in advance
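
For anyone checking a similar setup, here is a minimal sketch for confirming that TensorFlow actually sees the GPU before looking at utilization. It assumes the DirectML plugin registers the card under the "GPU" device type, which I believe is the case but have not confirmed for this exact setup. The helper name `list_gpus` is made up for illustration, and it returns an empty list when TensorFlow is not installed.

```python
def list_gpus():
    """Return the names of the GPU devices TensorFlow can see.

    Hypothetical helper: returns [] if TensorFlow is not installed.
    Assumption: the DirectML plugin exposes the card as device type "GPU".
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # tf.config.list_physical_devices is the standard TF 2.x device query.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print("TensorFlow sees these GPU devices:", gpus)
    else:
        print("No GPU device visible to TensorFlow (or TensorFlow is missing).")
```

If the list comes back empty, training is running on the CPU, and a larger batch size will not raise GPU utilization.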