Closed by Ytang520 1 year ago
I apologize for my late response. As far as I recall, we trained our model on an NVIDIA RTX 3090 GPU with CUDA 11.x, and each training session took approximately 2-3 hours. I am not sure what caused the slow training in your case, but I would recommend trying tensorflow-gpu 2.4.0 with CUDA 11.0 and cuDNN 8. Alternatively, you could try the latest versions of tensorflow-gpu, CUDA, and cuDNN.
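Before changing versions, it may help to confirm that TensorFlow actually sees the GPU and which CUDA/cuDNN versions it was built against. The following is only a rough diagnostic sketch, not part of this repository: the function name `describe_tf_gpu_setup` is made up for illustration, and it assumes TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available.

```python
import importlib


def describe_tf_gpu_setup():
    """Return a dict describing the TensorFlow build and the GPUs it can see.

    Hypothetical helper for illustration; not part of train.py.
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow_installed": False}

    # Build-time information; CPU-only builds may lack the CUDA keys,
    # so .get() is used and returns None in that case.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow_installed": True,
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        # An empty list means TensorFlow will run on the CPU only.
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in describe_tf_gpu_setup().items():
        print(f"{key}: {value}")
```

If `visible_gpus` comes back empty, or `built_with_cuda` is false, training is likely falling back to the CPU, which would point to a version mismatch between TensorFlow, CUDA, and cuDNN. If the GPU is listed and utilization is still low, the bottleneck may lie elsewhere, for example in data loading.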
I've installed the libraries listed in the requirements. But when I run train.py, training is unbearably slow. Specifically, one epoch takes around 60 s, and a complete training session takes more than 12 h. I checked the GPU and found that GPU-Util stays fairly stable at around 20%. (Fig. below)
I'm wondering whether you have experienced this and, if so, how you dealt with it. If not, could you tell me the specific requirements for train.py, e.g., the versions of CUDA and cuDNN, the number of GPUs and CPUs, and any other specific hardware configuration needed?
Thank you!