Hi, could you please tell me the total training time and the total number of steps you used for your model? I am using the default parameters in your code, which set numsteps = 6000000, but training on eight A100 GPUs takes nearly 10 days. Is it possible that this code does not support multi-GPU training?