Closed: zhang-haojie closed this issue 1 month ago.
Based on other users' reproductions, how the model performs at a given iteration count is related to the number of GPUs used. More details can be found in this issue.
In theory, though, one train_step on a single GPU is counted as one iteration. I suspect the number of GPUs affects the effective batch size during gradient backpropagation under the DDP strategy.
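For illustration only (this is a minimal sketch, not code from the Latte repo), here is a standard PyTorch DDP setup showing why one counted iteration consumes `world_size * batch_size` samples in total: each rank runs its own train_step on its own micro-batch, and DDP averages the gradients across ranks before the optimizer step.

```python
# Hypothetical DDP sketch (illustrative names, not from Latte): one
# train_step per rank counts as one iteration, but the gradients are
# averaged over world_size micro-batches.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int, batch_size: int = 5):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(16, 1))       # gloo/CPU keeps the sketch portable
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(3):                     # one train_step == one iteration
        x = torch.randn(batch_size, 16)       # per-rank (per-GPU) micro-batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                       # DDP averages grads across ranks here
        opt.step()
    # Samples consumed per counted iteration: world_size * batch_size
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)     # 2 ranks => 2x samples per iteration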
Hi There! 👋
This issue has been marked as stale due to inactivity for 14 days.
We would like to inquire if you still have the same problem or if it has been resolved.
If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.
We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏
I noticed in the previous issue that it reaches convergence after about 150,000 iterations. Does this refer to the training results when the batch size is 5?
I checked the training code, and the number of iterations here is independent of the number of GPUs, right?
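To make what I mean concrete, here is a hypothetical sketch (the names are mine, not from Latte's training script): the counter advances once per train_step on each rank, so the iteration count itself does not depend on the GPU count, even though the total number of samples consumed per iteration does.

```python
# Hypothetical illustration, not code from the Latte repo: the iteration
# counter increments once per train_step per rank, independent of GPU count,
# while total samples per iteration scale with the number of GPUs.
def count_samples(num_gpus: int, batch_size: int = 5, max_iters: int = 150_000):
    train_steps = 0
    while train_steps < max_iters:
        train_steps += 1                             # +1 per step, regardless of world size
    total = train_steps * batch_size * num_gpus      # samples consumed across all ranks
    print(f"{train_steps} iterations, {total} samples with {num_gpus} GPU(s)")

count_samples(num_gpus=1)   # 150000 iterations, 750000 samples with 1 GPU(s)
count_samples(num_gpus=8)   # same iteration count, 8x the samples per iteration
```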