Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

About training details #110

Closed · zhang-haojie closed this issue 1 month ago

zhang-haojie commented 2 months ago

I noticed in a previous issue that training reaches convergence after about 150,000 iterations. Does this refer to the training results when the batch size is 5?

I checked the training code, and the number of iterations there is independent of the number of GPUs, right?

maxin-cn commented 2 months ago

Based on other people's reproductions, the performance reached at a given number of iterations depends on the number of GPUs used. More details can be found in this issue.
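
For concreteness, a back-of-the-envelope calculation under DDP (the per-GPU batch size of 5 comes from the question above; the GPU counts are illustrative, not from the paper):

```python
# Samples consumed per optimizer step under DDP:
#   effective_batch = per_gpu_batch * num_gpus
# So the same 150,000 iterations see very different amounts of data:
per_gpu_batch = 5
for num_gpus in (1, 8):
    samples_seen = 150_000 * per_gpu_batch * num_gpus
    print(f"{num_gpus} GPU(s): {samples_seen:,} samples after 150k steps")
# 1 GPU(s): 750,000 samples after 150k steps
# 8 GPU(s): 6,000,000 samples after 150k steps
```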

zhang-haojie commented 2 months ago

But in theory, one train_step on a single GPU is counted as one iteration. I guess the number of GPUs affects the effective batch size over which gradients are averaged under the DDP strategy.
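
That matches how PyTorch DDP behaves. A minimal sketch of the mechanism (the toy model, optimizer, and shapes below are placeholders, not Latte's actual training code): each rank draws its own batch, gradients are averaged across ranks during `backward()`, and the step counter advances once per optimizer step no matter how many GPUs participate.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int, train_steps: int = 1000, batch_size: int = 5):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # "nccl" on GPUs

    model = DDP(torch.nn.Linear(16, 16))  # stand-in for the actual video transformer
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(train_steps):
        x = torch.randn(batch_size, batch_size + 11)  # each rank draws its own batch
        x = torch.randn(batch_size, 16)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces (averages) gradients across ranks here
        opt.step()
        # `step` advances once per optimizer step on every rank, regardless of
        # world_size, so the logged iteration count alone doesn't tell you how
        # much data the model has seen: that is batch_size * world_size per step.

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```

So two runs with the same iteration count but different GPU counts are not comparable: the multi-GPU run has both a larger effective batch and proportionally more samples seen.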

github-actions[bot] commented 2 months ago

Hi there! 👋

This issue has been marked as stale after 14 days of inactivity.

We would like to inquire if you still have the same problem or if it has been resolved.

If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.

We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏