Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Batch Size Ablations #91

Closed: fan23j closed this issue 1 week ago

fan23j commented 2 weeks ago

Hi,

Thank you for your work and the well-organized repo. Reading through the paper, I was unable to find ablations on the effect of batch size (or effective batch size) on generation performance. Could you share any insight into how batch size affects the quality of video generation? In particular, when a larger effective batch size is reached through gradient accumulation steps, would you increase the total number of training iterations to compensate?

Intuitively, a larger batch size should correlate with better performance (as the efficacy of image-video joint training suggests), but I am curious whether the benefits taper off at this specific model size, since the whole pipeline is relatively expensive to train, especially if training has to be scaled up to account for gradient accumulation steps.

Thanks.

maxin-cn commented 1 week ago

Thanks for your interest. I also think a larger batch size leads to better performance. However, in my experience so far, gradient accumulation has not provided significant gains for text-to-video tasks.
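
For readers following the thread, the bookkeeping behind "effective batch size" and "training iterations" can be sketched in plain Python. This is a minimal illustration and not code from the Latte repo: the function names and all numbers are hypothetical, and the toy one-parameter least-squares model only stands in for a real network. It assumes a mean-reduced loss and equally sized micro-batches, in which case accumulating micro-batch gradients scaled by 1/accum_steps reproduces the full-batch gradient.

```python
import math


def effective_batch_size(per_device_batch, num_devices, accum_steps):
    """Samples contributing to one optimizer update."""
    return per_device_batch * num_devices * accum_steps


def optimizer_updates(micro_iters, accum_steps):
    """Weight updates performed after `micro_iters` forward/backward passes."""
    return micro_iters // accum_steps


def mean_grad(w, xs, ys):
    """Gradient w.r.t. w of the mean of 0.5 * (w * x - y) ** 2 over a batch."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accum_steps):
    """Accumulate gradients over equal micro-batches, scaling each by 1/accum_steps."""
    if len(xs) % accum_steps != 0:
        raise ValueError("batch must split evenly into micro-batches")
    micro = len(xs) // accum_steps
    total = 0.0
    for k in range(accum_steps):
        part = slice(k * micro, (k + 1) * micro)
        total += mean_grad(w, xs[part], ys[part]) / accum_steps
    return total


if __name__ == "__main__":
    # Hypothetical setup: 4 samples per GPU, 8 GPUs, 2 accumulation steps.
    print(effective_batch_size(4, 8, 2))
    # The same number of forward/backward passes yields fewer weight updates.
    print(optimizer_updates(100_000, 4))
    # Accumulated and full-batch gradients agree on the toy problem.
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]
    print(math.isclose(mean_grad(0.5, xs, ys), accumulated_grad(0.5, xs, ys, 2)))
```

Under these assumptions, accumulation changes the gradient estimate per update but not the compute per update: with a fixed budget of forward/backward passes, the number of weight updates drops by a factor of accum_steps, which is why the question of extending the iteration count comes up. The exact equivalence does not carry over to layers whose statistics depend on the micro-batch, such as batch normalization.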