Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

T2V with video_length > 16 outputs random noise #25

jeffchy closed this issue 7 months ago

jeffchy commented 7 months ago

As the title says.

Does this mean the current T2V model was not trained on other frame lengths and cannot generalize to them?

maxin-cn commented 7 months ago

Hi, producing videos directly with more than 16 frames can lead to low-quality output. To generate videos longer than 16 frames, you might consider using the autoregressive mode for better results.
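
For readers unfamiliar with the term, here is a rough sketch of what autoregressive generation usually means for video: sample a short chunk, then keep sampling new chunks conditioned on the last few frames of what already exists, so the model never sees more than 16 frames at once. This is not Latte's actual API. `generate_chunk`, `context_len`, and `dummy_chunk` are hypothetical names made up for illustration, and the dummy generator just emits frame indices in place of real frames.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]  # stand-in for a real image or latent tensor

# Hypothetical sampler signature: (context frames or None, number of new frames) -> new frames
ChunkFn = Callable[[Optional[Sequence[Frame]], int], List[Frame]]


def generate_long_video(
    generate_chunk: ChunkFn,
    total_frames: int,
    chunk_len: int = 16,
    context_len: int = 4,
) -> List[Frame]:
    """Build a long video chunk by chunk.

    Each call to generate_chunk handles at most chunk_len frames in total
    (context frames plus new frames), matching the length the model was
    trained on.
    """
    if not 0 <= context_len < chunk_len:
        raise ValueError("context_len must satisfy 0 <= context_len < chunk_len")

    frames: List[Frame] = []
    while len(frames) < total_frames:
        # Condition on the tail of what has been generated so far (if anything).
        context = frames[-context_len:] if frames and context_len > 0 else None
        budget = chunk_len - (len(context) if context else 0)
        num_new = min(budget, total_frames - len(frames))
        frames.extend(generate_chunk(context, num_new))
    return frames


def dummy_chunk(context: Optional[Sequence[Frame]], num_new: int) -> List[Frame]:
    """Placeholder sampler: continues a simple counter instead of running a model."""
    start = context[-1][0] + 1 if context else 0.0
    return [[float(start + i)] for i in range(num_new)]


video = generate_long_video(dummy_chunk, total_frames=40)
print(len(video))  # 40
```

With the defaults, the first call produces 16 frames and each later call produces up to 12 new frames on top of 4 context frames. How a real model is conditioned on the context frames depends on the model and is not shown here.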

jeffchy commented 7 months ago

Thanks for your answer.

luweiblues commented 7 months ago

Hi, how can I turn on the autoregressive mode?