Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0
1.44k stars, 147 forks

Is autoregression possible? #71

Open zhaohm14 opened 2 months ago

zhaohm14 commented 2 months ago

Thanks for your wonderful work! I am interested in applying autoregressive generation to get length-flexible output. Could this be implemented by changing the way the model runs inference, as LLMs do?

maxin-cn commented 2 months ago

Thanks for your interest. Which LLM inference algorithm are you referring to specifically?

zhaohm14 commented 2 months ago

I mean generating subsequent frames using the previous frames as input (and perhaps adding a special end token?), instead of generating all 16 frames at once. That way, we could train on videos of any length and generate longer, more length-flexible videos.

maxin-cn commented 2 months ago

I'm not sure how well it would perform, since the model was trained directly on 16-frame video clips. You're welcome to try it, and if it gives better results, a PR is welcome.
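
For anyone who wants to experiment, here is a minimal sketch of one way to roll out a longer video while keeping every model call at the 16-frame window the model was trained on. It is not part of Latte. `sample_window` is a hypothetical callable that would wrap the diffusion sampler and somehow condition on the given prefix frames (for example, by holding those latents fixed during denoising); Latte is not known to offer this out of the box, so it is an assumption. The `context` size of 4, the `Frame` type, and `dummy_sampler` are likewise placeholders for illustration. The sketch stops at a fixed `total_frames` and does not model an end token.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # placeholder for one latent frame (hypothetical type)


def generate_autoregressive(
    sample_window: Callable[[Sequence[Frame], int], List[Frame]],
    total_frames: int,
    window: int = 16,
    context: int = 4,
) -> List[Frame]:
    """Chunked autoregressive rollout.

    Each call to `sample_window(prefix, n_new)` is expected to return `n_new`
    frames that continue `prefix`, so that prefix + new frames together span
    at most `window` frames (the clip length used in training).
    """
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    if total_frames <= 0:
        return []

    frames: List[Frame] = []
    while len(frames) < total_frames:
        # Condition on the most recent frames; the first chunk has no prefix.
        prefix = frames[-context:] if (context and frames) else []
        n_new = window - len(prefix)
        new_frames = sample_window(prefix, n_new)
        if len(new_frames) != n_new:
            raise RuntimeError("sampler returned an unexpected number of frames")
        frames.extend(new_frames)

    # The last chunk may overshoot the requested length, so trim it.
    return frames[:total_frames]


def dummy_sampler(prefix: Sequence[Frame], n_new: int) -> List[Frame]:
    """Stand-in for a real sampler: each frame just stores its own index."""
    start = prefix[-1][0] + 1 if prefix else 0.0
    return [[float(start + i)] for i in range(n_new)]


video = generate_autoregressive(dummy_sampler, total_frames=40)
print(len(video))
```

With `window=16` and `context=4`, every chunk after the first adds 12 new frames, so a 40-frame video takes three sampler calls. Whether the real model stays temporally consistent across chunk boundaries without retraining is exactly the open question raised above.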