buoyancy99 / diffusion-forcing

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Have you tried any experiments with generating videos at higher resolutions? #8

Closed by shiyi-zh0408 1 month ago

shiyi-zh0408 commented 1 month ago

Thanks for sharing your work, it's great! However, I noticed that your video generation tasks are all performed at very low resolutions (16/32). Can your method be scaled up?

buoyancy99 commented 1 month ago

Yes. As I mentioned in the repo for the 3D-UNet/temporal attention version, we've verified that it works for higher-resolution videos such as 512x512 via latent diffusion. However, we are reserving that part of the code for our next project.

In addition, the Minecraft video in our paper is 128x128, not 32x32, so it's not really that low.