shiyi-zh0408 closed this issue 1 month ago
Yes. As I mentioned in the repo for the 3D-UNet/temporal attention version, we've verified that it works for higher-resolution videos, such as 512x512, via latent diffusion. However, we are reserving that part of the code for our next project.
In addition, the Minecraft video in our paper is at 128x128 resolution, not 32x32, so it's not really that low.
Thanks for sharing your work! It's great work! But I noticed that your video generation tasks are all performed at a very low resolution (16/32). Can your method be scaled up?