dangeng / visual_anagrams

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
MIT License
870 stars, 81 forks

Higher resolution output #8

unspokenlanguage closed this issue 11 months ago

unspokenlanguage commented 11 months ago

256px seems to be the maximum image size. Is there an argument we can pass to get higher-resolution output?

dangeng commented 11 months ago

Hello! We only use the first two stages of DeepFloyd IF, which together go up to 256x256. DeepFloyd IF also has a third stage (which is just the Stable Diffusion upsampler) that produces 1024x1024 images. You could try incorporating it, but you might not get great results because it's a latent diffusion model, and our method works better with pixel diffusion models (see the paper for details). The only other pixel diffusion model that I'm aware of is Imagen, which generates images up to 1024x1024. Unfortunately, Imagen is not publicly available. I would expect our method to work with other pixel-based diffusion models too (if you know of any, I would love to hear about them!).
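For anyone who wants to experiment with the third-stage idea, here is a rough sketch of running a 256x256 output through the public Stable Diffusion x4 upscaler with the `diffusers` library. This is not part of the repo: `upscaled_size` and `upscale_illusion` are hypothetical helper names, and the sketch assumes `torch` and `diffusers` are installed and a CUDA GPU is available. The upscaler only sees one prompt and one orientation of the image, so the illusion may not survive in the other views.

```python
# Public x4 upscaler checkpoint on the Hugging Face Hub.
UPSCALER_ID = "stabilityai/stable-diffusion-x4-upscaler"
SCALE = 4  # this upscaler enlarges each side by a factor of 4


def upscaled_size(width, height, scale=SCALE):
    """Return the (width, height) produced by upscaling the given size."""
    return (width * scale, height * scale)


def upscale_illusion(image, prompt, device="cuda"):
    """Upscale a 256x256 PIL image to 1024x1024 (hypothetical helper).

    Heavy imports live inside the function so that importing this
    module does not require torch or diffusers to be installed.
    """
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        UPSCALER_ID, torch_dtype=torch.float16
    ).to(device)

    # Assumption: conditioning on a single prompt and a single view.
    # Nothing here enforces the other views of the anagram.
    return pipe(prompt=prompt, image=image).images[0]
```

A call would look like `upscale_illusion(img, "an oil painting of a rabbit")`, where `img` is a 256x256 `PIL.Image` saved from the sampling script and the prompt is whichever view you care most about.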