crowsonkb / v-diffusion-pytorch

v objective diffusion inference code for PyTorch.
MIT License

Higher resolution `cc12m_1_cfg` model #10

Open · njbbaer opened this issue 2 years ago

njbbaer commented 2 years ago

First off, I'd like to say the new `cc12m_1_cfg` model is amazing, and thank you for the work you're doing.

Are there any plans to release a 512x512 version of it? I know it's possible to output images at any size, but they clearly look best at the native 256x256 resolution. While sometimes beautiful in their own way, higher-resolution outputs tend to repeat patterns, and multiple generations of the same prompt look less distinct from one another.

crowsonkb commented 2 years ago

I would need to heavily filter the dataset to exclude images smaller than 512px on the short edge, so probably not. However, I am thinking about trying 512x512 or larger with a LAION model later; LAION is so huge that filtering would still leave a sufficiently large training set.

njbbaer commented 2 years ago

Good to know. I hope you do!

I wonder if it would be possible to start the model at 256x256 and run the output through a second pass at a higher resolution.

njbbaer commented 2 years ago

@crowsonkb I'm getting good results with this. I run the model once at 256x256, then upscale the output and feed it into `reverse_sample` to run the diffusion backwards, and finally run forwards again from that latent with the same prompt. The image comes out looking slightly different, but higher quality, and it preserves the major features of the original.
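A minimal, untested sketch of that round trip, assuming the `sampling.sample` / `sampling.reverse_sample` signatures in this repo, `utils.get_spliced_ddpm_cosine_schedule` for the schedule, and a `model_fn` that already wraps the text conditioning (e.g. a classifier-free guidance wrapper as in `cfg_sample.py`):

```python
import torch
import torch.nn.functional as F
from diffusion import sampling, utils

def diffusion_upscale(model_fn, image_256, size=512, n_steps=500,
                      max_timesteps=0.8, eta=0., device='cuda'):
    # Upscale the 256x256 sample with plain bilinear interpolation.
    big = F.interpolate(image_256, (size, size), mode='bilinear',
                        align_corners=False)

    # Timestep schedule, truncated at max_timesteps so the inversion is
    # only partial: we never go all the way back to pure noise.
    # (Assumes sample() expects the schedule in descending order and
    # reverse_sample() ascending.)
    t = torch.linspace(1, 0, n_steps + 1, device=device)[:-1]
    steps = utils.get_spliced_ddpm_cosine_schedule(t)
    steps = steps[steps <= max_timesteps]

    # DDIM in reverse: find a latent that would decode to the upscaled image.
    latent = sampling.reverse_sample(model_fn, big, steps.flip(0), {})

    # Forward sample again from that latent with the same conditioning.
    return sampling.sample(model_fn, latent, steps, eta, {})
```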

crowsonkb commented 2 years ago

Ohh. I have been experimenting with scaling up then re-noising the image and doing forward sampling starting from there (i.e. using it as an init image) and that has been working for me. I'm surprised reverse then forward sampling isn't preserving the upscale blur/artifacts though... are you doing unconditional reverse sampling then forward using a text condition, or some such?
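The re-noising step itself is simple: mix the upscaled image with fresh noise at the chosen starting timestep and denoise from there. A minimal sketch, assuming `utils.t_to_alpha_sigma` and the same schedule helper as in the sketch above, with `starting_t` as a hypothetical parameter name:

```python
import torch
import torch.nn.functional as F
from diffusion import sampling, utils

def renoise_upscale(model_fn, image_256, size=512, n_steps=500,
                    starting_t=0.9, eta=1., device='cuda'):
    # Upscale, then treat the result as an init image.
    init = F.interpolate(image_256, (size, size), mode='bilinear',
                         align_corners=False)

    # Skip the earliest (noisiest) part of the schedule and start at
    # starting_t instead of t=1.
    t = torch.linspace(1, 0, n_steps + 1, device=device)[:-1]
    steps = utils.get_spliced_ddpm_cosine_schedule(t)
    steps = steps[steps < starting_t]

    # Re-noise: scale the init by alpha and mix in fresh Gaussian noise at
    # sigma, i.e. jump straight to the starting timestep's noise level.
    alpha, sigma = utils.t_to_alpha_sigma(steps[0])
    x = init * alpha + torch.randn_like(init) * sigma

    # Denoise forwards from there with the same conditioning as usual.
    return sampling.sample(model_fn, x, steps, eta, {})
```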

njbbaer commented 2 years ago

> Ohh. I have been experimenting with scaling up then re-noising the image and doing forward sampling starting from there (i.e. using it as an init image) and that has been working for me.

Does that work? I tried something like it at first, but the images were either too blurry or too dissimilar from the original. I might have done something wrong though. Can you share how you're re-noising the image?

> I'm surprised reverse then forward sampling isn't preserving the upscale blur/artifacts though... are you doing unconditional reverse sampling then forward using a text condition, or some such?

Yeah, that's exactly what I've been doing. `max_timesteps` trades blurriness off against fidelity: too low and the upscale's blur survives, too high and detail from the original is lost. The example below was done with `max_timesteps=0.8`. It doesn't always work, though, and some images come out looking worse.
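With the hypothetical `diffusion_upscale` sketch from the earlier comment (and `model_fn` / `sample_256` as placeholder names), that setting would be:

```python
# Reproduces the example below: partial inversion up to t = 0.8.
upscaled = diffusion_upscale(model_fn, sample_256, max_timesteps=0.8)
```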

[Images: bilinear interpolation of the original vs. the diffusion-upscaled result at `max_timesteps=0.8`]