Jack000 / glid-3-xl-stable

stable diffusion training
MIT License

Color shifting & slight changes to unmasked area when outpainting #18

Open monophthongal opened 1 year ago

monophthongal commented 1 year ago

Seeing some strange shifts to the unmasked (not supposed to be edited) region of images when outpainting:

color-shift

I started with the left image (512x512), and extended to the right with a mask preserving the original image. However, the preserved section is changed, as you can see in the image above. I repeated this to extend further right, and the same thing happened. Each time the image seems to get a little darker, and on close inspection, the fine details seem sharper.

The image being passed into do_run() is unchanged from the original (I saved a copy just before inference to be sure).

Any ideas how to fix this?

Jack000 commented 1 year ago

could you upload the initial image and the command you used? Are the seams present in the outpainted image, or just when comparing against the original image?

monophthongal commented 1 year ago

Yes of course - here is another example with images saved at each step. Note that instead of using the repo's outpainting code, I'm generating shifted images and masks and then calling the model programmatically.

Original image generated with standard Stable Diffusion:

example-original

Next, I create a new image shifted 50% to the right, and create a mask so the left half (containing part of the original image) is preserved, and the right half is masked for inpainting.

example-mask

Left side (white) will be original image, right side (black) is the region to be outpainted.
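The shift-and-mask setup above can be sketched like this (a hypothetical helper written with numpy, not the repo's outpainting code; array shapes assume a 512x512 source):

```python
import numpy as np

def make_shifted_canvas_and_mask(img, shift_frac=0.5):
    """Shift `img` (H, W, C uint8 array) left by shift_frac of its width.

    Returns (canvas, mask): the canvas keeps the right part of the original
    on its left side; the mask is 255 (white = keep) on the left and
    0 (black = outpaint) on the right.
    """
    h, w, c = img.shape
    shift = int(w * shift_frac)
    canvas = np.zeros_like(img)
    canvas[:, : w - shift] = img[:, shift:]   # preserved region
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[:, : w - shift] = 255                # white = keep original pixels
    return canvas, mask

img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
canvas, mask = make_shifted_canvas_and_mask(img)
```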

Leaving the "extended" / masked area of the image blank seems to lead to some huge seams along the edge. I've been experimenting with filling the empty area with a mirrored version of the image, along with other ideas for filling the space, and it helps a lot.

With mirroring, my input image becomes:

example-init

(the white lines on the bottom/right are a bug in my mirroring code but don't seem to actually impact anything)
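For reference, mirror-filling the blank region can be done without the edge bug, e.g. (a sketch, assuming the preserved region is at least as wide as the region to fill, as it is for a 50% shift):

```python
import numpy as np

def mirror_fill(canvas, keep_width):
    """Fill the blank right region by reflecting the preserved left region.

    Assumes (w - keep_width) <= keep_width, which holds for a 50% shift.
    """
    out = canvas.copy()
    h, w, _ = canvas.shape
    fill_w = w - keep_width
    # reflect the rightmost `fill_w` columns of the kept region
    src = out[:, keep_width - fill_w : keep_width]
    out[:, keep_width:] = src[:, ::-1]
    return out

canvas = np.zeros((512, 512, 3), dtype=np.uint8)
canvas[:, :256] = np.random.randint(0, 256, (512, 256, 3), dtype=np.uint8)
filled = mirror_fill(canvas, keep_width=256)
```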

Running the model with the mirrored image and mask from above outputs this:

example-output

You can tell where the mask edge was, but that's not a big deal. (Though if you have thoughts on how to fix that too I'd be interested...) Most of the time this actually works better and the mask edge is not visible.

The output image looks good on its own; however, if I overlay part of the input image, you can tell that the whole image has been slightly changed, including the unmasked left half:

example-side-by-side

The left quarter is overlaid from the input image, and the rest is the output image.
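One way to quantify the drift in the preserved region, rather than eyeballing the overlay, is a mean absolute per-pixel difference restricted to the white (keep) part of the mask (a small diagnostic sketch, not part of the repo):

```python
import numpy as np

def unmasked_drift(original, output, mask):
    """Mean absolute per-pixel difference inside the preserved (white) region.

    `original`/`output` are (H, W, 3) uint8 arrays, `mask` is (H, W) with
    255 marking pixels that were supposed to be untouched.
    """
    keep = mask.astype(bool)
    diff = np.abs(original.astype(np.int16) - output.astype(np.int16))
    return float(diff[keep].mean())

orig = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
out = orig.copy()
out[:, 256:] = 0                    # change only the masked half
mask = np.zeros((512, 512), dtype=np.uint8)
mask[:, :256] = 255
drift = unmasked_drift(orig, out, mask)   # 0.0 here: kept half untouched
```

A nonzero value on a real input/output pair confirms the preserved region was modified; tracking it across repeated outpaints would show whether the error accumulates.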

Jack000 commented 1 year ago

ok I see. The first thing is that the model doesn't like hard edges - it sees the edge and thinks it's the beginning of a separate frame. I usually just roughly paint over the straight edge of the mask and it works a lot better. I think this means the model is undertrained, because the training procedure does expose the model to hard edges like this.
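A programmatic stand-in for roughly painting over the straight edge (an assumption on my part, not necessarily equivalent to hand-painting, and the inpaint model may expect a binary mask) is to feather the mask so the keep/outpaint transition is gradual:

```python
import numpy as np

def feather_mask(mask, radius=8):
    """Box-blur the mask horizontally so the hard vertical edge is gradual."""
    m = mask.astype(np.float32) / 255.0
    kernel = np.ones(2 * radius + 1, dtype=np.float32) / (2 * radius + 1)
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, m
    )
    return (np.clip(blurred, 0.0, 1.0) * 255).round().astype(np.uint8)

mask = np.zeros((64, 128), dtype=np.uint8)
mask[:, :64] = 255          # hard edge at column 64
soft = feather_mask(mask)   # edge now ramps over ~2*radius columns
```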

for repeated inpaint/outpaints I'd try using the output_npy/00000.npy files instead of the output/00000.png images. The npy files contain the latent diffusion codes so it won't suffer re-compression when you encode the image.

The latent diffusion VAE is a form of lossy compression, so if you put the output pngs back through it'll slowly accumulate VAE artifacts and color cast. Using the .npy files will still incur one instance of VAE compression though, if you're starting from a png.
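The point about carrying the latent forward can be sketched as follows: the .npy save/load round trip is bit-exact, unlike a VAE decode/encode cycle. (The (1, 4, 64, 64) shape is the usual SD latent for a 512x512 image, but treat the shape and paths as assumptions here; an in-memory buffer stands in for output_npy/00000.npy.)

```python
import io
import numpy as np

# Stand-in for a latent the repo would write to output_npy/00000.npy.
latent = np.random.randn(1, 4, 64, 64).astype(np.float32)

buf = io.BytesIO()          # stands in for the .npy file on disk
np.save(buf, latent)
buf.seek(0)

# Next outpainting step: load the latent directly instead of re-encoding
# the decoded png through the VAE.
restored = np.load(buf)
```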

The VAE throws out a lot of color information it deems to be unimportant perceptually. If you encode then decode any image with this VAE there's almost always a color cast. edit - the SD/inpaint model could also add some color cast, but I haven't tested this extensively.

monophthongal commented 1 year ago

Ah OK, that makes sense. I wondered if it might be some kind of image compression but didn't know the VAE was responsible. For now I'll explore using the npy files and fixing things up with postprocessing.

Thanks for the help!