comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Color fringe artifacts on whole image when inpainting->save->load->inpaint->etc multiple times in a row #1841

Open RandomGitUser321 opened 11 months ago

RandomGitUser321 commented 11 months ago

Even though I'm using masking, the whole image will progressively get more blown out with color fringing/chromatic aberration looking artifacts.

Here's the basic gist of my workflow:

- Make something
- Save it to PNG
- Load it into an image node
- Mask something off
- Sample to change the masked area until I get what I want
- Save it again
- Load that new save into an image node
- Mask off new regions
- etc. etc.

In terms of what I'm using, it's just the base SDXL 1.0 model, sometimes the SDXL 1.0 refiner (usually not), and the SDXL 1.0 VAE. I've tested the issue with regular masking -> VAE Encode -> Set Latent Noise Mask -> sample, and I've also tested it with the Load UNet SDXL Inpainting 0.1 model -> mask -> VAE Encode (for Inpainting) -> sample. For samplers, I'm just using dpm++ 2m karras, usually around 25-32 steps, but that shouldn't cause the rest of the unmasked image to be touched anyway.

After a few round-trips of that, the rest of the untouched image progressively gets worse and worse. I'm thinking maybe it's a floating point rounding issue that keeps rounding upward? Or maybe it's a precision/rounding issue somewhere in the VAE encode -> VAE decode step? I can partially mitigate it by using something like Ultimate SD Upscale with a low denoise value like 0.25 or 0.3, but I'd rather not have to do that every single loop through.

Any ideas?

jn-jairo commented 11 months ago

For masked image-to-image you need to use the ImageCompositeMasked node afterwards to filter the masked area. I don't know exactly what, but something alters the unmasked parts a little bit; for one generation it isn't that noticeable, but if you do multiple generations you will see it.

I noticed it in my workflow for upscaled inpainting of masked areas: without ImageCompositeMasked there is a clear seam on the upscaled square, showing that the whole square image was altered, not just the masked area. Adding ImageCompositeMasked solved the problem and made the inpaint seamless.
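For reference, what ImageCompositeMasked effectively does can be sketched like this (an illustration with NumPy arrays, not ComfyUI's actual implementation): pixels outside the mask are taken verbatim from the original image, so any drift the sampler or VAE introduced there is discarded.

```python
import numpy as np

def composite_masked(destination, source, mask):
    """Blend `source` over `destination` wherever `mask` is set.

    destination: original image, float array of shape (H, W, C)
    source:      inpainted image, same shape
    mask:        float array of shape (H, W), 1.0 = inpainted area
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return source * m + destination * (1.0 - m)

# Toy example: the "inpainted" image drifted everywhere,
# but only the masked quarter should actually change.
original = np.zeros((4, 4, 3))
inpainted = np.full((4, 4, 3), 0.5)   # global drift outside the mask
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0                    # inpaint the top-left quarter

result = composite_masked(original, inpainted, mask)
print(result[0, 0, 0], result[3, 3, 0])  # masked pixel kept, drift discarded
```

The key point is that the unmasked region comes back bit-identical to the original, which is why compositing stops the artifacts from accumulating across loops.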

RandomGitUser321 commented 11 months ago

> ImageCompositeMasked node

Thanks, I'll definitely try! Sounds like it's just overlaying the original image over top of the masked resample output minus the masked portion, which would definitely work as a crutch. But that still sounds like a bandaid fix for something else going on under the hood with precision or rounding.

jn-jairo commented 11 months ago

> ImageCompositeMasked node
>
> Thanks, I'll definitely try! Sounds like it's just overlaying the original image over top of the masked resample output minus the masked portion, which would definitely work as a crutch. But that still sounds like a bandaid fix for something else going on under the hood with precision or rounding.

My guess is the VAE: if you just do Load Image -> VAE Encode -> VAE Decode -> Save Image, the saved image is different from the loaded one.

Input: [image: the-mona-lisa]

Output: [image: ComfyUI_temp_lpczn_00004_]

Difference: [image: the-mona-lisa-difference]
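The compounding effect can be made visible without a model at all. The sketch below is purely illustrative: a toy lossy round-trip (a slight smoothing plus quantization, standing in for the VAE's encode/decode) shows how a small per-pass reconstruction error grows when the whole image is round-tripped on every inpainting loop.

```python
import numpy as np

def lossy_roundtrip(img):
    """Toy stand-in for VAE Encode -> VAE Decode: a slight low-pass plus
    quantization, so the output is close to, but never equal to, the input."""
    smoothed = 0.5 * (img + np.roll(img, 1, axis=1))
    levels = 64
    return np.round(smoothed * (levels - 1)) / (levels - 1)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))

drifted = image.copy()
for _ in range(10):  # ten save -> load -> inpaint round-trips
    drifted = lossy_roundtrip(drifted)

one_pass = np.abs(lossy_roundtrip(image) - image).mean()
ten_pass = np.abs(drifted - image).mean()
print(f"mean error after 1 pass:   {one_pass:.4f}")
print(f"mean error after 10 passes: {ten_pass:.4f}")
```

The ten-pass error is noticeably larger than the one-pass error, which matches what the thread observes: one generation is fine, five to ten loops are not.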

RandomGitUser321 commented 11 months ago

Yep, sounds like some rounding/precision issue in there then. At any rate, I'll just work around it for now. I'm going to try testing it in A1111 to see if the issue is present there as well. Maybe it's just some kind of limitation of the whole VAE encode -> latent space -> decode system. I don't know enough about the whole process to really guess anything beyond that, though.

jn-jairo commented 11 months ago

> Yep, sounds like some rounding/precision issue in there then. At any rate, I'll just work around it for now. I'm going to try testing it in A1111 to see if the issue is present there as well. Maybe it's just some kind of limitation of the whole VAE encode -> latent space -> decode system. I don't know enough about the whole process to really guess anything beyond that, though.

I looked at the A1111 code and it does a process similar to ImageCompositeMasked to build the output image when inpainting, so I guess that's just how SD works and ImageCompositeMasked is mandatory for a good result.

RandomGitUser321 commented 11 months ago

The biggest downside to ImageCompositeMasked that I'm seeing, though, is the hard seam. But I guess that can be mitigated by using an extension like ComfyI2I, which massively expands on the built-in masking (you can set the softness and opacity of the brush), or by doing the masks manually in an external program like Photoshop.

But I suppose I'll just have to get used to this workflow, because I'm tired of blowing my images out with rainbow effects. If the composite seams are ugly, Ultimate SD Upscale can likely fix them with a really low denoise amount, probably less than I use to fix the rainbow issue. I'll just have to periodically use it to supersample the image up and back down to 1024 while I work on it.

jn-jairo commented 11 months ago

I blur the mask by converting it to an image, using the ImageBlur node, and converting it back to a mask; no hard seams with this process. Like this:

[image]
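The idea above, sketched in NumPy (a plain box blur standing in for the ImageBlur node, so treat this as an illustration rather than the node's exact behavior): feathering the mask turns the hard 0/1 boundary into a gradient, so the composite blends instead of cutting.

```python
import numpy as np

def box_blur(mask, radius=2):
    """Feather a 0/1 mask with a simple box blur (stand-in for ImageBlur)."""
    size = 2 * radius + 1
    out = np.zeros_like(mask, dtype=float)
    padded = np.pad(mask.astype(float), radius, mode="edge")
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (size * size)

mask = np.zeros((16, 16))
mask[4:12, 4:12] = 1.0        # hard-edged square mask
soft = box_blur(mask, radius=2)

# The hard edge becomes a gradient: border values fall strictly between 0 and 1,
# while the deep interior of the mask stays fully opaque.
print(mask[4, 4], round(soft[4, 4], 3))
```

Using the feathered mask in the composite means the inpainted region fades into the original over a few pixels instead of switching abruptly.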

I also made an ImageCrop node to crop by mask (I copied it from A1111), and an ImageUncrop node (it's just ImageCompositeMasked with different parameters for the area), so it's faster to inpaint a small area and recombine it.

[image]
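A rough sketch of that crop/uncrop idea (hypothetical helpers for illustration, not the actual custom nodes): find the mask's bounding box with a little padding, work on just that region, then paste the result back.

```python
import numpy as np

def crop_by_mask(image, mask, pad=4):
    """Padded bounding box of the mask, clamped to the image bounds
    (hypothetical stand-in for a crop-by-mask node)."""
    ys, xs = np.nonzero(mask)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + 1 + pad, image.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + 1 + pad, image.shape[1])
    return image[y0:y1, x0:x1], (y0, y1, x0, x1)

def uncrop(image, patch, box):
    """Paste the re-generated patch back into the full image (the 'uncrop' step)."""
    y0, y1, x0, x1 = box
    out = image.copy()
    out[y0:y1, x0:x1] = patch
    return out

image = np.zeros((64, 64, 3))
mask = np.zeros((64, 64))
mask[20:30, 40:50] = 1.0

patch, box = crop_by_mask(image, mask, pad=4)
print(patch.shape)  # only the small padded region would be sampled, not 64x64
restored = uncrop(image, patch + 1.0, box)  # pretend the patch was repainted
```

Since only the cropped region ever goes through sampling, the rest of the image never round-trips at all, which is the speed and quality win being described.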

RandomGitUser321 commented 11 months ago

> I blur the mask by converting it to an image, using the ImageBlur node, and converting it back to a mask; no hard seams with this process.

Yep, I was just about to say I dug that one out of the node menu.

Oh, one last thing: after digging around a bunch, I stumbled across this article showcasing exactly what I'm talking about: https://hforsten.com/identifying-stable-diffusion-xl-10-images-from-vae-artifacts.html

Guess that, yeah, the artifacts come from the image being compressed down into the 128x128 latent space and then scaled back up to 1024x1024. They're just far more noticeable in 1.0 compared to even 0.9. In the eye painting example, you can see just how bad they are. I'm pretty new at all of this, but I had been under the false impression that masking meant only the masked portion went into latent space; in reality, the whole image goes through no matter what. So when you're 5-10 maskings into your workflow loop, you've put your image through that many compression->decompression cycles. This is why you get differences even if you just do image->encode->decode with nothing else done to it.

I guess that's the problem solved, then; it's just a limitation of the model, not the program.

jn-jairo commented 11 months ago

> Guess that, yeah, the artifacts come from the image being compressed down into the 128x128 latent space and then scaled back up to 1024x1024

Yeah, you have to lose something when compressing that much. The latent is even smaller because the VAE divides each dimension by 8, so a 512x512 pixel image becomes 64x64 in latent space. It's a huge compression, and that's what makes SD fast.
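The arithmetic behind that, for the record (assuming the standard SD VAE with an 8x spatial downscale and 4 latent channels, which is what both SD 1.x and SDXL use):

```python
# Standard SD VAE: 8x spatial downscale, 4 latent channels vs 3 RGB channels.
downscale, latent_channels, rgb_channels = 8, 4, 3

h = w = 512
pixel_values = h * w * rgb_channels                                    # 786,432 values
latent_values = (h // downscale) * (w // downscale) * latent_channels  # 16,384 values
print(h // downscale, w // downscale, pixel_values // latent_values)
# a 512x512x3 image becomes a 64x64x4 latent: 48x fewer values
```

That 48x reduction is why diffusion in latent space is fast, and also why the round-trip can never be lossless.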

ancillarymagnet commented 11 months ago

You can fix this issue by using: VAE Encode (not the inpaint one) -> Set Latent Noise Mask

[image]

RandomGitUser321 commented 11 months ago

> You can fix this issue by using: VAE Encode (not the inpaint one) -> Set Latent Noise Mask

No, that doesn't work; I already mentioned trying it in my first post. This artifacting is just a limitation of how SDXL 1.0 works. Scroll up a couple of posts and read the link I posted there: something changed between 0.9 and 1.0 that made the issue worse, so much so that they can accurately detect whether an image was made with SDXL 1.0 about 96% of the time from the VAE artifacts alone.

So after even a few inpainting sessions, you'll end up with a very messed-up image. The only solution, for now, is to overlay your previous image over the inpainted one, minus the masked portion. Even though that can lead to seams or visible blends, it beats the VAE noise, and you can always run it through an upscaler at around 0.25-0.3 denoise to fix the seams.
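The thread's conclusion can be put together in one hedged sketch (a toy lossy round-trip standing in for the VAE; none of this is ComfyUI code): re-compositing the untouched region after every pass keeps it bit-identical to the original, while skipping the composite lets the error pile up.

```python
import numpy as np

def lossy_pass(img):
    """Toy stand-in for one save -> load -> VAE round-trip (smooth + quantize)."""
    smoothed = 0.5 * (img + np.roll(img, 1, axis=1))
    return np.round(smoothed * 63) / 63

rng = np.random.default_rng(1)
original = rng.random((8, 8, 3))
mask = np.zeros((8, 8, 1))
mask[2:5, 2:5] = 1.0  # only this region is meant to change

naive = original.copy()
composited = original.copy()
for _ in range(10):
    naive = lossy_pass(naive)
    # overlay the previous image over the result, minus the masked portion
    composited = lossy_pass(composited) * mask + original * (1.0 - mask)

outside = mask[..., 0] == 0.0
naive_drift = np.abs(naive - original)[outside].mean()
composited_drift = np.abs(composited - original)[outside].mean()
print(naive_drift > 0.0)       # without compositing, the whole image drifts
print(composited_drift == 0.0) # with compositing, the untouched region is exact
```

The whole image still goes through the lossy round-trip each pass, as it does through the VAE, but restoring the unmasked pixels afterwards is what keeps the loop stable.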