comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

VAE encode/decode messing up faces #673

Open aaron9412 opened 1 year ago

aaron9412 commented 1 year ago

Hi, ComfyUI is awesome!!

I'm having a problem where any time the VAE recognizes a face, it gets distorted. I can build a simple workflow (loadvae, vaedecode, vaeencode, previewimage) with an input image, and the output image will have a distorted face with no other distortions. I'm not sure what to do about this. I have tried the mse, ema, and Anything VAEs, and they all produce the same result. Automatic1111 does not do this in img2img or inpainting, so I assume it's something going on in Comfy. Windows 10, latest Comfy (downloaded 16 May 2023; I don't know where to find the version number in Comfy), Impact and WAS nodes installed. Any suggestions appreciated. Maybe there is a way to use a null VAE?

Have a great day! -Aaron

ltdrdata commented 1 year ago

If you attach your workflow, it might be helpful.

WASasquatch commented 1 year ago

Are you using the model's VAE rather than an override? Most people use an override in A1111 because model VAEs are often broken. Try using custom VAEs with your models.

DivinoAG commented 1 year ago

> Are you using the model's VAE rather than an override? Most people use an override in A1111 because model VAEs are often broken. Try using custom VAEs with your models.

That seems to be the case for me. I tried many models that say they include embedded VAEs, and the results are almost universally worse than using the standard 1.5 VAE (vae-ft-mse-840000-ema-pruned). This might be worth checking.

aaron9412 commented 1 year ago

I am using the VAE loader with the common VAE models, I think. Please see my example below of Comfy having an issue with a face where A1111 does not:

Comfy workflow: [image]. Input face: [image]. Output face: [image].

Same picture, same VAE, in inpaint in A1111: [image]. Input face: [image]. Output face: [image].

The changes in the A1111 face are from my hackjob inpaint, I think. A1111 still has the same mouth, mustache, and eyes; Comfy has a messed-up mustache and mouth with no other input except VAE encode/decode.

Hope this helps!

aaron9412 commented 1 year ago

Here's the input image, since it's a bit hard to replicate; it only happens to certain images.

[input image]

WASasquatch commented 1 year ago

This looks like it is related to the issue I mentioned in the chat. The VAE output converted to raw images has a dither or something applied which messes with all the feature lines of an image. This would likely cause issues in the decode process.

What do you think @comfyanonymous? It looks like the issues are related to a ghosting/halo around features like his nose lines and mouth lines.

To me, I see the same halo issue around his nose and mouth that has not been mitigated through the decode process, treated as "features" of the original image.

ltdrdata commented 1 year ago

I found that VAEEncodeForInpaint acts weirdly. VAEEncode + SetLatentNoise works well.

I thought the only difference between VAEEncodeForInpaint and VAEEncode + SetLatentNoise was cropping for efficiency, but it is not. There is another side effect.
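
A rough sketch of the difference as I understand it (assuming a diffusers-style `vae.encode`, images in [0, 1] pixel space, and a mask that is 1 where inpainting should happen; the grey pre-fill is what VAEEncodeForInpaint appears to do):

```python
import torch

def encode_for_inpaint(vae, image, mask):
    # VAEEncodeForInpaint pre-fills the masked pixels with neutral grey
    # BEFORE encoding, so the encoder sees an altered image near the mask edge
    filled = image * (1.0 - mask) + 0.5 * mask
    latent = vae.encode(filled).latent_dist.mean
    return {"samples": latent, "noise_mask": mask}

def encode_then_set_noise_mask(vae, image, mask):
    # VAEEncode + SetLatentNoise encodes the untouched pixels; the mask is
    # only attached afterwards to tell the sampler where to add noise
    latent = vae.encode(image).latent_dist.mean
    return {"samples": latent, "noise_mask": mask}
```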

Ferniclestix commented 1 year ago

Latest update here, using WAS, Efficiency, and Quality of Life nodes, although this bug is not because of those custom nodes. I get similar VAE issues; faces are almost always messed up, but in addition, VAE encode and decode causes artifacts to appear. [vaebug2: before and after VAE encoding]

[vaebug1: after 5 steps of sampler and an additional VAE encode]

There's definitely something going on with the VAE process. It tends to appear in darker images or areas with a high-contrast sharp edge. I'm assuming this is not normally spotted because the denoise process wipes it out, but if you are just straight up working with image nodes and very low step sampling, you end up with artifacts.

Can supply a workflow if needed, but custom nodes will bork it. [vaebug3: an image of it] Tested with the override VAE 'vae-ft-mse-840000-ema-pruned' and an embedded VAE.

Edit: further testing with multiple models and clean workflows from scratch with no modded nodes and multiple VAE sources. Edge artifacts found to be repeatable under these circumstances with reliability.
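
A minimal round-trip that should reproduce the edge discoloration outside ComfyUI (a sketch, assuming the HF diffusers API; 'stabilityai/sd-vae-ft-mse' stands in for vae-ft-mse-840000-ema-pruned):

```python
import torch
from diffusers import AutoencoderKL

torch.set_grad_enabled(False)
vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-mse')

# dark horizontal gradient, 512x512, in the [-1, 1] range diffusers expects
ramp = torch.linspace(-1.0, -0.2, 512).view(1, 1, 1, 512)
img = ramp.expand(1, 3, 512, 512).contiguous()

latent = vae.encode(img).latent_dist.mean
out = vae.decode(latent).sample

# the reported discoloration shows up as larger error near the right/bottom edges
err = (out - img).abs()
print('max |err|, rightmost 8 columns:', err[..., -8:].max().item())
print('max |err|, interior           :', err[..., 8:-8, 8:-8].max().item())
```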

WASasquatch commented 1 year ago

> Latest update here, using WAS, Efficiency, and Quality of Life nodes, although this bug is not because of those custom nodes. I get similar VAE issues; faces are almost always messed up, but in addition, VAE encode and decode causes artifacts to appear. [vaebug2: before and after VAE encoding]
>
> [vaebug1: after 5 steps of sampler and an additional VAE encode]
>
> There's definitely something going on with the VAE process. It tends to appear in darker images or areas with a high-contrast sharp edge. I'm assuming this is not normally spotted because the denoise process wipes it out, but if you are just straight up working with image nodes and very low step sampling, you end up with artifacts.
>
> Can supply a workflow if needed, but custom nodes will bork it. [vaebug3: an image of it] Tested with the override VAE 'vae-ft-mse-840000-ema-pruned' and an embedded VAE.
>
> Edit: further testing with multiple models and clean workflows from scratch with no modded nodes and multiple VAE sources. Edge artifacts found to be repeatable under these circumstances with reliability.
>
> * Use an image with a dark-and-light gradient, then VAE encode; a small discoloration will appear, usually at the right and bottom edges. In some cases this may not happen every time; the background image seems to have some bearing on the likelihood of occurrence, and darker seems better at getting this to trigger.
>
> * Setting a sampler's denoise to 1 anywhere along the workflow fixes subsequent nodes and stops this distortion from happening; however, repeated samplers one after another can eventually bring this edge artifacting back into the image.
>
> * Setting a sampler's denoise to 0.2 on an initial sampler will almost guarantee the edge artifact appears.
>
> * Using an empty latent image greatly reduces the chances of the artifacts appearing.
>
> * Tests using faces had very similar results, so they are likely linked.
>
> Experiments could easily be tied to my specific system, as I cannot easily test it on another computer. Other Stable Diffusion UIs such as Automatic and Easy Diffusion do not suffer this issue.

Nice work digging into what's actually happening. The VAE encoding process is assumed lossy, but it does appear it could use some major improvement.

Also, kudos on the Nausicaä of the Valley of the Wind init. Haha. Though I prefer the bastardized 1988 (I think) US release "Warriors of the Wind" myself, just because I grew up on it and am so familiar with it.

Ferniclestix commented 1 year ago

Not something I think most would notice with, like, a straight img2img of bright images or just full noise on the first sampler. Yeah, judging from how latent scaling works, it's pretty lossy. Hopefully it's as simple as cropping in 2 pixels or something, but I doubt it. Also, lol, making pictures of the fungus forests from that movie. But I looove dark images, which is how I keep stumbling into this VAE issue :<

WASasquatch commented 1 year ago

> Not something I think most would notice with, like, a straight img2img of bright images or just full noise on the first sampler. Yeah, judging from how latent scaling works, it's pretty lossy. Hopefully it's as simple as cropping in 2 pixels or something, but I doubt it. Also, lol, making pictures of the fungus forests from that movie. But I looove dark images, which is how I keep stumbling into this VAE issue :<

I feel like this explains the random borders that are hard to get rid of sometimes, that just creep up out of nowhere. It's not like you're saying "give me a picture frame". I think it just interprets the edge-pixel issue and draws a border.


Also, awesome! I tried doing the toxic jungle too, though that was back in Disco Diffusion, and it was surprising how well it understood the prompt. You could tell it was picking up on resources around the film. I should try again in SD. Thanks for reminding me.

[final WIP image]

morphles commented 1 year ago

Guys, I'm not sure I get exactly what you mean, but I think it might be the same stuff I often see with my hi-res extra custom sampling (though I was seeing it with more regular stuff too; I always assumed I had the CFG too high or maybe an unlucky seed). Things like "pink/purple lens flares" or just complete sharp-edged pink/purple blotches. If it's the VAE and not something from sampling, then it likely led me astray into tweaking settings when I did not need to. For this toxic forest, I see that similar weird color, though here it seems a bit more bluish. So I just wanted to make sure that this is what is considered wrong?

knigitz commented 1 year ago

This needs higher priority and a fix. Currently, loading an image and piping it through a VAE Encode/VAE Decode with any VAE input will produce artifacts in the image output. It even adds artifacts in areas that are masked not to change during an inpaint process.

Very simple workflows have already been provided above. This issue is months old and needs proper scoping.

Here is a jeep before and after the Comfy VAE process: [image]

This inpaints very poorly: [image]

WASasquatch commented 1 year ago

Look how small that text is, though. How do you expect its integrity to hold up when it's a noisy 64² latent? I am fairly certain that, once encoded, the image is a very small representation of the original.

This is part of the reason that small faces/hands in full-body images come out scrambled and need a high-res pass.
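
To put rough numbers on that, a minimal sketch (assuming the HF diffusers API; 'stabilityai/sd-vae-ft-mse' is a stand-in checkpoint) of the 8x spatial downscale:

```python
import torch
from diffusers import AutoencoderKL

torch.set_grad_enabled(False)
vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-mse')

img = torch.randn(1, 3, 512, 512)  # stand-in for a real 512x512 image
latent = vae.encode(img).latent_dist.mean
print(latent.shape)  # torch.Size([1, 4, 64, 64]): every 8x8 pixel patch is
                     # squeezed into one 4-channel latent value, so tiny text
                     # and small faces simply don't survive the encode
```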

> This needs higher priority and a fix. Currently, loading an image and piping it through a VAE Encode/VAE Decode with any VAE input will produce artifacts in the image output. It even adds artifacts in areas that are masked not to change during an inpaint process.
>
> Very simple workflows have already been provided above. This issue is months old and needs proper scoping.
>
> Here is a jeep before and after the Comfy VAE process: [image]
>
> This inpaints very poorly: [image]

Lalimec commented 1 year ago

I thought this was expected behavior in the inpainting pipeline; no matter what, there is always a step of noise applied to the original image. A1111 composites the original image with the inpainted one to keep the details and such the same.
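
Something like this compositing step, as a sketch (assuming float images in [0, 1] and a mask that is 1 where inpainting happened):

```python
def composite_inpaint(original, decoded, mask):
    # keep the original pixels everywhere the mask didn't ask for changes,
    # so the VAE round-trip loss only affects the inpainted region
    return original * (1.0 - mask) + decoded * mask
```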

alenknight commented 10 months ago

Is this still a thing? I was hoping it would be fixed... there are quite a lot of uses where VAE decode/encode is critical and shouldn't be breaking images.

Piezoid commented 10 months ago

VAE encode and decode is an inherently lossy compression process. It's more or less accurate depending on the VAE training process.

Could it be that A1111's img2img blends the original image on top of the output in order to compensate for this?

I reproduced the VAE encode/decode using the HF diffusers library:

```python
import torch
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from diffusers.utils import load_image, pt_to_pil

torch.set_grad_enabled(False)

vae = AutoencoderKL.from_pretrained('madebyollin/sdxl-vae-fp16-fix').to('cuda')
vae_scale_factor = 2 ** (len(vae.config.block_out_channels) - 1)
image_processor = VaeImageProcessor(vae_scale_factor=vae_scale_factor)

# The sample image from @aaron9412 above
img = image_processor.preprocess(load_image('/tmp/239745270-e5965c37-c53b-4e06-9bee-41b816514f68.jpeg')).to('cuda')

vae_encode_output = vae.encode(img)
vae_decode_output = vae.decode(vae_encode_output.latent_dist.mean)
pt_to_pil(vae_decode_output.sample.cpu())[0]  # shows up in a notebook; otherwise call .save('dest.png')
```

Left: HF diffusers, right: ComfyUI. [comparison image] They look about the same.

WASasquatch commented 10 months ago

> VAE encode and decode is an inherently lossy compression process. It's more or less accurate depending on the VAE training process.
>
> Could it be that A1111's img2img blends the original image on top of the output in order to compensate for this?
>
> I reproduced the VAE encode/decode using the HF diffusers library:
>
> Left: HF diffusers, right: ComfyUI. [comparison image] They look about the same.

The ComfyUI version does, strangely, look more cartoonish despite being more or less the same. Strange effect it's giving me.

Piezoid commented 10 months ago

You're right, there are some subtle differences. A ~7x boost on the symmetric difference gives this:

[Krita file doing the difference]
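
For reference, a sketch of producing that amplified difference without Krita (assuming PIL and numpy; the file names are hypothetical, and both images must be the same size):

```python
import numpy as np
from PIL import Image

a = np.asarray(Image.open('diffusers_out.png').convert('RGB'), dtype=np.float32)
b = np.asarray(Image.open('comfy_out.png').convert('RGB'), dtype=np.float32)

# symmetric (absolute) difference, boosted ~7x to make it visible
diff = np.clip(np.abs(a - b) * 7.0, 0, 255).astype(np.uint8)
Image.fromarray(diff).save('diff_x7.png')
```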

nurhesen commented 3 months ago

Still an issue. Everything is messed up with images smaller than 1000px