LaurentMazare / diffusers-rs

An implementation of the diffusers api in Rust
Apache License 2.0

Bad distorted picture using the in-painting example provided #58

Emulator000 closed this issue 1 year ago

Emulator000 commented 1 year ago

I noticed the exact same issue as this one in runwayml, and I'm wondering if there is something we are missing.

I also tried to adapt the Python code from here (also mentioned here), but without any luck.

I just receive some weird runtime errors like:

Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

Does anyone have a suggestion on how I can use the in-painting correctly?
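
For anyone hitting the same error: the channel mismatch above usually means the 9-channel in-painting UNet was loaded but only the plain 4-channel latents were fed to it. The in-painting checkpoints expect the noisy latents (4 channels), the downscaled mask (1 channel) and the VAE-encoded masked image (4 channels) concatenated together. A minimal sketch assuming the tch crate (variable names are illustrative, not the crate's actual API):

```rust
use tch::Tensor;

/// Build the 9-channel UNet input expected by SD in-painting checkpoints:
/// 4 noisy-latent channels + 1 mask channel + 4 masked-image-latent channels.
/// Shapes assumed: latents [B, 4, 64, 64], mask [B, 1, 64, 64] (resized to the
/// latent resolution), masked_image_latents [B, 4, 64, 64].
fn build_unet_input(latents: &Tensor, mask: &Tensor, masked_image_latents: &Tensor) -> Tensor {
    // Concatenate along the channel dimension to obtain [B, 9, 64, 64].
    Tensor::cat(&[latents, mask, masked_image_latents], 1)
}
```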

Emulator000 commented 1 year ago

The same distortion happens with huggingface/diffusers, and if I'm not mistaken the example code in this repository was originally converted from here, so perhaps there is some issue regarding the area outside the mask?

I also tried with a fully black mask, and this causes distortion and quality loss on the photo as well, especially on people's faces; another mention of this issue can be found here.

I also tried updating the code with the one provided by RunwayML, without success 😢

Emulator000 commented 1 year ago

Another update here: after a lot of research and countless attempts, I now understand that this is not an issue with this crate itself.

I'm using SD 1.5, and in my code I added a post-processing step that mixes the original untouched image with the result decoded from the VAE, using the original (non-downscaled) mask, and I get a better result.
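
As a rough illustration of that post-processing step, here is a minimal sketch assuming the tch crate, with the decoded image, the original image, and the full-resolution mask already loaded as float tensors (names and shapes are illustrative):

```rust
use tch::Tensor;

/// Blend the VAE-decoded result back into the untouched original,
/// keeping the original pixels outside the mask.
/// All tensors are assumed to be [3, H, W] floats in [0, 1];
/// `mask` is 1.0 inside the in-painted area and 0.0 outside.
fn composite(original: &Tensor, decoded: &Tensor, mask: &Tensor) -> Tensor {
    // decoded * mask + original * (1 - mask)
    decoded * mask + original * (mask.ones_like() - mask)
}
```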

By the way, the output from the VAE also differs in saturation and brightness, and a slight difference between the in-painted area and the original image is noticeable.

I'm guessing that the VAE encode-decode round trip makes the image lose some of its original properties.

An idea I'll definitely try is dilating the original mask a bit, so that some border information from the decoded latent is kept, and then blending the luminosity of the latent and the original image (if someone knows how I can achieve this in Rust, that would be awesome); with this trick I think I could get a better result. A possible approach to the dilation is sketched below.
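
For the dilation part, one possible approach (just a sketch, assuming the mask is a [1, 1, H, W] float tensor in tch) is a stride-1 max-pool, which grows the white area by `radius` pixels on each side:

```rust
use tch::Tensor;

/// Dilate a binary mask by `radius` pixels using a stride-1 max-pool.
/// `mask` is assumed to be a [1, 1, H, W] float tensor with values in {0, 1}.
fn dilate_mask(mask: &Tensor, radius: i64) -> Tensor {
    let k = 2 * radius + 1;
    // kernel k x k, stride 1, padding `radius` keeps the spatial size unchanged
    mask.max_pool2d(&[k, k], &[1, 1], &[radius, radius], &[1, 1], false)
}
```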

For what it's worth, to me the in-painting process with Diffusers is just not usable for processing images with people's faces: the distortion is fairly aggressive.

novirusallowed commented 1 year ago

Same issue here. I have a custom model that works perfectly in Automatic1111 but not when I use it with diffusers.

Even if I use a mask, it still modifies all the faces and other small details all over the image.

Emulator000 commented 1 year ago

Even if I use a mask, it still modifies all the faces and other small details all over the image.

That's normal behavior, as the VAE processes the full image latent, including the area outside the mask.

In order to achieve a good result you should re-mask the original untouched area, mixing it with the in-painted area, blending the images and matching the color histograms.
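
As a rough sketch of the color-matching part (assuming tch and [3, H, W] float images; this aligns per-channel mean and standard deviation, a cheap stand-in for full histogram matching):

```rust
use tch::{Kind, Tensor};

/// Roughly match the colors of `src` to `reference` by aligning the
/// per-channel mean and standard deviation (not a full histogram match).
/// Both tensors are assumed to be [3, H, W] floats.
fn match_channel_stats(src: &Tensor, reference: &Tensor) -> Tensor {
    let dims: &[i64] = &[1, 2]; // reduce over H and W, keep the channel dim
    let src_mean = src.mean_dim(dims, true, Kind::Float);
    let src_std = src.std_dim(dims, true, true);
    let ref_mean = reference.mean_dim(dims, true, Kind::Float);
    let ref_std = reference.std_dim(dims, true, true);
    // Normalize `src` to zero mean / unit std, then rescale to the reference stats.
    (src - &src_mean) / (src_std + 1e-5) * ref_std + ref_mean
}
```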

I'll close this issue as it is not an issue with this crate itself; actually it is not an issue at all, the processed image should simply be post-processed the way Automatic1111 and other tools do 😄