Closed: CodeHatchling closed this issue 9 months ago
I've made some progress on implementing this myself.
The following images were generated with a mask with varying blur levels applied:
[Images: Original | Masked, 64 blur | Masked, 48 blur | Masked, 32 blur | Masked, 4 blur]
At each denoising step, the masking process interpolates the latents toward the original image by an amount determined by the mask opacity and the denoising step size.
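A minimal NumPy sketch of that per-step blend (illustrative only; `blend_latents` and its signature are assumptions, not the actual webui code):

```python
import numpy as np

def blend_latents(denoised, original, mask_opacity, step_size):
    """Pull the denoised latents back toward the original latents,
    weighted per pixel by mask opacity and scaled by the size of the
    current denoising step (hypothetical sketch)."""
    # opacity 1 -> keep the denoised result (for a full-size step);
    # opacity 0 -> revert entirely to the original latents.
    w = mask_opacity * step_size
    return original + w * (denoised - original)

# Tiny demo on a 2x2 single-channel "latent":
orig = np.zeros((2, 2))
den = np.ones((2, 2))
mask = np.array([[0.0, 0.5],
                 [1.0, 1.0]])
out = blend_latents(den, orig, mask, step_size=1.0)
# out: [[0.0, 0.5], [1.0, 1.0]]
```

Pixels with intermediate opacity land partway between the two latents, which is what produces the soft transition in the images above.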
While this works okay, the denoiser appears to get a bad estimate of the noise level at partially masked pixels: those pixels are less noisy than fully masked ones, yet they are treated as having the same noise level, which can leave the transition areas looking oversmoothed. I suspect this is because, normally, the unmasked pixels are not given any noise.
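The mismatch is easy to demonstrate numerically: blending a noised latent with a clean original scales the noise standard deviation by the blend weight, so a pixel at 50% opacity effectively sits at half the sigma the denoiser assumes. A quick sketch (illustrative, not webui code):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
clean = np.zeros(100_000)                      # clean "original" latent
noised = clean + rng.normal(0.0, sigma, clean.shape)

# Blend the noised and clean latents as at a partially masked pixel:
opacity = 0.5
blended = opacity * noised + (1 - opacity) * clean

# The effective noise level is opacity * sigma, but the denoiser
# still assumes the full sigma everywhere -- hence the oversmoothing.
effective_sigma = blended.std()                # ~0.5, not 1.0
```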
I think I might be able to improve this now that I know what kind of image the denoiser is expecting. My next idea is:
I basically solved this problem completely :)
When can I push? ;)
Another example :)
Code cleaned up and forked over to here: https://github.com/CodeHatchling/stable-diffusion-webui-soft-inpainting
Great work on this!
> When can I push? ;)
Feel free to make a PR whenever you think this is ready. You may need to account for merge conflicts since your fork was based on the master branch and not the dev branch though.
Is there an existing issue for this?
What would your feature do ?
Problem to solve
It appears that the denoiser only considers a binary mask (with a hard boundary) when deciding which pixels to denoise, even at extreme blur values. Specifically, the region under a pixel is denoised only if the mask/sketch opacity there is greater than 50%. The resulting image and the original image are then simply alpha-blended together using the mask opacity values.
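The behavior described above can be sketched as follows (an assumed simplification for illustration, not the actual webui implementation):

```python
import numpy as np

def current_behavior(denoised, original, mask_opacity):
    """Sketch of the described behavior: denoising is gated by a hard
    >= 50% opacity threshold, and the final image is an alpha blend of
    the two using the soft opacity (hypothetical, for illustration)."""
    hard_mask = mask_opacity >= 0.5                      # binary boundary
    denoised_region = np.where(hard_mask, denoised, original)
    # Composite still uses the soft opacity, producing the fade:
    return mask_opacity * denoised_region + (1 - mask_opacity) * original

mask = np.array([0.0, 0.3, 0.5, 0.9])
out = current_behavior(np.ones(4), np.zeros(4), mask)
# Below the threshold the denoised content never appears at all,
# regardless of opacity; above it, only the alpha blend softens the edge.
```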
Why this is a problem
What possibilities solving it brings
Proposed solution
Interpret the mask opacity as a per-pixel multiplier for the denoising strength. AFAIK there are a few ways one could achieve this effect:
I believe either of these would allow inpainting objects with partial opacity or very gradual transitions, where content in a transition region is preserved.
Alternate solution: dithering
A simpler option could be to use dithering to decide whether a given pixel/block is masked. In other words, using some kind of dithering pattern (Bayer, blue noise, Floyd–Steinberg), the mask opacity represents the probability that a given element of the image is affected by the denoiser.
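An ordered-dithering version of this idea could look like the following sketch (the function name and 2x2 tile size are illustrative choices, not from any existing code):

```python
import numpy as np

# 2x2 Bayer matrix with thresholds in [0, 1), tiled across the mask.
BAYER_2x2 = np.array([[0.00, 0.50],
                      [0.75, 0.25]])

def dither_mask(mask_opacity):
    """Binarize a soft mask with ordered (Bayer) dithering: a pixel is
    masked iff its opacity exceeds the tiled threshold, so the *density*
    of masked pixels tracks the local opacity (illustrative sketch)."""
    h, w = mask_opacity.shape
    tiles = np.tile(BAYER_2x2, (h // 2 + 1, w // 2 + 1))[:h, :w]
    return mask_opacity > tiles

soft = np.full((4, 4), 0.5)
hard = dither_mask(soft)      # exactly half the pixels come out masked
```

A blue-noise threshold texture would work the same way and avoid the visible regular pattern of the Bayer tile.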
Alternate solution: adjust mask threshold
An even simpler solution could be to change the mask opacity threshold at which denoising occurs from >=50% to >0%. In other words, any pixel with opacity greater than 0 is included in the denoising. Then, the original content could be blended over top to completely hide the seam at the point where the mask reaches 0 opacity.
However, the main drawback is that ghosting artifacts will appear where both the original and modified image are visible. (Though this is an issue with the current implementation anyway.)
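This alternative amounts to a one-line change to the gating condition in the earlier sketch (again hypothetical, for illustration):

```python
import numpy as np

def widened_threshold(denoised, original, mask_opacity):
    """Sketch of the 'adjust mask threshold' alternative: denoise
    wherever opacity > 0 (instead of >= 0.5), then blend the original
    back over top so the seam at zero opacity disappears."""
    include = mask_opacity > 0.0                 # widened gate
    denoised_region = np.where(include, denoised, original)
    # Blend the original over top by (1 - opacity); where both images
    # remain partially visible, ghosting can appear.
    return mask_opacity * denoised_region + (1 - mask_opacity) * original

mask = np.array([0.0, 0.2, 0.8])
out = widened_threshold(np.ones(3), np.zeros(3), mask)
# Unlike the >= 50% gate, the denoised content now contributes at 0.2.
```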
Proposed workflow