Pfaeff opened this issue 2 years ago
Isn't that already the case? I expected the saturation of the mask to be used as a modifier on strength.
No, it is not. As of yesterday at least, a mask drawn from within the UI is pitch black, so there is obviously only one level of inpainting "strength". And if the user provides a mask via the alpha channel of the image, then ANY non-zero alpha value is treated as fully masked.
In that case I think that would be quite an important addition. The denoising value should be multiplied by the mask's blackness, if that's possible.
FYI, this was already asked in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/560
That's not the same thing, though.
Agree. That would have been a good start.
Does anyone know if there's a plugin that can already do this? Besides that, I also just want to bump this issue to the top. I think it's an essential feature, but after over a year it's still open. Or has this already been implemented and closing this issue was simply forgotten?
Well, https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14208 sounds like what OP is asking for.
As the author of https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14208, I've looked into whether per-pixel denoising strength is something that could be natively supported by just passing in different data.
The current models and sampling methods assume a constant noise level throughout the image, and there is no way to specify varying noise levels. Supporting them is theoretically possible, and could be achieved by training an existing checkpoint while passing in an additional channel (similar to inpainting models, which take image and mask conditioning as additional input).
I've tried simply passing a spatially varying noise level image to the denoiser, but this just causes the denoiser to oversmooth the areas without noise. It seems to assume that a certain percentage of the content in any area of an image must be noise, even when it isn't.
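In rough form, the experiment looked like the sketch below (a simplified illustration, not the exact code; the function name and tensor shapes are assumptions). The noised latents vary spatially, but the sampler is still told a single scalar sigma:

```python
import torch

def noise_with_varying_sigma(latents: torch.Tensor,
                             sigma_map: torch.Tensor) -> torch.Tensor:
    # latents: [B, C, H, W] img2img source latents
    # sigma_map: [1, 1, H, W] per-pixel noise level, broadcast over B and C
    noise = torch.randn_like(latents)
    return latents + noise * sigma_map
```

Wherever sigma_map is near zero, the denoiser still subtracts its expected share of noise, which is what oversmooths content that was never noisy.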
The branch I wrote tries to emulate a similar effect through interpolating the original image latents with the denoised latents at each step. The blending occurs according to a mask (which you could interpret as a per-pixel denoising strength multiplier). While the effect might not be exactly the same as true per-pixel denoising strength, it is probably as close as you can get without requiring a new network architecture.
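Conceptually, the per-step blending looks like this minimal sketch (simplified from the actual branch; the function name and signature are illustrative, and `mask` is assumed to be a [1, 1, H, W] tensor in [0, 1] acting as the per-pixel strength multiplier):

```python
import torch

def blend_step(denoised: torch.Tensor,
               original_latents: torch.Tensor,
               sigma: float,
               mask: torch.Tensor) -> torch.Tensor:
    # Re-noise the original latents to the sampler's current noise level,
    # so both operands sit at the same point on the sampling trajectory.
    renoised = original_latents + torch.randn_like(original_latents) * sigma
    # mask == 1: keep the fully denoised result; mask == 0: keep the original.
    return mask * denoised + (1.0 - mask) * renoised
```

Where the mask is 0, the original content is carried through every step (like the untouched region in inpainting), and intermediate values give intermediate amounts of change.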
Is your feature request related to a problem? Please describe. Creating variations of images where some parts shouldn't change quite as much, but also shouldn't remain completely fixed as they do with inpainting. This would be especially useful when using img2img for upscaling and detail enhancement.
Describe the solution you'd like An option to provide a mask by which the denoising strength will be multiplied (see the sketch below).
Describe alternatives you've considered Inpainting, but inpainting leaves the masked region completely untouched.
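To make the requested semantics concrete, here is a hypothetical illustration (the file name and variable names are made up): a grayscale mask is normalized to [0, 1] and multiplied with the global denoising strength to yield a per-pixel strength map.

```python
import numpy as np
from PIL import Image

base_strength = 0.6  # the usual global img2img denoising strength
mask = np.asarray(Image.open("strength_mask.png").convert("L"),
                  dtype=np.float32) / 255.0
# 0.0 = leave the pixel untouched, 0.6 = apply the full strength
per_pixel_strength = base_strength * mask
```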