AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: PatchMatch init mode for inpainting / outpainting #4681

Open · BlinkDL opened this issue 1 year ago

BlinkDL commented 1 year ago

What would your feature do?

Support a https://github.com/vacancy/PyPatchMatch init mode for inpainting / outpainting, which gives much better results than "fill".

Proposed workflow

Add "PatchMatch" init mode for inpainting / outpainting, as in https://github.com/lkwq007/stablediffusion-infinity

Additional information

No response

Lalimec commented 1 year ago

That would be great to use with outpainting tools.

agustincaniglia commented 1 year ago

Yes please, we need this for great outpainting.

Ladypoly commented 1 year ago

That would be awesome, to have outpainting quality like in infinity.

parlance-zz commented 1 year ago

perlin noise LoL

Ehplodor commented 1 year ago

> perlin noise LoL

@parlance-zz I suppose you know a lot about the subject, having worked on Fourier-shaped noise out-painting, reimplemented as the outpainting-mk2 script in A1111. Could you please elaborate a bit (in a few words) on why Perlin noise would be a bad idea for inpainting / outpainting initialization? TY in advance.

parlance-zz commented 1 year ago

>> perlin noise LoL
>
> @parlance-zz I suppose you know a lot about the subject, having worked on Fourier-shaped noise out-painting, reimplemented as the outpainting-mk2 script in A1111. Could you please elaborate a bit (in a few words) on why Perlin noise would be a bad idea for inpainting / outpainting initialization? TY in advance.

Sorry for the snark...

Perlin noise is mathematically equivalent to 1/f noise in the frequency spectrum. The problem is that although 1/f-esque distributions are common in natural images, in reality every image has its own distribution of feature scales and orientations that won't necessarily conform to hard-coded 1/f noise.
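"Hard-coded 1/f noise" here means shaping white noise with a fixed power-law spectrum that ignores the image entirely. A minimal numpy sketch; the exponent `alpha` and the normalization are illustrative:

```python
import numpy as np

def one_over_f_noise(h: int, w: int, alpha: float = 1.0) -> np.ndarray:
    """White noise shaped by a fixed 1/f**alpha power spectrum."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0                                # avoid dividing by zero at DC
    amplitude = 1.0 / f**alpha                   # image-independent, hard-coded spectrum
    phase = np.exp(2j * np.pi * np.random.rand(h, w))
    noise = np.fft.ifft2(amplitude * phase).real
    return (noise - noise.mean()) / noise.std()
```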

The FFT can be used to produce Perlin noise, as shown here, but once you've taken an FFT of the source image you have much richer information and can tailor the resulting noise distribution exactly to the source image. Images that aren't photos, especially, will have a very different frequency spectrum than a "natural" image or photograph.
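A stripped-down sketch of that spectrum-matching idea: copy the FFT magnitude from the actual source image and pair it with random phases. Real implementations do more (masking, windowing, blending with gaussian noise); this is just the core, with the normalization being illustrative:

```python
import numpy as np

def spectrum_matched_noise(src: np.ndarray) -> np.ndarray:
    """Noise whose power spectrum is copied from the source image itself.

    Instead of assuming 1/f, take the FFT magnitude of the actual image
    and pair it with random phases, so the feature scales and orientations
    of the noise match the source.
    """
    spectrum = np.fft.fft2(src, axes=(0, 1))
    magnitude = np.abs(spectrum)                 # this image's own spectrum
    phase = np.exp(2j * np.pi * np.random.rand(*src.shape))
    noise = np.fft.ifft2(magnitude * phase, axes=(0, 1)).real
    return (noise - noise.mean()) / (noise.std() + 1e-8)
```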

The closer the noise distribution matches the original source image, the less work the U-net denoiser has to perform to shape the noise into a final image. If you were to actually view the noised image using both techniques, you would notice that if you zoom out and squint the FFT shaped noise almost looks like an actual image already, before SD comes into the picture at all.

Lastly, because the latent space encoding used in latent diffusion models preserves locality (a very important property for the U-Net convolutional network), these same techniques can be applied in latent space rather than image space. They can be further enhanced by integration into the noise schedule, adding more shaped noise over the sampling steps (which is what we do in the sdgrpcserver backend used by g-diffuser), possibly in combination with regular gaussian noise.
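A toy illustration of blending shaped noise into the schedule. This is not the sdgrpcserver implementation; the linear ramp and the `shaped_weight` parameter are invented for illustration:

```python
import numpy as np

def scheduled_noise(shaped: np.ndarray, step: int, total_steps: int,
                    shaped_weight: float = 0.5) -> np.ndarray:
    """Mix spectrum-shaped noise with plain gaussian noise for one step.

    The shaped component ramps down linearly over the schedule, so early
    (high-noise) steps carry image-matched structure and late steps look
    like the ordinary gaussian noise the sampler expects.
    """
    t = 1.0 - step / max(total_steps - 1, 1)  # 1 at the first step, 0 at the last
    w = shaped_weight * t
    gaussian = np.random.randn(*shaped.shape)
    mixed = w * shaped + (1.0 - w) * gaussian
    return mixed / (mixed.std() + 1e-8)       # renormalize to unit variance
```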

The results of using the above techniques (in synergistic combination with CLIP guidance and the runwayML 1.5 enhanced in-painting pipeline) can be seen here: https://www.g-diffuser.com

parlance-zz commented 1 year ago

I'll leave this here to hopefully prove my point.

https://www.youtube.com/watch?v=H48MVYx0j_s https://www.youtube.com/watch?v=ha8Drgj0DMs

No sample selection is used in these, btw: one generated sample per keyframe, no user intervention, cherry-picking, or editing. The videos are generated start to finish by pure automated (Mk.3) out-painting. :)

aleksusklim commented 1 year ago

Sorry, can anybody explain what's going on here?

PatchMatch, in a nutshell, is just a fancy "clone brush" that uses parts of the same (or another) image to reconstruct composition and fill holes, preserving user-defined lines and regions. Am I right?

It's not an inpainting noise generator; it is an inpainting method in its own right. Does this issue propose adding it as a "tool", so users could PatchMatch holes and then mask/inpaint the seams (or the entire region) with Stable Diffusion?

Now, how is Perlin noise even relevant here? Is it used in g-diffuser.com? Is it the default in outpainting-mk2?

Or is it Fourier noise rather than Perlin (in g-diffuser or outpainting-mk2)? If so, is it better or worse? (I didn't understand that simply and clearly.) If one is better than the other, then why not implement it right away? Or implement both and add a choice of noise generation method for outpainting?

And if what is really proposed here is the ability to make such wonderful videos as the two examples above in WebUI, adopting the technique from g-diffuser, then why is this being discussed under a PatchMatch issue!?

Shouldn't there be a separate dedicated thread for Perlin/FFT (or whatever is used; I'm confused already), since it is not related to PatchMatch in any way?

P.S. In https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/3581#issuecomment-1290031339 I showed that the results of outpainting-mk2 depend heavily on the Mask blur slider, while some people say that the only thing it does is Gaussian-blur the mask image and nothing more…
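For reference, here is roughly what "just Gaussian-blurring the mask" would amount to, sketched with PIL. The `soft_composite` helper is hypothetical, not webui code:

```python
import numpy as np
from PIL import Image, ImageFilter

def soft_composite(original: Image.Image, generated: Image.Image,
                   mask: Image.Image, mask_blur: float) -> Image.Image:
    """Blend generated pixels into the original through a blurred mask.

    Blurring the mask feathers the seam: fully masked pixels come from
    the generated image, fully unmasked pixels from the original, and
    the blurred edge mixes the two.
    """
    soft = mask.convert("L").filter(ImageFilter.GaussianBlur(mask_blur))
    alpha = np.asarray(soft, dtype=np.float32)[..., None] / 255.0
    blended = (np.asarray(original.convert("RGB"), dtype=np.float32) * (1.0 - alpha)
               + np.asarray(generated.convert("RGB"), dtype=np.float32) * alpha)
    return Image.fromarray(blended.astype(np.uint8))
```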