No clue what is going on here. I also tried it and got the same results using the weights from stabilityai/stable-diffusion-2-inpainting, and I also tried the native resolution of 768x768 without luck. Comparing the JSON config files for the different versions, I haven't noticed anything that would obviously require some adaptation. At this point the simplest approach would likely be to run the inpainting process on both the Python and Rust sides and see at which layer things start to diverge.
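For the Python side of that comparison, a minimal reference run could look like the sketch below. This assumes the Hugging Face `diffusers` package; the file names are hypothetical placeholders and the prompt is the one from the usual dog/cat inpainting example:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Load the same inpainting weights used on the Rust side.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("dog_on_bench.png").convert("RGB")  # hypothetical input path
mask_image = Image.open("dog_mask.png").convert("RGB")      # hypothetical mask path

result = pipe(
    prompt="Face of a yellow cat, high resolution, sitting on a park bench",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("reference_inpaint.png")  # reference output to diff against the Rust run
```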
Ah, it seems that one difference is that the scheduler for Stable Diffusion 2.1 uses a prediction type of `v_prediction` for normal generation but a prediction type of `epsilon` for inpainting (whereas Stable Diffusion 1.5 uses `epsilon` for both). I've just merged PR #63 which should hopefully help with this: at least on a single generated image it looks better now.
There is still a scheduler inconsistency as we use DDIM rather than PNDM; using a proper DPM solver would likely help here too, but hopefully that doesn't make much of a difference. Please give the current GitHub tip a spin if you can and let us know how it goes.
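If you want to gauge how much the scheduler choice matters, you can swap the scheduler on the Python reference side and compare outputs (a sketch using the `diffusers` package; the Rust port would need its own DPM implementation to match this):

```python
from diffusers import StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting")
# Replace the default scheduler with a DPM multistep solver, keeping its config.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
```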
I'm trying the same example with the 2.1 configuration: I downloaded the appropriate CLIP, UNET and VAE weights and converted them correctly, but it does not seem to work.
Command:
This is the output that I get with the dog/cat example, which works perfectly with Stable Diffusion 1.5:
It seems that the inpainted area isn't populated correctly.
Is there any possible reason for that? Should the example/code be adapted with some additional steps for the 2.1 version?