LaurentMazare / diffusers-rs

An implementation of the diffusers api in Rust
Apache License 2.0

Example of inpaint doesn't work for Stable Diffusion 2.1 #60

Emulator000 closed this issue 1 year ago

Emulator000 commented 1 year ago

I'm trying the same example with the 2.1 configuration. I downloaded the appropriate CLIP, UNet and VAE weights and converted them correctly, but it does not seem to work.

Command:

cargo run --example stable-diffusion-inpaint --features clap -- --sd-version="v2-1" --prompt "Face of a yellow cat, high resolution, sitting on a park bench." --input-image="temp/dog.png" --mask-image="temp/dog_mask.png" --width=512 --height=512

This is the output I get with the dog/cat example, which works perfectly with Stable Diffusion 1.5:

[image]

It seems that the inpainted area isn't populated correctly.

Any idea what might cause this? Does the example/code need additional steps for the 2.1 version?

LaurentMazare commented 1 year ago

No clue what is going on here. I also tried it and got the same results using the weights from stabilityai/stable-diffusion-2-inpainting, and I also tried the native resolution of 768x768 without luck. Looking at the JSON config files for the different versions, I haven't noticed anything that would obviously require adaptation. At this point, the simplest approach would likely be to run the inpainting process on both the Python and Rust sides and see at which layer the outputs start to diverge.
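A minimal sketch of that layer-by-layer comparison, assuming the intermediate activations from each run have already been dumped to flat f32 vectors and loaded in the same order as (layer name, values) pairs. The helper names `max_abs_diff` and `first_divergence`, the layer names, and the 1e-3 tolerance are all hypothetical and not part of diffusers-rs:

```rust
/// Hypothetical helper: largest absolute element-wise difference between two
/// activation dumps. Returns None on a length mismatch, and lets a NaN win the
/// fold so that a NaN anywhere counts as a divergence.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f32, |m, d| if d.is_nan() || d > m { d } else { m }),
    )
}

/// Hypothetical helper: walk both runs in order and return the name of the
/// first layer whose activations differ by more than `tol` (or whose names or
/// lengths do not line up). Only the common prefix of layers is compared.
fn first_divergence<'a>(
    reference: &'a [(String, Vec<f32>)],
    candidate: &[(String, Vec<f32>)],
    tol: f32,
) -> Option<&'a str> {
    for ((ref_name, ref_vals), (cand_name, cand_vals)) in reference.iter().zip(candidate) {
        let ok = ref_name == cand_name
            && matches!(max_abs_diff(ref_vals, cand_vals), Some(d) if d <= tol);
        if !ok {
            return Some(ref_name.as_str());
        }
    }
    None
}

fn main() {
    // Toy dumps standing in for activations saved from the Python and Rust runs.
    let python: Vec<(String, Vec<f32>)> = vec![
        ("text_encoder".to_string(), vec![0.10, 0.20, 0.30]),
        ("unet.down_blocks.0".to_string(), vec![1.00, -0.50]),
        ("unet.mid_block".to_string(), vec![0.25, 0.75]),
    ];
    let rust: Vec<(String, Vec<f32>)> = vec![
        ("text_encoder".to_string(), vec![0.10, 0.20, 0.30001]),
        ("unet.down_blocks.0".to_string(), vec![1.00, -0.50]),
        ("unet.mid_block".to_string(), vec![0.90, 0.75]),
    ];
    match first_divergence(&python, &rust, 1e-3) {
        Some(layer) => println!("first divergence at {layer}"),
        None => println!("all compared layers match within tolerance"),
    }
}
```

With these toy values the small text-encoder difference stays under the tolerance and the report points at unet.mid_block. In practice the tolerance would need tuning, since float32 kernels on different backends rarely match bit for bit.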

LaurentMazare commented 1 year ago

Ah, it seems that one difference is that the scheduler for Stable Diffusion 2.1 uses the v_prediction prediction type for normal generation but the epsilon prediction type for inpainting (whereas Stable Diffusion 1.5 uses epsilon for both). I've just merged PR #63, which should hopefully help with this; at least on a single generated image, the result looks better now.

[image: sd_final]
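To make the prediction-type difference concrete, here is a sketch of how a scheduler step can turn the raw UNet output into a predicted clean sample x0 and predicted noise under each convention. It assumes the standard forward process x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with a the cumulative alpha at step t, and v = sqrt(a) * eps - sqrt(1 - a) * x0 for v_prediction. The `PredictionType` enum and `split_prediction` function are illustrative names, not necessarily what diffusers-rs or PR #63 uses:

```rust
/// Hypothetical mirror of the scheduler config's prediction_type field
/// ("epsilon" or "v_prediction").
#[derive(Clone, Copy, Debug, PartialEq)]
enum PredictionType {
    Epsilon,
    VPrediction,
}

/// Convert the raw model output at one timestep into (predicted x0, predicted noise),
/// given the noisy sample x_t and the cumulative alpha `alpha_bar_t`.
/// Assumes x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
fn split_prediction(
    kind: PredictionType,
    model_output: &[f64],
    sample: &[f64],
    alpha_bar_t: f64,
) -> (Vec<f64>, Vec<f64>) {
    let sa = alpha_bar_t.sqrt();
    let sb = (1.0 - alpha_bar_t).sqrt();
    let mut x0 = Vec::with_capacity(sample.len());
    let mut eps = Vec::with_capacity(sample.len());
    for (&out, &xt) in model_output.iter().zip(sample) {
        match kind {
            PredictionType::Epsilon => {
                // The model predicts the noise directly.
                eps.push(out);
                x0.push((xt - sb * out) / sa);
            }
            PredictionType::VPrediction => {
                // The model predicts v = sqrt(a) * eps - sqrt(1 - a) * x0.
                x0.push(sa * xt - sb * out);
                eps.push(sa * out + sb * xt);
            }
        }
    }
    (x0, eps)
}

fn main() {
    // Build a synthetic x_t from known x0 and eps, then try to recover x0.
    let alpha_bar: f64 = 0.64;
    let (sa, sb) = (alpha_bar.sqrt(), (1.0 - alpha_bar).sqrt());
    let x0: Vec<f64> = vec![1.0, -2.0];
    let eps: Vec<f64> = vec![0.5, 0.25];
    let xt: Vec<f64> = x0.iter().zip(&eps).map(|(&x, &e)| sa * x + sb * e).collect();
    let v: Vec<f64> = x0.iter().zip(&eps).map(|(&x, &e)| sa * e - sb * x).collect();

    let (x0_from_eps, _) = split_prediction(PredictionType::Epsilon, &eps, &xt, alpha_bar);
    let (x0_from_v, _) = split_prediction(PredictionType::VPrediction, &v, &xt, alpha_bar);
    // Interpreting a v output as epsilon gives a wrong x0 without any error.
    let (x0_mismatched, _) = split_prediction(PredictionType::Epsilon, &v, &xt, alpha_bar);
    println!("epsilon: {x0_from_eps:?}\nv: {x0_from_v:?}\nmismatched: {x0_mismatched:?}");
}
```

The first two results recover x0 up to rounding, while the mismatched one does not. Because nothing fails loudly, a prediction_type that doesn't match the checkpoint only shows up as degraded images.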

There is still a scheduler inconsistency, as we use DDIM rather than PNDM, but hopefully this doesn't make much of a difference; using a proper DPM solver would also likely help here. Please give the current GitHub tip a spin if you can and let us know how it goes.
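For reference, the deterministic DDIM update (eta = 0) can be sketched as follows, assuming an epsilon-prediction model. This is not the diffusers-rs scheduler and covers neither PNDM nor DPM-Solver; `ddim_step` and the three-value cumulative-alpha schedule are hypothetical, and the cumulative alpha after the last step is taken as 1.0:

```rust
/// One deterministic DDIM update (eta = 0) on flat f64 slices, assuming an
/// epsilon-prediction model:
///   x0_hat = (x_t - sqrt(1 - a_t) * eps_hat) / sqrt(a_t)
///   x_prev = sqrt(a_prev) * x0_hat + sqrt(1 - a_prev) * eps_hat
fn ddim_step(eps_hat: &[f64], x_t: &[f64], alpha_bar_t: f64, alpha_bar_prev: f64) -> Vec<f64> {
    let (sa_t, sb_t) = (alpha_bar_t.sqrt(), (1.0 - alpha_bar_t).sqrt());
    let (sa_p, sb_p) = (alpha_bar_prev.sqrt(), (1.0 - alpha_bar_prev).sqrt());
    x_t.iter()
        .zip(eps_hat)
        .map(|(&x, &e)| {
            let x0_hat = (x - sb_t * e) / sa_t;
            sa_p * x0_hat + sb_p * e
        })
        .collect()
}

fn main() {
    // Hypothetical cumulative-alpha schedule, from noisiest to clean.
    let alpha_bars = [0.5_f64, 0.9, 1.0];
    let x0 = [1.0_f64, -2.0];
    let eps = [0.5_f64, 0.25];
    // Noisy sample at the first step.
    let mut x: Vec<f64> = x0
        .iter()
        .zip(&eps)
        .map(|(&x0_i, &e)| alpha_bars[0].sqrt() * x0_i + (1.0 - alpha_bars[0]).sqrt() * e)
        .collect();
    // An oracle "model" that always returns the true noise.
    for w in alpha_bars.windows(2) {
        x = ddim_step(&eps, &x, w[0], w[1]);
    }
    println!("recovered x0 = {x:?}");
}
```

With a perfect noise prediction, DDIM walks back to x0 up to rounding. Multistep schedulers such as PNDM or DPM-Solver instead combine several past model outputs per step, which is why swapping schedulers can change the result even with identical UNet outputs.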