LaurentMazare / diffusers-rs

An implementation of the diffusers API in Rust
Apache License 2.0

Implement Euler Discrete Scheduler #32

Closed: mspronesti closed this issue 1 year ago

mspronesti commented 1 year ago

Hi @LaurentMazare, this PR aims to integrate the Euler Discrete Scheduler into this repository, solving the first task mentioned in #23.

The implementation should contain all the features of the Python version. Note that, unlike the other schedulers currently supported, init_noise_sigma is not equal to 1.0 here. Also, this scheduler requires scaling the denoising model input. One might say the other three schedulers supported by diffusers-rs are a "special" case.

Therefore, to run the stable diffusion pipeline with this scheduler, one needs to multiply the initial latents by init_noise_sigma here (as in the Python version):

let mut latents = Tensor::randn(
    &[bsize, 4, sd_config.height / 8, sd_config.width / 8],
    (Kind::Float, unet_device),
) * scheduler.init_noise_sigma();
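For intuition, in HF's Python implementation the Euler Discrete scheduler's init_noise_sigma is simply the largest sigma in the noise schedule, while schedulers like DDIM report 1.0. A standalone sketch (an illustrative helper, not the crate's actual code):

```rust
// Illustrative sketch, not the diffusers-rs implementation: for the Euler
// Discrete scheduler, init_noise_sigma is the largest sigma in the schedule.
fn init_noise_sigma(sigmas: &[f64]) -> f64 {
    sigmas.iter().cloned().fold(f64::NEG_INFINITY, f64::max)
}

fn main() {
    // Hypothetical schedule values, largest first as in a typical sigma ramp.
    let sigmas = [14.61, 9.8, 4.2, 0.03];
    println!("{}", init_noise_sigma(&sigmas)); // the largest sigma
}
```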

and to scale the model input after this line (as in the Python version):

let latent_model_input = scheduler.scale_model_input(latent_model_input, timestep);
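For Euler Discrete, the scaling divides the latent input by sqrt(sigma² + 1) for the sigma at the current timestep, as in HF's Python implementation. A scalar sketch of that factor (`euler_scale_factor` is an illustrative name, not part of diffusers-rs):

```rust
// Sketch of the Euler Discrete input scaling: the latent model input is
// divided by sqrt(sigma^2 + 1), i.e. multiplied by this factor.
// Illustrative helper only; the real scale_model_input works on tensors.
fn euler_scale_factor(sigma: f64) -> f64 {
    1.0 / (sigma * sigma + 1.0).sqrt()
}
```

At sigma = 0 the factor is 1.0, which is why schedulers with unit sigmas never needed this step.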

HF's Python version keeps everything consistent by implementing init_noise_sigma() and scale_model_input() in all schedulers. If you wish, I can fix the example in this PR (and thereby add a scale_model_input method to all schedulers, which simply returns sample when no scaling is needed, i.e. what HF does). Otherwise, I can open a new one.
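Concretely, the no-scaling case could look like this (a sketch with a placeholder signature using Vec&lt;f64&gt; instead of tch::Tensor; not the actual diffusers-rs API):

```rust
// Hedged sketch: for schedulers that need no input scaling (DDIM, etc.),
// scale_model_input would hand the sample back unchanged, mirroring what
// HF's Python schedulers do. Vec<f64> stands in for a tch::Tensor here.
fn scale_model_input(sample: Vec<f64>, _timestep: usize) -> Vec<f64> {
    sample // identity: no scaling required for this scheduler family
}
```

This keeps the pipeline code uniform: every scheduler exposes the same two hooks, and only Euler Discrete does real work in them.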

Overall, I believe this PR is self-contained regardless of the pipeline examples.

EDIT: the above text is a little dense. To sum up:

- this PR implements the Euler Discrete Scheduler itself and is self-contained;
- to actually use it, the stable diffusion example needs the init_noise_sigma multiplication and a scale_model_input call, which could be added consistently to all schedulers (here or in a follow-up PR).

EDIT 2: I did what I mentioned in the second bullet in this branch. If you agree, I will open a new PR once this gets merged (assuming you find these contributions valuable for diffusers-rs :) )

LaurentMazare commented 1 year ago

Thanks for the PR, I just merged it. Adding the scale_model_input step to the examples seems sensible, so that this scheduler can be used more easily.

mspronesti commented 1 year ago

Sure, I will open a new PR from that other branch then! :)

mspronesti commented 1 year ago

Done (#35).

sssemil commented 1 year ago

Hi, I'm getting blurry results with this one. I multiplied the latents by scheduler.init_noise_sigma() and also added let latent_model_input = scheduler.scale_model_input(latent_model_input, timestep); after let latent_model_input = Tensor::cat(&[&latents, &latents], 0);. Anything else I could be missing? Thanks.

mspronesti commented 1 year ago

Hi @sssemil, try with this main file (it comes from the other PR I opened yesterday). I just cloned the main branch of this repo again, copy-pasted this main (which simply generalizes the stable diffusion pipeline), changed the scheduler at line 245

let scheduler = EulerDiscreteScheduler::new(n_steps, Default::default());

and ran the diffusion pipeline as follows:

cargo run --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --sd-version=v1-5 --n-steps=15 --seed=91 --cpu all

This is the output I got:

(generated image: sd_final)

You can check whether the produced image is "appropriate for this scheduler" (compared to the Python version) using, for instance, the following Python snippet:

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler()
pipe = pipe.to("cuda")

prompt = "A very rusty robot holding a fire torch."
image = pipe(prompt, num_inference_steps=15).images[0]

Let me know if it also works for you :)

sssemil commented 1 year ago

Ah, I see. Then this is just how this method works :) Euler A. just looks better in general.

mspronesti commented 1 year ago

Yes, in my experience Euler A. works better. Did you give my implementation of that a shot (see the open PR)?