CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

txt2img.py crash if ddim_steps is power of 3 #111

Open mvnowak opened 2 years ago

mvnowak commented 2 years ago

There are certain values of the ddim_steps parameter for which the model crashes with the stack trace appended below.

Example that leads to the crash: python scripts/txt2img.py --prompt "tree" --ddim_steps 9 --n_samples 1

Stacktrace:

 Traceback (most recent call last):
  File "scripts/txt2img.py", line 344, in <module>
    main()
  File "scripts/txt2img.py", line 295, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "..\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "..\ldm\models\diffusion\ddim.py", line 90, in sample
    self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)
  File "..\ldm\models\diffusion\ddim.py", line 44, in make_schedule
    ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),
  File "..\ldm\modules\diffusionmodules\util.py", line 65, in make_ddim_sampling_parameters
    alphas = alphacums[ddim_timesteps]
IndexError: index 1000 is out of bounds for dimension 0 with size 1000

ryanirl commented 2 years ago

Solution: In the make_ddim_timesteps() function in ldm/modules/diffusionmodules/util.py, change line 49 from ddim_timesteps = np.asarray(list(range(0, num_ddpm_timesteps, c))) to ddim_timesteps = (np.arange(0, num_ddim_timesteps) * c).astype(int)

Reasoning:

The problem comes from the make_ddim_timesteps function in ldm/modules/diffusionmodules/util.py.

Line 49 reads ddim_timesteps = np.asarray(list(range(0, num_ddpm_timesteps, c))). The goal of this code is to sample num_ddim_timesteps evenly spaced integers in the range from 0 to num_ddpm_timesteps with a stride of c, where c = num_ddpm_timesteps // num_ddim_timesteps. The problem is that range() keeps going until it reaches num_ddpm_timesteps, so whenever num_ddim_timesteps does not divide num_ddpm_timesteps evenly, the code samples at least num_ddim_timesteps + 1 timesteps. That is, ddim_timesteps.shape[0] does NOT equal num_ddim_timesteps (which is bad).

It just so happens that one of the bad cases (when ddim_timesteps.shape[0] != num_ddim_timesteps) is when num_ddim_timesteps % 3 == 0. Furthermore, for the small powers of 3 (3, 9, and 27, with num_ddpm_timesteps = 1000 as in your trace), the extra sampled integer in ddim_timesteps (that should not be there) happens to be 999. Then on line 58 we compute steps_out = ddim_timesteps + 1, which turns 999 into 1000 and results in your out-of-bounds error.
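To make this concrete, here is a minimal sketch of the stride logic quoted above. It uses plain Python lists instead of NumPy arrays, the function name original_uniform_timesteps is made up for this example, and num_ddpm_timesteps is assumed to be 1000 as in the trace.

```python
# Sketch of the 'uniform' stride logic from line 49, with plain lists
# standing in for np.asarray(...). The function name is hypothetical.
def original_uniform_timesteps(num_ddim_timesteps, num_ddpm_timesteps=1000):
    c = num_ddpm_timesteps // num_ddim_timesteps
    return list(range(0, num_ddpm_timesteps, c))


# For each requested step count, print: requested steps, steps actually
# sampled, the last sampled timestep, and that timestep after the "+ 1".
for steps in (3, 9, 10, 27, 50, 81):
    ts = original_uniform_timesteps(steps)
    print(steps, len(ts), ts[-1], ts[-1] + 1)
```

Any row where the second column differs from the first is one of the bad cases, and any row whose last column reaches 1000 would index past the end of a 1000-element alphas_cumprod.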

The fix for both problems (the bad cases where ddim_timesteps.shape[0] != num_ddim_timesteps and, with them, your out-of-bounds error) is to replace line 49 with the following code: ddim_timesteps = (np.arange(0, num_ddim_timesteps) * c).astype(int)

Important: This code is equivalent to the previous code in the working cases, but excludes the extra timesteps in the bad cases. It should therefore be a simple and general fix for both bugs. Note that I use the same astype(int) notation as the ddim_discr_method == "quad" case on line 51.
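The equivalence claim can be checked with a short sketch. Both function names are made up for this example, plain lists stand in for the NumPy arrays, and num_ddpm_timesteps is assumed to be 1000.

```python
# Hypothetical helpers mirroring the original and the proposed line 49.
def original_uniform_timesteps(n, total=1000):
    c = total // n
    return list(range(0, total, c))


def fixed_uniform_timesteps(n, total=1000):
    c = total // n
    # Plain-list equivalent of (np.arange(0, n) * c).astype(int)
    return [i * c for i in range(n)]


for n in range(1, 1001):
    old = original_uniform_timesteps(n)
    new = fixed_uniform_timesteps(n)
    assert len(new) == n        # the fix always yields exactly n timesteps
    assert new == old[:n]       # and they are the first n of the old ones

# Step counts for which the fixed schedule still ends at timestep 999.
still_ends_at_999 = [
    n for n in range(1, 1001) if fixed_uniform_timesteps(n)[-1] == 999
]
print(still_ends_at_999)
```

In this sketch the fixed schedule is always a prefix of the original one, so the two agree whenever the original already had the right length. The last line lists any step counts whose final timestep is still 999 after the change; those would presumably need separate handling, since the later + 1 is untouched by this fix.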

Funnily enough, there is an assertion on line 56 that checks ddim_timesteps.shape[0] == num_ddim_timesteps, but it is commented out. Maybe someone meant to go back and fix it but never did.
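As a rough illustration of what that check would have caught, the sketch below re-enables it around the original stride logic. The function names are hypothetical, len() stands in for .shape[0] because plain lists are used, and num_ddpm_timesteps is assumed to be 1000.

```python
# Original stride logic with the length check switched back on.
def checked_uniform_timesteps(num_ddim_timesteps, num_ddpm_timesteps=1000):
    c = num_ddpm_timesteps // num_ddim_timesteps
    ddim_timesteps = list(range(0, num_ddpm_timesteps, c))
    assert len(ddim_timesteps) == num_ddim_timesteps, (
        f"expected {num_ddim_timesteps} timesteps, got {len(ddim_timesteps)}"
    )
    return ddim_timesteps


def trips_assertion(steps):
    """Return True if the re-enabled check rejects this step count."""
    try:
        checked_uniform_timesteps(steps)
    except AssertionError:
        return True
    return False


print([s for s in (9, 10, 27, 50) if trips_assertion(s)])
```

With the check active, the bad step counts fail early with a readable message instead of surfacing later as an IndexError.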

patrickvonplaten commented 2 years ago

Another option would be to try diffusers:

# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=27)["sample"][0]  

image.save("astronaut_rides_horse.png")

GrandArth commented 2 years ago

ryanirl's solution also fixes the case where you specify a ddim_steps value larger than num_ddpm_timesteps. There, c in the original line becomes 0, which makes range() fail with a "range() arg 3 must not be zero" error. There should be a merge request for this.
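A small sketch of that case, assuming num_ddpm_timesteps is 1000 and picking 2000 as an arbitrary larger step count (plain lists again stand in for the NumPy arrays):

```python
num_ddpm_timesteps = 1000
num_ddim_timesteps = 2000  # arbitrary value larger than num_ddpm_timesteps

c = num_ddpm_timesteps // num_ddim_timesteps  # integer division gives 0

# Original line 49: a zero stride is rejected by range().
try:
    list(range(0, num_ddpm_timesteps, c))
    error = None
except ValueError as exc:
    error = str(exc)
print("original:", error)

# Proposed line 49, list version of (np.arange(0, n) * c).astype(int).
fixed = [i * c for i in range(num_ddim_timesteps)]
print("fixed:", len(fixed), "timesteps, distinct values:", sorted(set(fixed)))
```

The proposed line no longer raises, although every timestep collapses to 0 here, so whether such a schedule is useful is a separate question.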

vvsotnikov commented 2 years ago

@patrickvonplaten unfortunately, your code sample results in the same bug:

Traceback (most recent call last):
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 12, in <module>
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 273, in __call__
    latents = self.scheduler.step(noise_pred, t, latents,
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/site-packages/diffusers/schedulers/scheduling_pndm.py", line 202, in step
    return self.step_plms(model_output=model_output, timestep=timestep, sample=sample, return_dict=return_dict)
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/site-packages/diffusers/schedulers/scheduling_pndm.py", line 317, in step_plms
    prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
  File "/home/vladimir/miniconda3/envs/SD/lib/python3.9/site-packages/diffusers/schedulers/scheduling_pndm.py", line 338, in _get_prev_sample
    alpha_prod_t = self.alphas_cumprod[timestep + 1 - self._offset]
IndexError: index 1000 is out of bounds for dimension 0 with size 1000

patrickvonplaten commented 2 years ago

Hey @vvsotnikov thanks for the message!

Could you try updating your diffusers to 0.4.0.dev0?

# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline
import diffusers
print("diffusers version", diffusers.__version__)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=27)["sample"][0]

image.save("astronaut_rides_horse.png")

This works for me with output:

diffusers version 0.4.0.dev0