MichalGeyer / plug-and-play

Official Pytorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)

Controlling the number of inversion steps #9

Open ArielReplicate opened 1 year ago

ArielReplicate commented 1 year ago

Hi, it seems the number of DDIM steps for inversion is fixed: https://github.com/MichalGeyer/plug-and-play/blob/ce9e4e23c24b4241cb9ecae9ea0b090c70871870/run_features_extraction.py#L224

Is there a specific reason why you only exposed the number of steps for sampling the features (ddim_steps: 999 in the feature-extraction .yaml config files)?

How related are the two parameters (i.e. ddim_inversion_steps and exp_config.config.ddim_steps)? Does a change in one require changing the other? I would think the inversion could be good enough with 50 ddim_steps?

tnarek commented 1 year ago

Hi @ArielReplicate, the ddim_steps parameter indicates the number of backward sampling steps used during the inversion (i.e. from latent noise to image), while the fixed variable ddim_inversion_steps = 999 indicates the number of forward DDIM steps used in the inversion (i.e. from image to latent noise), so they can have differing values. We found the most reliable configuration for both parameters to be the full number of steps (=999), which gives the best reconstruction of the structure image.

Note that for real images, the timesteps at which features are saved are determined by the save_feature_timesteps parameter, which determines the num_ddim_sampling_steps that can be used during the translation (num_ddim_sampling_steps = save_feature_timesteps).
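For reference, here is a simplified sketch (not the exact code) of how the two parameters enter run_features_extraction.py; the conditioning and shape variables (c, uc, scale, shape) are placeholders, and the actual script passes additional arguments for feature saving:

```python
# Simplified sketch of the two stages in run_features_extraction.py
# (paraphrased; c, uc, scale, shape are placeholder variables).

# Forward DDIM (image latent -> noise): controlled by the hard-coded
# ddim_inversion_steps = 999.
z_enc, _ = sampler.encode_ddim(
    init_latent,
    num_steps=exp_config.ddim_inversion_steps,
    conditioning=c,
    unconditional_conditioning=uc,
    unconditional_guidance_scale=scale,
)

# Backward DDIM (noise -> reconstructed image, saving features along the way):
# controlled by exp_config.config.ddim_steps from the .yaml config.
samples, _ = sampler.sample(
    S=exp_config.config.ddim_steps,
    conditioning=c,
    batch_size=1,
    shape=shape,
    x_T=z_enc,
    unconditional_guidance_scale=scale,
    unconditional_conditioning=uc,
)
```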

ArielReplicate commented 1 year ago

Thanks for the elaboration!

It seems, though, that ddim_steps is not related to the inversion, only to the sampling after the inversion: https://github.com/MichalGeyer/plug-and-play/blob/ce9e4e23c24b4241cb9ecae9ea0b090c70871870/run_features_extraction.py#L229

Is it using the fact that model.num_timesteps is set to 1000?

When I try to change ddim_inversion_steps to 50 I get this error:

z_enc, _ = sampler.encode_ddim(init_latent, num_steps=exp_config.ddim_inversion_steps,
File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/src/ldm/models/diffusion/ddim.py", line 281, in encode_ddim
img, _ = self.reverse_ddim(img, t, t_next=steps[i+1] ,c=conditioning, unconditional_conditioning=unconditional_conditioning, unconditional_guidance_scale=unconditional_guidance_scale)
File "/root/.pyenv/versions/3.8.16/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/src/ldm/models/diffusion/ddim.py", line 306, in reverse_ddim
a_next = torch.full((b, 1, 1, 1), alphas[t_next], device=device) #a_next = torch.full((b, 1, 1, 1), alphas[t + 1], device=device)
IndexError: index 1007 is out of bounds for dimension 0 with size 1000
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x86da6c]

I'm trying to speed things up, since the inversion and the sampling each take ~1.5 minutes on an A100 instance.

tnarek commented 1 year ago

You are correct: ddim_steps is only used during the backward sampling, which actually isn't part of the inversion process (sorry for the misstatement in the previous answer). Notice that in ddim.sample, the model's scheduler timesteps are reset to ddim_steps.

https://github.com/MichalGeyer/plug-and-play/blob/ce9e4e23c24b4241cb9ecae9ea0b090c70871870/ldm/models/diffusion/ddim.py#L111
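For context, in the vanilla latent-diffusion DDIMSampler that this repo builds on, sample() rebuilds the DDIM schedule from its S argument before the backward loop starts; the abridged sketch below omits the feature-saving hooks the plug-and-play fork adds and may differ in minor details:

```python
# Abridged sketch of ldm/models/diffusion/ddim.py (vanilla latent-diffusion
# layout; details in the plug-and-play fork may differ slightly).
class DDIMSampler(object):
    def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
        # Rebuilds self.ddim_timesteps as a ddim_num_steps-long subsequence
        # of the 1000 DDPM timesteps; alphas/sigmas are recomputed to match.
        self.ddim_timesteps = make_ddim_timesteps(
            ddim_discr_method=ddim_discretize,
            num_ddim_timesteps=ddim_num_steps,
            num_ddpm_timesteps=self.ddpm_num_timesteps,
            verbose=verbose,
        )
        # ...

    @torch.no_grad()
    def sample(self, S, batch_size, shape, conditioning=None, eta=0., **kwargs):
        # S is ddim_steps: the backward-sampling schedule is reset here,
        # independently of the 999 forward steps used by encode_ddim.
        self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=False)
        # ... backward sampling loop over self.ddim_timesteps
```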

The error you pointed out seems to be due to an incorrect calculation of the inversion timesteps in ddim.encode_ddim when ddim_inversion_steps = 50.

https://github.com/MichalGeyer/plug-and-play/blob/c5da4c1d252b1899558b4649c791d85c5f31e889/ldm/models/diffusion/ddim.py#L278
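The arithmetic in the traceback is consistent with the step grid overshooting the 1000-entry alpha table: with num_steps = 50 and a uniform increment of 999 // 50 = 19, a grid built as range(0, T + c, c) ends at 19 * 53 = 1007, which is exactly the out-of-range index reported above. Below is a minimal, hedged sketch of the kind of clamping that would avoid it (not the official fix, and the grid construction here is an assumption about what encode_ddim does):

```python
# Hedged sketch, not the official fix: clamp the last "next" timestep so a
# uniform step grid never runs past the 1000-entry alpha/beta tables.
T = 999                    # last valid DDPM timestep index
num_steps = 50             # e.g. ddim_inversion_steps = 50
c = T // num_steps         # 19 -> unclamped grid 0, 19, ..., 988, 1007

# Clamp the overshoot so alphas[t_next] is always a valid index.
steps = [min(s, T) for s in range(0, T + c, c)]

for i, t in enumerate(steps[:-1]):
    t_next = steps[i + 1]  # guaranteed <= 999 after clamping
    # ... one reverse-DDIM step from t to t_next (as in reverse_ddim)
```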

We will work on fixing this issue, thanks for reporting it!