Open vikm2o opened 5 months ago
I tried adapting the code for SD3, but it fails at this line:
latents = original_with_noise[i] * mask + latents * (1 - mask)
prepare_latents in the SD3 pipeline (https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py) returns latents of shape torch.Size([1, 16, 80, 120]),
and if I change the line to latents = original_with_noise[:1] * mask + latents * (1 - mask)
it doesn't work either.
My changes are here: https://github.com/vikm2o/differential-diffusion
Hi,
Can you specify the dimensions of original_with_noise[i], mask, and latents?
If I recall correctly, the only dimension-wise difference for SD3 is the number of latent channels, which is 16 in SD3 instead of 4 in earlier versions.
Thanks!
latents shape torch.Size([1, 16, 80, 120])
original_with_noise shape torch.Size([1, 16, 80, 120])
masks torch.Size([120, 80, 120])
prepare_latents in the SD3 pipeline produces latents of shape torch.Size([1, 16, 80, 120]) at this line: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py#L646
Is the number of steps 120? If not, there might be an error in one of the broadcasting operations.
The step should be:
latents = original_with_noise[i] * mask + latents * (1 - mask)
original_with_noise should contain versions of the picture with the amount of noise corresponding to each of the timesteps.
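To make the shapes in that step concrete, here is a minimal sketch, not the repository's actual code. The change map, the threshold rule, and the random placeholder tensors are assumptions for illustration, and the sizes are scaled down from the 120 steps and 80 x 120 latents discussed above.

```python
import torch

# Small stand-ins for the thread's values (120 steps, 16 channels, 80 x 120 latents).
steps, channels, height, width = 5, 16, 8, 12

# Hypothetical change map in [0, 1]; one boolean mask per step.
change_map = torch.rand(height, width)
thresholds = torch.arange(steps, dtype=change_map.dtype) / steps
masks = change_map > thresholds[:, None, None]            # [steps, H, W]

# One noised copy of the original latents per step (random placeholders here).
original_with_noise = torch.randn(steps, channels, height, width)
latents = original_with_noise[:1].clone()                  # [1, C, H, W]

for i in range(1, steps):
    # [H, W] mask broadcasts over the batch and channel dimensions.
    mask = masks[i].to(latents.dtype)
    latents = original_with_noise[i] * mask + latents * (1 - mask)

blended_shape = tuple(latents.shape)

# With only a single noised copy, the same indexing fails from i = 1 onward.
single = torch.randn(1, channels, height, width)
try:
    single[1]
    index_failed = False
except IndexError:
    index_failed = True
```

The point of the sketch is that the first dimension of original_with_noise has to be the step index, so a tensor of shape [1, 16, 80, 120] can only serve step 0.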
Yes, the number of steps is 120. retrieve_timesteps at this line https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py#L853 changes 200 to 120. I'm not sure why, but this is what I observe.
original_with_noise shape for SD3 is torch.Size([1, 16, 80, 120]) for 200 steps.
For SD2 it's torch.Size([201, 4, 80, 120]) for 200 steps.
So it can't be indexed with original_with_noise[i].
Yes, the number of steps is 120. retrieve_timesteps at this line https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py#L853 changes 200 to 120. I'm not sure why, but this is what I observe.
Yeah, this is a new function which does not exist in diffusers@0.19. I am not sure what it does. I opened an issue in diffusers (https://github.com/huggingface/diffusers/issues/8577); maybe they can advise you?
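One possible explanation for 200 becoming 120, offered as an assumption rather than a confirmed reading of the SD3 pipeline: img2img pipelines in diffusers usually skip the early part of the schedule according to strength, so only a fraction of the requested steps is actually run. The helper below is hypothetical and only reproduces that arithmetic; a strength of 0.6 is assumed because it matches the observed numbers.

```python
def trimmed_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps left after trimming the schedule by strength.

    Hypothetical helper mirroring the usual img2img rule, not a diffusers API.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


# Assumed values: 200 requested steps and strength 0.6.
steps_run = trimmed_steps(200, 0.6)
```

If that is what happens here, the per-step masks and the noised originals both need one entry per step that is actually run, not per requested step.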
original_with_noise shape for SD3 is torch.Size([1, 16, 80, 120]) for 200 steps.
For SD2 it's torch.Size([201, 4, 80, 120]) for 200 steps.
So it can't be indexed with original_with_noise[i].
OK, if I understand correctly, what is missing is a tensor with multiple noised versions of the original image (original_with_noise should be torch.Size([201, 16, 80, 120])).
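A rough sketch of how such a stack could be built for SD3, under stated assumptions: SD3 is trained with a flow-matching (rectified flow) objective, where a noised sample is a linear interpolation between the clean latents and the noise. The noised_stack helper and the linearly spaced sigmas are hypothetical; a real port would take the sigmas from the pipeline's scheduler (diffusers appears to expose the same interpolation as scale_noise on the flow-matching scheduler, but treat that as an assumption too).

```python
import torch


def noised_stack(image_latents: torch.Tensor, sigmas: torch.Tensor,
                 noise: torch.Tensor) -> torch.Tensor:
    """Return one noised copy of image_latents per sigma.

    image_latents and noise have shape [1, C, H, W]; the result has shape
    [len(sigmas), C, H, W]. Hypothetical helper, not part of diffusers.
    """
    sigmas = sigmas.to(image_latents.dtype).view(-1, 1, 1, 1)
    # Flow-matching interpolation: sigma = 1 is pure noise, sigma = 0 is the image.
    return (1.0 - sigmas) * image_latents + sigmas * noise


# Spatial size scaled down from the thread's 80 x 120 to keep the example light.
image_latents = torch.randn(1, 16, 8, 12)
noise = torch.randn_like(image_latents)
sigmas = torch.linspace(1.0, 0.0, 201)    # assumed schedule, one sigma per entry

original_with_noise = noised_stack(image_latents, sigmas, noise)
stack_shape = tuple(original_with_noise.shape)
```

The same noise tensor is reused for every entry here; whether to share or resample noise across steps is a design choice this sketch does not settle.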
yes
I believe it can; the algorithm should be the same. I am considering a new release that supports the following diffusion models: SC, PixArt-Σ, SD3, and Hunyuan-DiT.