ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0
1.89k stars 509 forks

I am not able to reproduce the results of Imagic_Stable_Diffusion.ipynb #219

Open qlinsey opened 1 year ago

qlinsey commented 1 year ago

Describe the bug

I followed the Imagic_Stable_Diffusion.ipynb notebook, but I got the exception "TypeError: __call__() got an unexpected keyword argument 'text_embeddings'" when I ran:

    with autocast("cuda"), torch.inference_mode():
        images = pipe(
            text_embeddings=edit_embeddings,
            height=height,
            width=width,
            num_images_per_prompt=num_samples,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=g_cuda,
        ).images

Then I looked at the API and changed text_embeddings=edit_embeddings to prompt_embeds=edit_embeddings.

With that change the code runs, but the generated images do not change at all; they are the same as the original images. I tried the Obama and bird images provided by the notebook.

Please advise what the problem is. Thanks!

Reproduction

    with autocast("cuda"), torch.inference_mode():
        images = pipe(
            text_embeddings=edit_embeddings,
            height=height,
            width=width,
            num_images_per_prompt=num_samples,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=g_cuda,
        ).images

This raised: TypeError: __call__() got an unexpected keyword argument 'text_embeddings'

Then I changed it to:

    with autocast("cuda"), torch.inference_mode():
        images = pipe(
            prompt_embeds=edit_embeddings,
            height=height,
            width=width,
            num_images_per_prompt=num_samples,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=g_cuda,
        ).images
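A minimal sketch of one way to cope with the keyword rename described above: inspect the callable's signature and pass the embeddings under whichever name it accepts. The helper names (embeds_kwarg_name, call_with_embeddings) and the two stand-in functions are hypothetical and exist only for illustration; they are not part of diffusers. With a real pipeline you would pass pipe itself as the callable. This only addresses the TypeError, not the unchanged-image problem.

```python
import inspect


def embeds_kwarg_name(pipe_call):
    """Return the keyword the callable accepts for precomputed text embeddings."""
    params = inspect.signature(pipe_call).parameters
    # Prefer the newer name, fall back to the older one.
    for name in ("prompt_embeds", "text_embeddings"):
        if name in params:
            return name
    raise TypeError("callable accepts neither 'prompt_embeds' nor 'text_embeddings'")


def call_with_embeddings(pipe_call, embeddings, **kwargs):
    """Call pipe_call, passing embeddings under the keyword it supports."""
    name = embeds_kwarg_name(pipe_call)
    return pipe_call(**{name: embeddings}, **kwargs)


# Hypothetical stand-ins for an old-style and a new-style pipeline call.
def old_pipe(text_embeddings=None, guidance_scale=7.5):
    return ("old", text_embeddings, guidance_scale)


def new_pipe(prompt_embeds=None, guidance_scale=7.5):
    return ("new", prompt_embeds, guidance_scale)


old_result = call_with_embeddings(old_pipe, "emb", guidance_scale=3.0)
new_result = call_with_embeddings(new_pipe, "emb", guidance_scale=3.0)
```

Checking the signature up front fails early with a clear message instead of partway through a long notebook run.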

Logs

No response

System Info

I used this Colab notebook: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/imagic/Imagic_Stable_Diffusion.ipynb

Daisy5296 commented 1 year ago

Exactly the same for me. I then removed the "--use_8bit_adam" option from the training command, and the outputs started to change. But they are still not as good as the published results: they are either too different from the reference image or identical to it. I guess the parameters may need very careful tuning. Or perhaps the Stable Diffusion version is simply not as good as the one using the Imagen model.
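One parameter that could be swept when outputs are either identical to the reference or too far from it is the interpolation weight between the optimized embedding and the target text embedding. The sketch below assumes the linear interpolation described in the Imagic paper (weight 0 gives the optimized embedding, weight 1 gives the target embedding); the function and variable names are hypothetical, and plain floats stand in for embedding tensors so the snippet runs without a GPU. The same arithmetic should work unchanged on torch tensors.

```python
def interpolate(optimized, target, alpha):
    """Linear blend: alpha=0 returns optimized, alpha=1 returns target."""
    return alpha * target + (1 - alpha) * optimized


def alpha_sweep(optimized, target, steps=4):
    """Return (alpha, blended) pairs for evenly spaced alpha in [0, 1]."""
    alphas = [i / steps for i in range(steps + 1)]
    return [(a, interpolate(optimized, target, a)) for a in alphas]


# Floats stand in for the optimized and target embeddings (assumption).
sweep = alpha_sweep(optimized=0.0, target=10.0, steps=4)
for alpha, blended in sweep:
    print(f"alpha={alpha:.2f} -> {blended}")
```

Generating one image per alpha with a fixed seed makes it easier to see where the output moves from reproducing the reference to following the edit prompt.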

minzhang-1 commented 1 year ago

Hi, has anyone tried this? I can't reproduce the results even after fine-tuning for a long time. @ShivamShrirao How did you get the same results as the paper? Many thanks!

ShivamShrirao commented 1 year ago

Hey, sorry. The API changed a bit later and doesn't use text_embeddings right now. I didn't get to update the Imagic code.

minzhang-1 commented 1 year ago

@ShivamShrirao I changed text_embeddings to prompt_embeds, so the code now runs and gives me some generated results. But the problem is that it is very difficult to get results like those in the paper.