alibaba / EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
Apache License 2.0
2.03k stars 250 forks

FreePromptEditing Experiments #357

Closed momopusheen closed 5 months ago

momopusheen commented 5 months ago

Hi,

Thanks for your brilliant work on "Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing".

I encountered some problems while trying to reproduce the image editing results.

Specifically, when using the vanilla implementation of FPE in edit_real.ipynb for real image editing, I noticed that the edited images did not meet expectations. For instance, I used the target prompt "a photo of a silver robot" to edit the image, but the edited image is not consistent with the images provided in the paper. As shown below, the person's head did not transform into a robot as expected.

My base model is Stable Diffusion 1.5, with torch == 1.11.0 and diffusers == 0.16.1.

Do you recommend adjusting any hyperparameters (such as self_replace_steps) to reproduce the expected results? Furthermore, are there any plans to provide evaluation code for computing CS and CDS?

I would greatly appreciate any assistance or guidance you can offer in addressing these issues.

Thank you for your support in advance.

Here's my edited image:

[edited image]
Bingyan-Liu commented 5 months ago

Thank you for your interest in our work. In this comparison example, both P2P and our method use null-text inversion for image editing, so the comparison isolates the difference between cross-attention-map and self-attention-map replacement. Please try the other editing notebook, null_text_w_FPE.ipynb. For the computation of CS and CDS, we used the evaluation code from the Hugging Face diffusers documentation: https://huggingface.co/docs/diffusers/conceptual/evaluation
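For reference, CLIP directional similarity (CDS) compares the edit direction in image-embedding space with the edit direction in text-embedding space, as described in the linked diffusers evaluation guide. Below is a minimal sketch of just the formula, using random NumPy arrays as stand-ins for real CLIP embeddings (the actual metric would encode the source/target images and prompts with a CLIP model first):

```python
import numpy as np

def clip_directional_similarity(src_img, tgt_img, src_txt, tgt_txt):
    """Cosine similarity between the image-space edit direction and the
    text-space edit direction. All arguments are (batch, dim) CLIP
    embeddings; returns the mean over the batch."""
    img_dir = tgt_img - src_img
    txt_dir = tgt_txt - src_txt
    cos = np.sum(img_dir * txt_dir, axis=-1) / (
        np.linalg.norm(img_dir, axis=-1) * np.linalg.norm(txt_dir, axis=-1)
    )
    return float(cos.mean())

# Dummy embeddings stand in for real CLIP features (batch of 4, dim 512).
rng = np.random.default_rng(0)
src_img, tgt_img = rng.normal(size=(4, 512)), rng.normal(size=(4, 512))
src_txt, tgt_txt = rng.normal(size=(4, 512)), rng.normal(size=(4, 512))
cds = clip_directional_similarity(src_img, tgt_img, src_txt, tgt_txt)
```

When the image edit direction exactly matches the text edit direction, the score is 1; unrelated directions score near 0.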

momopusheen commented 5 months ago

Thank you for your continued support!

I utilized null_text_w_FPE.ipynb for image editing, setting prompt_src = "". After experimenting with various self_replace_steps values, I noticed that the edited images still exhibit discrepancies compared to the paper.

Furthermore, I'm curious about the experimental setup for real image editing. Did you employ null-text inversion for all real image editing experiments? If so, could you please share how you set "prompt_src" in those cases?

Thank you for your assistance!

self_replace_steps = .8

[image]

self_replace_steps = .6

[image]

self_replace_steps = .4

[image]
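For context on what this sweep varies: in prompt-to-prompt-style controllers (which FPE builds on), self_replace_steps is the fraction of diffusion steps during which the source image's self-attention maps are injected. The sketch below is an assumption mirroring that convention, not FPE's actual API:

```python
# Assumed convention (as in prompt-to-prompt): self_replace_steps is the
# fraction of diffusion steps with self-attention replacement active.
def replacement_range(self_replace_steps, num_inference_steps=50):
    """Diffusion-step indices where self-attention replacement is active."""
    end = int(num_inference_steps * self_replace_steps)
    return range(end)

# Sweep the same fractions as above: larger values preserve more of the
# source layout, smaller values give the target prompt more freedom.
sweep = {frac: len(replacement_range(frac)) for frac in (0.4, 0.6, 0.8)}
# e.g. self_replace_steps = 0.8 keeps replacement active for the first 40 of 50 steps
```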

Bingyan-Liu commented 5 months ago

In the demonstration images in the paper, the comparisons with P2P were all conducted under null-text inversion. For "prompt_src," we used the source prompts directly from the TI2I-benchmark, which is associated with the PNP paper. You can download this part of the dataset from the TI2I-benchmark homepage to obtain the corresponding source prompts for the images. We hope this information is helpful to you.

momopusheen commented 5 months ago

It seems there are no source prompts in the Wild-TI2I-Real benchmark. Did you use these descriptions as "prompt_src"?

[screenshot of the benchmark descriptions]
Bingyan-Liu commented 5 months ago

For the Wild-TI2I-Real benchmark, prompt_src follows the template "a photo of a {}", e.g. "a photo of a dog face", "a photo of a dog bear sketch".
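Applying that template programmatically is straightforward; the subject strings below are illustrative examples, not the benchmark's actual list:

```python
# Build source prompts with the "a photo of a {}" template described above.
# The subjects here are just examples, not the full Wild-TI2I-Real list.
subjects = ["dog face", "dog bear sketch"]
prompt_srcs = [f"a photo of a {s}" for s in subjects]
# -> ["a photo of a dog face", "a photo of a dog bear sketch"]
```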

momopusheen commented 5 months ago

I used these prompt_src values with null-text inversion for image editing, and computed the CDS metric with the validation code in diffusers.

However, in my experiments the CDS for FPE was 0.1008 on Wild-TI2I-Real and 0.1718 on ImageNet-R-real. I'm unsure where the discrepancy comes from.

Would you mind sharing the image editing results of FPE on these benchmarks?

Bingyan-Liu commented 5 months ago

You could try different "self_replace_steps" values, which lead to different results. We conducted the experiments on cloud devices, and it will take me some time to locate the images previously used for calculating the evaluation metrics. I will attend to this when I have spare time.

momopusheen commented 5 months ago

Thanks a lot!