YuxinWenRick / diffusion_memorization

Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)

SSCD score of memorized samples #5

Closed: LukasStruppek closed this issue 7 months ago

LukasStruppek commented 7 months ago

Dear Yuxin,

a question regarding the experimental setup came up. We used the prompts provided in sdv1_500_memorized.jsonl to generate images with SDv1.4 and then computed the SSCD scores between the generated images and the real images. However, the scores vary widely, from quite high to rather low. The paper states:

To evaluate our detection method, we use 500 memorized prompts identified in Webster (2023) for Stable Diffusion v1 (Rombach et al., 2022), where the SSCD similarity score (Pizzi et al., 2022) between the memorized and the generated images exceeds 0.7.

Does this mean that all images generated from the 500 prompts in the .jsonl file achieved an SSCD score > 0.7 in your experiments? Or did you apply additional filtering based on the computed SSCD scores to keep only the strongly memorized samples? In our experiments, only 100-120 of the 500 prompts (depending on the SSCD model) achieve a maximum SSCD score > 0.7; all scores were computed across 10 generations with different seeds. We also manually inspected the images, and some generated images show only slight memorization, so the assigned SSCD scores do seem to match the actual degree of memorization.
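For concreteness, here is a minimal sketch of this kind of scoring: embed the real image and each generation with an SSCD TorchScript model, then take the max cosine similarity across seeds. The sscd_disc_mixup.torchscript.pt checkpoint and the small_288-style preprocessing follow the public facebookresearch/sscd-copy-detection repo; they are assumptions for illustration, not necessarily the exact pipeline in either of our setups:

```python
# Minimal sketch: max SSCD similarity between one real image and N generations.
# Assumes the sscd_disc_mixup.torchscript.pt checkpoint from the
# facebookresearch/sscd-copy-detection repo (an assumption, not part of this repo).
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

model = torch.jit.load("sscd_disc_mixup.torchscript.pt").eval()

# small_288-style preprocessing recommended by the SSCD repo
preprocess = transforms.Compose([
    transforms.Resize(288),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return F.normalize(model(img), dim=-1)  # L2-normalized SSCD descriptor

def max_sscd(real_path, gen_paths):
    real = embed(real_path)
    # cosine similarity of normalized descriptors = dot product;
    # take the max over generations with different seeds
    return max((real @ embed(p).T).item() for p in gen_paths)
```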

Best, Lukas

YuxinWenRick commented 7 months ago

Hi Lukas,

Thanks for reaching out, and I'm sorry about the confusion. The URLs in sdv1_500_memorized.jsonl point to the original image pairs, but these are not necessarily the images memorized for a given prompt: there are many retrieval-verbatim cases, where the model memorized the mapping from the prompt to a different image or to several other images.

I have uploaded the ground-truth memorized images here. You can download and unzip the archive, then run:

python inference_mem.py --run_name no_mitigation --dataset sdv1_500_mem_groundtruth --end 500 --gen_seed 0 --reference_model ViT-g-14 --with_tracking

You should get much higher SSCD scores in this case.

Please let me know if you have further questions!

LukasStruppek commented 7 months ago

Hi Yuxin,

thank you very much for your quick response; this helps a lot! One more question regarding SSCD: for the unmemorized samples, e.g., the LAION, COCO, and random samples, did you also compute SSCD scores for these prompts to ensure that the corresponding images are not memorized? Or, phrased differently, for these samples the SSCD score should indeed be computed between the real images and the generated images, right?

Best, Lukas

YuxinWenRick commented 7 months ago

Hi Lukas,

Yeah, I think that, to be safe, we should also have computed SSCD scores for these datasets to ensure there are no memorized images. However, we didn't do that, because we believe the chance of these samples being memorized is very low, but it would be good if future work checked this.

Best, Yuxin

LukasStruppek commented 7 months ago

Hi Yuxin,

thanks again for the clarification. I was now able to compute the SSCD scores with your code base, and they are indeed higher than the scores we computed ourselves. Still, not all of them are above 0.7: taking the max SSCD score per prompt, 271 of the 500 scores are > 0.7. Does this mean that in your paper you took those 271 samples as the memorized samples? Or did you take all 500 samples, even those whose SSCD score is substantially lower? For instance, the smallest max score in my results is only 0.01.

Best, Lukas

YuxinWenRick commented 7 months ago

Hi Lukas,

Thanks for letting me know. I just ran a sanity check. You are right: I also got ~100 samples with very low similarity, even with 16 generations per prompt.

I think the problem is that some of the ground-truth images are no longer reachable via the links provided by the original repo, so I skipped them when I re-downloaded the images this week; the links still worked last summer. Unfortunately, I no longer have the original copies, since I cleaned out my cluster storage after my internship. That said, I don't think this changes the fact that, according to the original repo, these examples are memorized by Stable Diffusion v1.

If you just want to do the detection task, I think it's fine. But if you want to do the mitigation task, you may want to do some filtering first, or use the COCO fine-tuned model. A sketch of such a filtering pass is below.
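Something like the following would work as a filtering pass; note that scores.json, its prompt-to-max-SSCD format, and the "caption" key are hypothetical stand-ins for whatever your own scoring run produces, not files or fields from this repo:

```python
# Hypothetical filtering pass before mitigation experiments: keep only prompts
# whose max SSCD over seeds clears the 0.7 threshold from the paper.
import json

THRESHOLD = 0.7

# Assumption: scores.json maps each prompt to its max SSCD across generations;
# the file name, its format, and the "caption" key below are illustrative,
# not produced by this repo.
with open("scores.json") as f:
    max_scores = json.load(f)

kept = []
with open("sdv1_500_memorized.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if max_scores.get(record["caption"], 0.0) > THRESHOLD:
            kept.append(record)

with open("sdv1_500_memorized_filtered.jsonl", "w") as out:
    for record in kept:
        out.write(json.dumps(record) + "\n")

print(f"kept {len(kept)} of 500 prompts")
```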

Best, Yuxin

LukasStruppek commented 7 months ago

Hi Yuxin,

I see, thank you. Then we will see how we can handle this problem :)

Best, Lukas