YuxinWenRick / diffusion_memorization

Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)

List of non-memorized prompts #1

Closed LukasStruppek closed 7 months ago

LukasStruppek commented 8 months ago

Hey,

thank you for providing the code to reproduce your experiments. In addition to the list of memorized samples, could you please also provide the prompts of the non-memorized samples you used during your experiments? The paper states that the experiments were conducted on 2,000 prompts from COCO, LAION, Lexica, and randomly generated strings. Providing these prompts would improve the reproducibility of the method.

Best, Lukas

YuxinWenRick commented 8 months ago

Hi Lukas, thanks for reaching out!

I no longer have access to the machines I used during my internship, so I may not be able to provide the exact subsets used in our paper. However, I don't think this is a problem for reproducing the results, since our numbers are averaged over 5 runs on different subsets and are not especially noisy.

I have updated the code for COCO and randomly generated strings. The sample command line for the non-memorized experiment on Lexica is:

`python detect_mem.py --run_name non_memorized_prompts --dataset Gustavosta/Stable-Diffusion-Prompts --end 500 --gen_seed 0`

For COCO and random strings, you may use `--dataset ChristophSchuhmann/MS_COCO_2017_URL_TEXT` and `--dataset random`, respectively.
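Put together, the three non-memorized runs might look like the sketch below. Only the Lexica command is given verbatim above; the COCO and random-string variants are assumed to keep the same `--run_name`, `--end`, and `--gen_seed` values with only `--dataset` swapped out.

```bash
# Lexica prompts (Gustavosta/Stable-Diffusion-Prompts), as given above
python detect_mem.py --run_name non_memorized_prompts \
    --dataset Gustavosta/Stable-Diffusion-Prompts --end 500 --gen_seed 0

# COCO captions (assumed: other flags carried over from the Lexica example)
python detect_mem.py --run_name non_memorized_prompts \
    --dataset ChristophSchuhmann/MS_COCO_2017_URL_TEXT --end 500 --gen_seed 0

# Randomly generated strings (assumed: other flags carried over)
python detect_mem.py --run_name non_memorized_prompts \
    --dataset random --end 500 --gen_seed 0
```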

However, LAION is currently not publicly available, and we cannot access it even on our school's cluster. Therefore, we may not be able to reproduce that part until a clean version is released.

I hope this resolves some of your issues. Please let me know if you have any further questions.

LukasStruppek commented 7 months ago

Hi Yuxin,

thank you very much for the quick response and the updated code. This helps a lot :)

Best, Lukas