cure-lab / MMA-Diffusion

[CVPR2024] MMA-Diffusion: MultiModal Attack on Diffusion Models

Questions about the paper experiments #5

Open RichardSunnyMeng opened 1 week ago

RichardSunnyMeng commented 1 week ago

Hi, authors. Thank you for your efforts toward safe AIGC and for your creative work. However, I have some questions about the experiments in the paper.

  1. Do the experiments for attacking open-source models (Sec. 4.2) and online services (Sec. 4.3) only involve the text modality? If yes, how can you disable the safety checkers of online services such as Midjourney?
  2. Are the adv. prompts used in the above experiments all optimized using SD v1.5?
  3. Are the multimodal attack results (Sec. 4.4) also obtained using SD v1.5? If yes, can the adversarial images be transferred to black-box scenarios like the text modality?
  4. Is there a time comparison? Many attack methods are time-consuming, so I think this is an important consideration.

Best.

yangyijune commented 1 week ago

Hi, Richard! Regarding your concerns:

  1. Yes. We cannot disable the online services' safety checkers; our adv. prompts bypass the prompt checker in a black-box manner.
  2. Yes.
  3. Yes, and no: the adversarial images can only fool the image-modality safety checker.
  4. No. We only report MMA's own time cost in the paper.
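To make the black-box setting above concrete, here is a toy sketch of the kind of keyword-based prompt checker that adversarial prompts are designed to slip past. All names and the blocklist are hypothetical illustrations, not any real service's filter:

```python
# Toy illustration (NOT any service's actual filter): an exact-match
# keyword blocklist. Adversarially perturbed prompts contain no
# blocklisted token, so they pass the checker even if the model may
# still render similar content -- which is why such filters can be
# bypassed in a black-box manner.

BLOCKLIST = {"nude", "naked", "gore"}  # hypothetical banned keywords

def prompt_checker(prompt: str) -> bool:
    """Return True if the prompt is flagged (rejected)."""
    tokens = prompt.lower().split()
    return any(tok.strip(".,!?") in BLOCKLIST for tok in tokens)

# A plainly unsafe prompt is caught by exact matching...
assert prompt_checker("a nude figure on a beach")
# ...but a perturbed variant with no blocklisted token slips through.
assert not prompt_checker("a n*de figure on a beach")
```

The attacker needs no access to the checker's internals; it is enough that the optimized prompt avoids the filtered surface forms.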

RichardSunnyMeng commented 1 week ago

Thanks for your response! But I still have two questions.

  1. Regarding 1: the results on online services reflect performance against both the prompt filters and the post-hoc checkers, so if these checkers could be disabled, would the attack perform even better? Also, Tab. 2 shows that many inappropriate images generated by adversarial prompts are released. If this is because the checkers' capability is limited, it seems that adversarial images are not needed?
  2. Regarding 3: I meant whether you evaluated multimodal attacks on commercial models such as Midjourney.

yangyijune commented 1 week ago

Hi Richard,

  1. MidJourney appears to rely solely on the prompt filter, making adv. prompts sufficient. I suspect this is because the post-hoc safety checker is costly and has a high false positive rate.
  2. For multimodal attacks, we do not evaluate them on Midjourney/Leonardo.AI, as text-modal attacks are sufficient.
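For intuition on the post-hoc checker discussed above: Stable Diffusion's safety checker compares an image embedding against fixed "unsafe concept" embeddings and flags the image when any similarity exceeds a per-concept threshold. The sketch below is a simplified, hypothetical version of that scheme (the vectors, thresholds, and function names are invented for illustration, not SD's actual values); adversarial images work by pushing the embedding below such thresholds, and tightening the thresholds to catch them would also flag more benign images, i.e. raise the false positive rate:

```python
# Toy sketch (assumed values, NOT SD's real checker): flag an image
# if its embedding is too similar to any "unsafe concept" embedding.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical (concept_embedding, threshold) pairs. Lowering the
# thresholds catches more attacks but also more benign images.
CONCEPTS = [([1.0, 0.0, 0.0], 0.8), ([0.0, 1.0, 0.0], 0.8)]

def safety_checker(image_emb):
    """Return True if the image embedding is flagged as unsafe."""
    return any(cosine(image_emb, c) > t for c, t in CONCEPTS)

# An embedding close to a concept is flagged...
assert safety_checker([0.95, 0.1, 0.0])
# ...an adversarially shifted one stays under every threshold.
assert not safety_checker([0.5, 0.5, 0.7])
```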