OPTML-Group / Diffusion-MU-Attack

The official implementation of the ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces a fast and effective attack method to evaluate the harmful-content generation ability of safety-driven unlearned diffusion models.
MIT License

How is the target image obtained? #8

AntigoneRandy closed 2 months ago

AntigoneRandy commented 2 months ago

Hi, dear authors, thanks so much for the great work, which is really interesting and inspiring. From Eq. (4) of the paper, it seems that the optimization problem $\max_{c^\prime} p_{\theta^*}(c^\prime \mid x_\text{tgt})$ requires access to a target image $x_\text{tgt}$. But during evaluation it seems that only the unlearned model parameters and an original prompt are available. Could you please provide details on how the target image is obtained? Thank you very much!

damon-demon commented 2 months ago

Hi Boheng,

Thank you for your interest in our work on evaluating unlearned diffusion models. To maintain consistency in our evaluation results, the target images are generated using the original base diffusion model (e.g., SD v1.4). It's important to note that the target images can also be random internet images, provided they are relevant to the concept targeted for erasure.
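
For anyone who wants to reproduce this step, a minimal sketch of sampling target images from the base model might look like the following. It assumes the Hugging Face `diffusers` library and the `CompVis/stable-diffusion-v1-4` checkpoint; the function names, default arguments, and seeding scheme are hypothetical and are not taken from this repository.

```python
from typing import List


def seeds_for_targets(num_images: int, base_seed: int = 0) -> List[int]:
    """Return one deterministic seed per target image so runs are repeatable."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    return [base_seed + i for i in range(num_images)]


def generate_target_images(
    prompt: str,
    num_images: int = 1,
    base_seed: int = 0,
    model_id: str = "CompVis/stable-diffusion-v1-4",  # assumed base checkpoint
):
    """Sample target images from the original (not unlearned) base model."""
    # Heavy dependencies are imported lazily so the seed helper above can be
    # used without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)

    images = []
    for seed in seeds_for_targets(num_images, base_seed):
        # A fixed generator per image keeps the sampled targets reproducible.
        generator = torch.Generator(device=device).manual_seed(seed)
        images.append(pipe(prompt, generator=generator).images[0])
    return images
```

Calling `generate_target_images(original_prompt, num_images=1)` with the same prompt that is later attacked would give one $x_\text{tgt}$ per prompt. Fixing the seeds is one way to keep the targets, and therefore the evaluation, consistent across runs.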