Confusion about testing metrics

OPTML-Group / Diffusion-MU-Attack

The official implementation of the ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces a fast and effective attack method to evaluate the harmful-content generation ability of safety-driven unlearned diffusion models.

MIT License


Confusion about testing metrics #11

Closed 120L020904 closed 1 month ago

120L020904 commented 1 month ago

In AdvUnlearn, you mentioned ASR, and the test results are the same as the post-ASR on the project homepage. Is the ASR in AdvUnlearn the same as post-ASR?

damon-demon commented 1 month ago

Actually not. As mentioned in our paper, ASR = pre-ASR + post-ASR. We also remark that ASR reduces to pre-ASR when no adversarial attack is applied to text prompts.

When facing inappropriate test prompts, we will dissect the attack success rate (ASR) into two categories:

the pre-attack success rate (pre-ASR),
the post-attack success rate (post-ASR).

The effectiveness of our proposed attack will be quantified by post-ASR as it measures the number of successfully bypassed unlearning safeguards using adversarial perturbations.

120L020904 commented 1 month ago

So is the ASR in the table of the paper "DEFENSIVE UNLEARNING WITH ADVERSARIAL TRAINING FOR ROBUST CONCEPT ERASURE IN DIFFUSION MODELS" actually the post-ASR in the paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy to Generate Unsafe Images ... For Now"?

damon-demon commented 1 month ago

The ASR in the table of the AdvUnlearn paper follows exactly the same definition as in the UnlearnDiffAtk paper.

ASR = pre-ASR + post-ASR

Pre-ASR denotes the attack success rate of the original prompts without any attack.
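
For readers who want to see the decomposition concretely, here is a minimal sketch of how the three numbers could be computed from per-prompt outcomes. It is not taken from the repository: the names `PromptResult` and `asr_breakdown` are hypothetical, and it assumes that all three rates are normalized by the total number of test prompts and that post-ASR only counts prompts that did not already succeed before the attack.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    # Hypothetical record for one test prompt (not part of the repository).
    unsafe_before_attack: bool  # original prompt already yields the erased concept
    unsafe_after_attack: bool   # adversarially perturbed prompt yields it


def asr_breakdown(results):
    """Split ASR into pre-ASR and post-ASR over the same set of prompts."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one prompt result")
    # Prompts that bypass the unlearned model without any attack.
    pre = sum(r.unsafe_before_attack for r in results)
    # Prompts that fail without attack but succeed once attacked (assumed disjoint from pre).
    post = sum((not r.unsafe_before_attack) and r.unsafe_after_attack for r in results)
    return {"pre_asr": pre / n, "post_asr": post / n, "asr": (pre + post) / n}


results = [
    PromptResult(True, True),    # counted in pre-ASR
    PromptResult(False, True),   # counted in post-ASR
    PromptResult(False, False),  # attack failed
    PromptResult(False, True),   # counted in post-ASR
]
print(asr_breakdown(results))
```

Under these assumptions, if no attack is run then every `unsafe_after_attack` equals `unsafe_before_attack`, post-ASR is zero, and ASR reduces to pre-ASR, which matches the remark earlier in the thread.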

120L020904 commented 1 month ago

[Two screenshots attached: QQ20241008-232407, QQ20241008-232439]

damon-demon commented 1 month ago

I see. I will correct this. What is labeled "Post-ASR" should be "ASR".

120L020904 commented 1 month ago

Can ASR be greater than 1?

damon-demon commented 1 month ago

No. It cannot be larger than 100%.
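
A short way to see the bound, as a sketch under assumed notation (the symbols $N$, $N_{\text{pre}}$, and $N_{\text{post}}$ are not from the paper): let $N$ be the number of test prompts, $N_{\text{pre}}$ the number that succeed without any attack, and $N_{\text{post}}$ the number that succeed only after the attack. Assuming the second group is drawn from the $N - N_{\text{pre}}$ prompts that did not already succeed, the sum cannot exceed 1:

```latex
\[
\mathrm{ASR}
  = \underbrace{\frac{N_{\text{pre}}}{N}}_{\text{pre-ASR}}
  + \underbrace{\frac{N_{\text{post}}}{N}}_{\text{post-ASR}}
  \le \frac{N_{\text{pre}} + (N - N_{\text{pre}})}{N}
  = 1,
\qquad 0 \le N_{\text{post}} \le N - N_{\text{pre}} .
\]
```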