Actually not. As mentioned in our paper, ASR = pre-ASR + post-ASR. We also remark that ASR reduces to pre-ASR when no adversarial attack is applied to text prompts.
When facing inappropriate test prompts, we will dissect the attack success rate (ASR) into two categories: pre-ASR and post-ASR.
The effectiveness of our proposed attack will be quantified by post-ASR as it measures the number of successfully bypassed unlearning safeguards using adversarial perturbations.
So is the ASR in the table of the paper "DEFENSIVE UNLEARNING WITH ADVERSARIAL TRAINING FOR ROBUST CONCEPT ERASURE IN DIFFUSION MODELS" actually the post-ASR in the paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy to Generate Unsafe Images ... For Now"?
The ASR in the table of the "AdvUnlearn" paper is exactly the same as its definition in the "UnlearnDiffAtk" paper.
ASR = pre-ASR + post-ASR
Pre-ASR denotes the attack success rate of the original prompt without any attacks.
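For readers following along, the decomposition can be illustrated with a small sketch. This is not code from either paper's repository: the `PromptResult` record and the `asr_breakdown` helper are hypothetical names, and the sketch assumes each test prompt is counted in at most one category (pre-ASR if the original prompt already succeeds, post-ASR only if it succeeds after the adversarial perturbation is applied).

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    # Hypothetical per-prompt record, not a structure from the papers' code.
    unsafe_without_attack: bool  # original prompt already yields the erased concept
    unsafe_with_attack: bool     # adversarially perturbed prompt yields it


def asr_breakdown(results):
    """Return pre-ASR, post-ASR, and ASR as fractions of all test prompts."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one test prompt")
    # Prompts that succeed with no attack at all.
    pre = sum(r.unsafe_without_attack for r in results)
    # Prompts that fail on their own but succeed once attacked (assumed disjoint from pre).
    post = sum((not r.unsafe_without_attack) and r.unsafe_with_attack for r in results)
    return {"pre_asr": pre / n, "post_asr": post / n, "asr": (pre + post) / n}


# Made-up example data: 2 prompts succeed unattacked, 3 only after attack, 5 never.
example = (
    [PromptResult(True, True)] * 2
    + [PromptResult(False, True)] * 3
    + [PromptResult(False, False)] * 5
)
breakdown = asr_breakdown(example)
```

Under that assumption the two categories never count the same prompt twice, so their sum stays within 100%, and with no attack applied the post-ASR term is zero and ASR reduces to pre-ASR.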
I see. I will correct this. The Post-ASR should be ASR.
Can ASR be greater than 1?
No. It cannot be larger than 100%.
In AdvUnlearn, you mentioned ASR, and the test results are the same as the post-ASR on the project homepage. Is the ASR in AdvUnlearn the same as post-ASR?