Closed: futakw closed this issue 2 months ago
Sorry, I misunderstood the diffusion process itself; this was not related to the adversarial attack implementation. The SD model predicts $\epsilon = z_T - z_0$.
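For reference, the standard DDPM forward process that Stable Diffusion is trained on (my own summary, not code from this repository) is

$$
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),
$$

so the U-Net learns to predict the noise $\epsilon$ that was mixed into $z_0$, and at the final step $z_T$ is dominated by that noise.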
I will close this issue.
Hey Futa, sorry for the late reply; I was busy. But I am so happy that you found the solution!
Hi,
I have a question on the adversarial attack part.
https://github.com/chkimmmmm/R.A.C.E./blob/8913eda752f6087f6d240140a81eccbe2950c3fb/train-scripts/train-esd.py#L326
My understanding is that while the original SD model can predict the noise $\epsilon$ on $z_t$ to produce a denoised latent $z_{t-1} = z_t - \epsilon$, the ESD model cannot predict $\epsilon$ for an unlearned concept. The PGD attack then aims to make the unlearned model "re"-generate the denoised latent $z_{t-1}$ (which contains the unlearned concept) by enforcing its prediction to be close to $\epsilon$.
However, it seems that the PGD attack enforces the predicted noise to be closer to the start_code $= z_T$, instead of $\epsilon = z_t - z_{t-1}$ (which is unknown?).
I'm curious why this approach is effective. Please correct me if I've misunderstood the algorithm.
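To make my reading concrete, here is a minimal sketch of the PGD loop as I currently understand it. The names (`pgd_attack`, `esd_unet`, `cond_emb`, etc.) and the choice of perturbing the noisy latent are my own placeholders for illustration, not the actual variables in train-esd.py:

```python
import torch
import torch.nn.functional as F

def pgd_attack(esd_unet, z_t, t, cond_emb, start_code,
               eps=0.05, alpha=0.01, n_steps=10):
    """Sketch of the attack as I read it (placeholder names, not the repo's code).

    `esd_unet(latent, t, cond)` is assumed to return the predicted noise of the
    unlearned (ESD) model, with its parameters frozen. The loop perturbs z_t so
    that this prediction moves toward `start_code` (= z_T), which is the target
    I am asking about.
    """
    delta = torch.zeros_like(z_t, requires_grad=True)

    for _ in range(n_steps):
        # Predicted noise of the unlearned model on the perturbed latent.
        noise_pred = esd_unet(z_t + delta, t, cond_emb)

        # Targeted loss: pull the prediction toward start_code (= z_T)
        # rather than toward epsilon = z_t - z_{t-1}.
        loss = F.mse_loss(noise_pred, start_code)
        loss.backward()

        with torch.no_grad():
            # Signed gradient descent on the perturbation (we minimize the loss),
            # followed by an L_inf projection onto the eps-ball.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad = None

    return (z_t + delta).detach()
```

My confusion is exactly about the `F.mse_loss(noise_pred, start_code)` line: why is `start_code` an effective target here?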