Hello. I'm interested in your work and have a question about your implementation.
When implementing PGD, I noticed that your code uses only the gradient sign, not the gradient value, as shown in the following line:
adv_noise.data = (adv_noise.data - alpha * adv_noise.grad.detach().sign()).clamp(-epsilon, epsilon)
(in visual_attacker.py)
Is there a specific reason for this?
Hi, this is a commonly used trick that dates back to FGSM (https://arxiv.org/abs/1412.6572).
In my experience, gradient-sign descent usually gives better results; it is less likely to get stuck at a local optimum.
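To make the effect of the sign concrete, here is a minimal toy sketch of the sign-based update on a simple quadratic loss. It is illustrated with NumPy rather than the repository's PyTorch code, and all names (grad, noise, the loss itself) are illustrative, not taken from visual_attacker.py. Taking the sign makes every coordinate move by exactly alpha regardless of the gradient's magnitude, and the clip keeps the perturbation inside the L-inf ball of radius epsilon:

```python
import numpy as np

# Toy example: descend on loss(x) = sum(x**2) using only the *sign* of the
# gradient, as in FGSM/PGD. Hyperparameter values below are illustrative.
epsilon = 8.0 / 255.0   # L-inf budget for the perturbation
alpha = 2.0 / 255.0     # per-step size

def grad(noise):
    # Gradient of the toy loss sum(noise**2); stands in for the real
    # loss gradient computed by autograd in the actual attack code.
    return 2.0 * noise

noise = np.array([0.05, -0.02, 0.001])

for _ in range(10):
    # Sign-based step: each coordinate moves by exactly alpha in the
    # descent direction, independent of the gradient's magnitude; the
    # clip projects back into the L-inf ball [-epsilon, epsilon].
    noise = np.clip(noise - alpha * np.sign(grad(noise)), -epsilon, epsilon)

print(noise)
```

Note that with a raw-gradient step, coordinates with tiny gradients would barely move, whereas the sign step pushes all coordinates at the same rate toward the budget boundary, which is one intuition for why it escapes flat regions more easily.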