Closed by KennyNH 2 years ago
Are there any problems?
Additionally, the final perturbations all affect the structure rather than the attributes. Does this indicate that something is wrong?
Hi,
Thanks for your interest in our work.
Regarding your first question: In this line we flip the sign of the gradients for elements where the respective feature is present. This has the effect that for these elements we prefer changes in the negative direction, i.e. setting them to zero. What's missing is setting the gradients to zero for elements where the gradient points in an invalid direction. However, this case would only matter if there were no element where a valid perturbation leads to an increase in loss, which we have never observed in practice. So I don't see a problem here.
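To make the argument concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names `flip_scores` and `pick_flip`, the `mask_invalid` parameter, and the toy numbers are all made up for illustration, and it assumes binary features with the highest-scoring element chosen greedily.

```python
def flip_scores(grad, features):
    """Score each binary entry by the first-order loss change of flipping it.

    Hypothetical helper, not taken from the repository.
    """
    # Feature absent (x = 0): the only valid change is +1, so the score is the gradient.
    # Feature present (x = 1): the only valid change is -1, so the sign is flipped.
    return [g * (1 - 2 * x) for g, x in zip(grad, features)]


def pick_flip(grad, features, mask_invalid=False):
    """Return (index, score) of the highest-scoring flip.

    With mask_invalid=True, entries whose flip would not increase the loss
    are zeroed out, and None is returned if no flip has a positive score.
    """
    scores = flip_scores(grad, features)
    if mask_invalid:
        scores = [max(s, 0.0) for s in scores]
    best = max(range(len(scores)), key=scores.__getitem__)
    if mask_invalid and scores[best] <= 0.0:
        return None
    return best, scores[best]


# Toy example: entries 1 and 3 are present, entries 0 and 2 are absent.
features = [0, 1, 0, 1]
grad = [0.3, -0.8, -0.5, 0.6]

print(flip_scores(grad, features))
print(pick_flip(grad, features))
print(pick_flip(grad, features, mask_invalid=True))
```

In this toy case both calls to `pick_flip` select the same element, because at least one valid flip has a positive score. The two variants differ only when every score is non-positive, which is the case described above as never having been observed in practice. Since the sign after the flip encodes whether a change is valid, the scores are compared by their signed values, not their absolute values.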
Regarding your question about structure vs attribute perturbations: Did you make sure to set the corresponding flag that both structure and attributes are perturbed? In general, structure perturbations are much more effective so it is not surprising that a vast majority of perturbations are structure perturbations.
Let me know if you have follow-up questions.
Thank you for your sincere answer. I'm now clear about the code.
It seems that the code doesn't comply with this sentence in the paper: "The elements where the gradient points outside the allowable direction should not be perturbed since they would only hinder the attack – thus, the old score stays unchanged." Also, the gradients are sorted by their original (signed) values rather than by their absolute values.