Computation of score functions in feature attacks

KennyNH commented 2 years ago

It seems that the code doesn't match this sentence in the paper: "The elements where the gradient points outside the allowable direction should not be perturbed since they would only hinder the attack – thus, the old score stays unchanged." Also, the gradients are sorted by their original (signed) values rather than by their absolute values.

KennyNH commented 2 years ago

Is there a problem?

KennyNH commented 2 years ago

Additionally, the final perturbations are all structure perturbations rather than attribute perturbations. Does this indicate that something is wrong?

danielzuegner commented 2 years ago

Hi,

Thanks for your interest in our work.

Regarding your first question: In this line we flip the sign of the gradients for elements where the respective feature is present. This has the effect that for these elements we prefer changes in the negative direction, i.e., setting them to zero. What's missing is setting the gradients to zero for elements where the gradient points in an invalid direction. However, this would only make a difference if there were no element where a valid perturbation leads to an increase in loss, since otherwise a valid element always ranks first. We have never observed that case in practice, so I don't see a problem here.
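To make the argument concrete, here is a minimal NumPy sketch of sign-flipped scoring for binary features. It is not the repository's code: the function name `flip_scores`, the `mask_invalid` option, and the toy values are assumptions made up for illustration, and it assumes the gradient is taken of a loss the attack wants to increase.

```python
import numpy as np

def flip_scores(X, grad, mask_invalid=False):
    """First-order estimate of the loss change from flipping each binary entry.

    Hypothetical helper for illustration only.
    X    : binary feature matrix (0/1)
    grad : gradient of the attack loss with respect to X (same shape)
    """
    # Flipping 0 -> 1 changes the entry by +1, flipping 1 -> 0 by -1,
    # so the gradient sign is flipped wherever the feature is present.
    scores = grad * (1.0 - 2.0 * X)
    if mask_invalid:
        # Optional step discussed above: zero out entries whose only
        # allowed flip would decrease the loss.
        scores = np.where(scores > 0, scores, 0.0)
    return scores

# Toy data (made up): 2 nodes, 3 binary features.
X = np.array([[0, 1, 0],
              [1, 0, 1]], dtype=float)
grad = np.array([[0.3, 0.5, -0.2],
                 [-0.7, 0.1, 0.4]])

unmasked = flip_scores(X, grad)
masked = flip_scores(X, grad, mask_invalid=True)

# Index of the top-ranked flip with and without masking.
best_unmasked = np.unravel_index(np.argmax(unmasked), unmasked.shape)
best_masked = np.unravel_index(np.argmax(masked), masked.shape)
```

In this toy case both variants select entry `(1, 0)`, because at least one valid flip has a positive score. Ranking by signed rather than absolute values is what keeps the invalid entries at the bottom of the list; the two variants could only disagree if every score were non-positive.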

Regarding your question about structure vs. attribute perturbations: Did you make sure to set the corresponding flag so that both structure and attributes are perturbed? In general, structure perturbations are much more effective, so it is not surprising that the vast majority of perturbations are structure perturbations.
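As a rough illustration of why one perturbation type can dominate, the sketch below pools candidate perturbations of both kinds and keeps the highest-scoring ones. Everything here is an assumption for illustration: the function `pick_perturbations`, its flag names, and the gain values are made up, and a real greedy attack would recompute scores after every step rather than ranking once.

```python
def pick_perturbations(structure_gains, feature_gains, budget,
                       perturb_structure=True, perturb_features=True):
    """Return the kinds of the `budget` best candidates (hypothetical helper).

    Each gain is an assumed estimate of how much a single perturbation
    helps the attack; larger is better.
    """
    candidates = []
    if perturb_structure:
        candidates += [("structure", g) for g in structure_gains]
    if perturb_features:
        candidates += [("feature", g) for g in feature_gains]
    # Rank all enabled candidates together, best first.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [kind for kind, _ in candidates[:budget]]

# Made-up gains in which structure changes are consistently stronger.
structure_gains = [0.9, 0.8, 0.7]
feature_gains = [0.2, 0.1]

both = pick_perturbations(structure_gains, feature_gains, budget=3)
features_only = pick_perturbations(structure_gains, feature_gains, budget=3,
                                   perturb_structure=False)
```

With these numbers, `both` contains only structure perturbations even though feature perturbations are enabled, while `features_only` contains feature perturbations because nothing else is allowed to compete.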

Let me know if you have follow-up questions.

KennyNH commented 2 years ago

Thank you for your sincere answer. The code is clear to me now.

danielzuegner / nettack

Computation of score functions in feature attacks #12