QData / TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP.
https://textattack.readthedocs.io/en/master/
MIT License

How to guarantee the semantic preservation of the adversarial examples #510

Closed — zhujiangang closed this issue 2 years ago

zhujiangang commented 3 years ago

For example, if the goal function is untargeted classification, this means we need to search for adversarial examples x' that satisfy F(x') != F(x). Is that correct? If so, how can you guarantee the semantic preservation of x', i.e., how can you guarantee that x' has the same ground-truth label as x? And if we can guarantee this, why do we need after-attack accuracy? The more adversarial examples are generated, the lower the after-attack accuracy will be.
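To make the distinction concrete: in TextAttack's design, the goal function only checks whether the prediction flipped, while semantic preservation is handled by separate *constraints* that filter candidate perturbations. The sketch below is purely illustrative (a toy word-overlap similarity stands in for a real sentence encoder; all names are hypothetical, not TextAttack's API):

```python
# Illustrative sketch, NOT TextAttack's actual API.
# Untargeted goal: the prediction must change. Semantic preservation is
# enforced separately, by constraints that reject candidates which drift
# too far from the original text.

def untargeted_goal_met(model, x_orig, x_adv):
    """Goal function: F(x') != F(x)."""
    return model(x_adv) != model(x_orig)

def jaccard_similarity(a, b):
    """Toy word-overlap similarity standing in for a sentence encoder."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def passes_constraints(x_orig, x_adv, min_sim=0.75):
    """Constraint: candidate must stay close to the original text."""
    return jaccard_similarity(x_orig, x_adv) >= min_sim

def is_valid_adversarial_example(model, x_orig, x_adv, min_sim=0.75):
    """A candidate counts only if it flips the label AND passes constraints."""
    return untargeted_goal_met(model, x_orig, x_adv) and passes_constraints(
        x_orig, x_adv, min_sim
    )

# Toy "model": classifies by presence of the word "great".
model = lambda text: 1 if "great" in text else 0

x = "this movie is really great fun to watch"
one_word_swap = "this movie is really superb fun to watch"  # flips toy label
full_rewrite = "awful boring film"                          # also flips it

print(is_valid_adversarial_example(model, x, one_word_swap))  # True
print(is_valid_adversarial_example(model, x, full_rewrite))   # False: fails constraint
print(is_valid_adversarial_example(model, x, x))              # False: label unchanged
```

Note that the constraints only *bound* semantic drift; they cannot strictly guarantee that the ground-truth label of x' matches that of x, which is part of why after-attack accuracy is still reported.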

Thanks.

qiyanjun commented 2 years ago

@zhujiangang Please read this: https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html

https://textattack.readthedocs.io/en/latest/1start/attacks4Components.html#constraints