facebookresearch / text-adversarial-attack

Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"

Question about implementation #3

Closed: bbuing9 closed this issue 2 years ago

bbuing9 commented 2 years ago

Hi, thanks for the really nice work!

I'm trying to apply your method to a personal project, but I've run into a couple of issues.

First, regarding log_perplexity: you cut off the last token in logits and the first token in coeffs as follows:

```python
shift_logits = logits[:, :-1, :].contiguous()
shift_coeffs = coeffs[:, 1:, :].contiguous()
```

Is there a specific rationale behind this operation?

Next, I want to apply this approach to a RoBERTa classifier, but the difference in tokenization schemes makes this difficult. Do you have any recommendations?

Overall, this is a really nice implementation that is easy to follow. Thanks for your effort!

cg563 commented 2 years ago

Thanks for your interest in our work!

About log_perplexity, the shift is done because the language model's task is to predict the next token given the current token. If you look at https://github.com/facebookresearch/text-adversarial-attack/blob/main/whitebox_attack.py#L216, perplexity is computed using pred.logits as the predicted probability and coeffs as the ground truth, so the two tensors need to be shifted so that the prediction for position t in pred.logits lines up with the actual token at position t+1 in coeffs.
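For illustration, here is a minimal sketch of that computation (a paraphrase rather than the exact code in whitebox_attack.py; the tensor names follow your snippet above):

```python
import torch.nn.functional as F

def log_perplexity_sketch(logits, coeffs):
    # logits: (batch, seq_len, vocab) predictions from the causal LM
    # coeffs: (batch, seq_len, vocab) relaxed one-hot token distributions
    # The LM output at position t predicts token t+1, so drop the last
    # prediction and the first target to align the two tensors.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_coeffs = coeffs[:, 1:, :].contiguous()
    # Cross-entropy between predicted log-probs and the (soft) targets,
    # averaged over positions, gives the log-perplexity.
    log_probs = F.log_softmax(shift_logits, dim=-1)
    return -(shift_coeffs * log_probs).sum(-1).mean()
```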

About the RoBERTa classifier, it is possible but there's a bit of extra work to be done. Basically to use the perplexity constraint, you would need to train a causal language model (e.g. GPT-2) using the same tokenizer as the RoBERTa model. We only provided such a model for the BERT tokenizer (see https://github.com/facebookresearch/text-adversarial-attack#21-downloading-gpt-2-trained-on-bert-tokenizer-optional) because we did not evaluate our white-box attack on a masked language model other than BERT.
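As a rough sketch of what that setup could look like (assuming the HuggingFace transformers API; the GPT-2 model here is randomly initialized and would still need to be trained on a large corpus):

```python
from transformers import GPT2Config, GPT2LMHeadModel, RobertaTokenizer

# Hypothetical setup: a fresh GPT-2 causal LM whose vocabulary matches
# the RoBERTa tokenizer, so perplexity is computed over the same token
# ids that the RoBERTa classifier consumes.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)  # untrained; train as a causal LM before use
```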

Hope this helps.