facebookresearch / text-adversarial-attack

Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"

Question about implementation #3

Closed: bbuing9 closed this issue 2 years ago

bbuing9 commented 2 years ago

Hi, thanks for the really nice work!

I'm trying to apply your method to a personal project, but I've run into a couple of issues.

First, regarding log_perplexity: you cut off the last token in logits and the first token in coeffs as follows:

```python
shift_logits = logits[:, :-1, :].contiguous()
shift_coeffs = coeffs[:, 1:, :].contiguous()
```

Is there a specific rationale behind this operation?

Next, I want to apply this approach to a RoBERTa classifier, but the difference in tokenization schemes makes this difficult. Do you have any recommendations?

Overall, this is a really nice implementation that is easy to follow. Thanks for your effort!

cg563 commented 2 years ago

Thanks for your interest in our work!

About log_perplexity, the shift is done because the language model's task is to predict the next token given the current token. If you look at https://github.com/facebookresearch/text-adversarial-attack/blob/main/whitebox_attack.py#L216, perplexity is computed using pred.logits as the predicted probability and coeffs as the ground truth, so the two tensors need to be shifted so that the prediction for position t in pred.logits lines up with the actual token at position t+1 in coeffs.
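For illustration, here is a minimal sketch of that computation (a paraphrase rather than the exact code in whitebox_attack.py; the tensor names follow your snippet above):

```python
import torch.nn.functional as F

def log_perplexity_sketch(logits, coeffs):
    # logits: (batch, seq_len, vocab) predictions from the causal LM
    # coeffs: (batch, seq_len, vocab) relaxed one-hot token distributions
    # The LM output at position t predicts token t+1, so drop the last
    # prediction and the first target to align the two tensors.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_coeffs = coeffs[:, 1:, :].contiguous()
    # Cross-entropy between predicted log-probs and the (soft) targets,
    # averaged over positions, gives the log-perplexity.
    log_probs = F.log_softmax(shift_logits, dim=-1)
    return -(shift_coeffs * log_probs).sum(-1).mean()
```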

About the RoBERTa classifier, it is possible but there's a bit of extra work to be done. Basically to use the perplexity constraint, you would need to train a causal language model (e.g. GPT-2) using the same tokenizer as the RoBERTa model. We only provided such a model for the BERT tokenizer (see https://github.com/facebookresearch/text-adversarial-attack#21-downloading-gpt-2-trained-on-bert-tokenizer-optional) because we did not evaluate our white-box attack on a masked language model other than BERT.
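As a rough sketch of what that setup could look like (assuming the HuggingFace transformers API; the GPT-2 model here is randomly initialized and would still need to be trained on a large corpus):

```python
from transformers import GPT2Config, GPT2LMHeadModel, RobertaTokenizer

# Hypothetical setup: a fresh GPT-2 causal LM whose vocabulary matches
# the RoBERTa tokenizer, so perplexity is computed over the same token
# ids that the RoBERTa classifier consumes.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)  # untrained; train as a causal LM before use
```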

Hope this helps.