This repository contains code to reproduce results from the paper:
Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks
Three datasets are used in our experiments: MR, SST, and IMDB.
MR: You can download the MR dataset (i.e., rt-polaritydata.tar.gz) into the directory ./data/dataset/mr from https://www.cs.cornell.edu/people/pabo/movie-review-data/.
IMDB: You can download the IMDB dataset (i.e., aclImdb_v1.tar.gz) into the directory ./data/dataset/imdb from https://ai.stanford.edu/~amaas/data/sentiment/.
SST: You can download the SST dataset (i.e., SST-2.zip) into the directory ./data/dataset/sst from https://dl.fbaipublicfiles.com/glue/data/SST-2.zip.
After downloading, run the read_train_text function in traindata_loader.py, as in line 171 of train_classifier.py. This produces the dataset files used for testing and attacking.
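The download-and-extract step for the archives above can be sketched in Python. The helper name extract_dataset is ours, not part of the repository; it simply unpacks a .tar.gz or .zip archive into the expected dataset directory:

```python
import os
import tarfile
import zipfile

def extract_dataset(archive_path, dest_dir):
    """Extract a downloaded dataset archive (.tar.gz or .zip) into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    if archive_path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(dest_dir)
    elif archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
    else:
        raise ValueError("unsupported archive format: " + archive_path)

# e.g. extract_dataset("rt-polaritydata.tar.gz", "./data/dataset/mr")
```
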
For each dataset, we randomly sample 1,000 texts from the corresponding test set. The sampled texts for IMDB are adopted from the GitHub repo of HLBB; you can download them and place them into the corresponding directory (i.e., ./data/ag/). The sampled texts for MR and SST are generated using the read_train_text() function in the data_loader/traindata_loader.py file.
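The sampling step can be sketched as follows. The actual logic lives in read_train_text() in data_loader/traindata_loader.py, so the function below is only an illustrative stand-in that draws 1,000 text/label pairs from a test set:

```python
import random

def sample_attack_texts(test_texts, test_labels, n=1000, seed=42):
    """Randomly sample n (text, label) pairs from the test set for attacking."""
    paired = list(zip(test_texts, test_labels))
    random.Random(seed).shuffle(paired)  # fixed seed for reproducibility
    sampled = paired[:n]
    texts, labels = zip(*sampled)
    return list(texts), list(labels)
```
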
We adopt the pretrained models provided by HLBB, including BERT and WordLSTM. Put these pretrained models into the directories ./data/model/bert and ./data/model/WordLSTM, respectively.
In addition, we train a WordLSTM model on the SST dataset and ALBERT models on all three datasets using train_classifier.py, and place these trained models into the directories ./data/model/WordLSTM and ./data/model/ALBERT, respectively. Note that we adopt the BERT model trained on the MR dataset to predict texts in the SST dataset.
There are three data dependencies for this project. Download glove.6B.200d.txt into the directory ./data/embedding, and put counter-fitted-vectors.txt and the top-synonym file mat.txt into the directory ./data/aux_files.
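Loading the GloVe vectors into memory typically looks like the sketch below. The function name load_glove is ours; the repository's own loader may differ, but the file format (one word followed by its space-separated vector per line) is standard:

```python
def load_glove(path):
    """Parse a GloVe text file into a word -> list-of-floats dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # first token is the word, the rest are vector components
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```
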
abp_generate.py: Generate ABP for text classification.
abp_ensemble.py: Ensemble the weights generated by ABP across different models or domains.
attack.py: Attack the target model for text classification with ABP.
config.py: Attack parameters for all datasets.
train_classifier.py: Train the WordCNN or WordLSTM model.
./adv_method: Implementation of our ABP.
./data: Datasets, the embedding matrix, and various auxiliary files.
./model_loader: Target models, including BERT, WordCNN, and WordLSTM.
./utils: Helper functions for building dictionaries, loading data, processing the embedding matrix, etc.
./parameter: All hyper-parameters of our ABP for the various target models and datasets in our main experiments.
./scripts: Commands to run the attack.
Taking the ABP attack on BERT with the MR dataset as an example, you can run the following command:
sh scripts/abp/bert_mr.sh
You can change the hyper-parameters of ABP in ./parameter/cnn_mr.yaml if necessary.
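If you want to inspect or tweak hyper-parameters programmatically, the parameter files can be read with PyYAML's yaml.safe_load. As a dependency-free illustration, a minimal stand-in parser for flat "key: value" files is sketched below; the parameter names in the example are hypothetical, not taken from the repository:

```python
def load_flat_yaml(path):
    """Naively parse a flat 'key: value' YAML file (no nesting or lists).
    For real parameter files, prefer PyYAML's yaml.safe_load."""
    params = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line or ":" not in line:
                continue
            key, value = line.split(":", 1)
            value = value.strip()
            # best-effort numeric conversion
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass
            params[key.strip()] = value
    return params
```
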