BaiDingHub / ABP

Error running the Code #1

Open SachinVashisth opened 3 weeks ago

SachinVashisth commented 3 weeks ago

Hi

I have a few doubts related to the code.

Doubt 1: I downloaded the SST dataset and put it into the directory ./data/dataset/sst/SST-2/. I got this SST-2 folder by unzipping the SST-2.zip file. It contains dev.tsv, test.tsv, and train.tsv, plus a folder named original.

But when I run the command sh scripts/abp/bert_sst.sh, it raises an IsADirectoryError at line 40 of testdata_loader.py, which is with open(path, encoding=encoding) as fin:. This is because the code expects the path variable to point to a data file, but the config file bert_sst.yaml gives only the dataset_path, as shown:

dataset_path: './data/dataset/sst/sst'

When I run the script, it first reaches line 52 of attack.py, which is: texts, labels = data_loader.read_corpus(config.AdvDataset['dataset_path'], csvf=False)

This config.AdvDataset['dataset_path'] contains only the path, not a file name. From there, execution goes to the read_corpus(...) function in testdata_loader.py and fails at line 40.

Also, when it enters the read_corpus(...) function, one of the arguments is MR=True, which should be False because I am loading the SST dataset, not the Movie Reviews dataset.
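
The IsADirectoryError in Doubt 1 is what Python's open() raises on Linux and macOS when it is given a directory instead of a file. A minimal sketch of a pre-check that reports this more clearly follows. The helper name check_data_file is hypothetical and is not part of the ABP code; it only illustrates the failure mode.

```python
import os


def check_data_file(path):
    """Return path if it names a regular file; otherwise raise a descriptive error.

    Hypothetical helper for illustration only; it is not part of the ABP repository.
    """
    if os.path.isdir(path):
        # This is the situation that makes open(path) fail with IsADirectoryError.
        raise IsADirectoryError(
            f"{path!r} is a directory, but a data file was expected "
            f"(it contains: {sorted(os.listdir(path))})"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path!r} does not exist")
    return path


if __name__ == "__main__":
    # The path below is the dataset_path value quoted from bert_sst.yaml.
    try:
        check_data_file("./data/dataset/sst/sst")
        print("dataset_path points to a file")
    except OSError as err:
        print(f"dataset_path problem: {err}")
```

Calling such a check before the loader opens the file would show whether dataset_path names a directory, a missing file, or a usable data file.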

Doubt 2: In the config file bert_sst.yaml, the BERT model pretrained_dir is given as ./data/model/bert/sst, but there is no sst directory in the zip file from the link given in the README file.

Can you please help me resolve these errors?

BaiDingHub commented 3 weeks ago

I am sorry for this. You can first run the read_train_text function in traindata_loader.py, as is done at line 171 of train_classifier.py. That will give you the sst file for test and attack.

SachinVashisth commented 3 weeks ago

Hi, thanks for the response.

But I didn't fully understand. Should I run train_classifier.py before running the script sh scripts/abp/bert_sst.sh?

Also, can you please tell me the correct directory structure for the SST dataset (and for the other datasets)? In train_classifier.py, the default dataset path for the SST dataset is ./data/sst/sst, while the README gives data/dataset/sst.

Currently, I am using this directory structure for the SST dataset: ./data/dataset/sst/SST-2/, where the SST-2 folder contains dev.tsv, test.tsv, and train.tsv. Is this correct, or should I change it?

BaiDingHub commented 2 weeks ago

All data should be placed in the folder ./data/dataset/, and the decompressed files of the SST-2 dataset should be placed directly in the folder ./data/dataset/sst/, for example ./data/dataset/sst/train.tsv. After the read_train_text function has run, the test data file sst will be generated at ./data/dataset/sst/sst.
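
The layout described above can be checked with a short script before running the attack. This is only a sketch: describe_sst_layout is a hypothetical helper, not part of the repository. Only train.tsv and the generated sst file are named in this reply; dev.tsv and test.tsv are assumed from the contents of the SST-2 archive mentioned earlier in the thread.

```python
import os

# train.tsv is named in the reply above; dev.tsv and test.tsv are assumed
# from the SST-2 archive contents described earlier in the thread.
RAW_FILES = ("train.tsv", "dev.tsv", "test.tsv")
# File expected to appear after read_train_text has run.
GENERATED_FILE = "sst"


def describe_sst_layout(root="./data/dataset/sst"):
    """Map each expected file name to whether it exists as a regular file under root."""
    names = RAW_FILES + (GENERATED_FILE,)
    return {name: os.path.isfile(os.path.join(root, name)) for name in names}


if __name__ == "__main__":
    for name, present in describe_sst_layout().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

If sst is reported missing while the .tsv files are found, the generation step has probably not been run yet. If the .tsv files are missing too, they may still be inside an SST-2 subfolder.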