wjczf123 closed this issue 3 years ago
I ran into a problem with data processing while running the code. Do I need to process the data first? Can you tell me how to process the data, and where I can find the data_processing files? I am looking forward to your help. Thank you.
Thanks for your question. The pkl files are too large to upload. As mentioned in Section 5.1 of the paper, we used bert-as-service to process the data. The data_processing script is in the data_processing folder, and you may use ReviewRebuttalnew2.txt for pre-processing. Kindly let me know if there are any questions. I will write up more detailed data processing steps in a few days.
OK, thank you. I am looking forward to the data processing steps.
Can I use dataProcessing.py to process ReviewRebuttalnew2.txt and get train.pkl? Then I can run the code.
Yes, you can. You may need https://github.com/hanxiao/bert-as-service as well.
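For readers following the same steps, here is a minimal sketch of the embedding stage, assuming bert-as-service is already installed and running locally. The one-sentence-per-line reading of ReviewRebuttalnew2.txt and the output filename are assumptions for illustration; dataProcessing.py defines the actual pre-processing.

```python
import pickle
from bert_serving.client import BertClient

bc = BertClient()  # connects to a running bert-serving-start instance

# Assumption for illustration: take the first tab-separated field of each
# non-empty line as one sentence; dataProcessing.py defines the real format.
with open('ReviewRebuttalnew2.txt', encoding='utf-8') as f_in:
    sentences = [line.rstrip('\n').split('\t')[0] for line in f_in if line.strip()]

vecs = bc.encode(sentences)  # numpy array of BERT embeddings

with open('vec_train.pkl', 'wb') as f_out:
    pickle.dump(vecs, f_out)
```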
OK. I will run it. Thank you!
Thank you. I got vec_train.pkl. Did you use BERT-base to get the embeddings?
IndexError                                Traceback (most recent call last)
/deepo_data/CZF/APE/trainer.py in <module>

/deepo_data/CZF/APE/trainer.py in main()
    283     set_seed(opt, conf.seed)
    284
--> 285     trains = reader.read_txt(conf.train_file, conf.train_num)
    286     devs = reader.read_txt(conf.dev_file, conf.dev_num)
    287     tests = reader.read_txt(conf.test_file, conf.test_num)

/deepo_data/CZF/APE/config/reader.py in read_txt(self, file, number)
     92             (new_index>1 and f[line_idx-2].rstrip().split('\t')[1][0] != 'O') or
     93             (f[line_idx+1]!='' and f[line_idx+1].rstrip().split('\t')[1][0] != 'O') or
---> 94             (f[line_idx+1]!='' and f[line_idx+2]!='' and f[line_idx+2].rstrip().split('\t')[1][0] != 'O' ))):
     95             reply_idx.append(sent_idx)
     96         elif line_idx==len(f)-2:

IndexError: list index out of range
There seems to be a problem with the data processing.
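For what it's worth, the IndexError is raised inside reader.py's look-ahead over f[line_idx+1] and f[line_idx+2] and the split('\t')[1] label access, so it usually means a line without a tab-separated label or a file that ends without the blank lines the look-ahead expects. Below is a small diagnostic sketch; it is not part of the repository, and the token<TAB>label layout and the train.txt path are assumptions.

```python
# Diagnostic sketch: flag lines with no tab-separated label and check that the
# file ends with a blank line, since reader.py looks ahead two lines.
path = 'data/rr-passage/train.txt'  # adjust to your conf.train_file

with open(path, encoding='utf-8') as fh:
    lines = fh.read().split('\n')

for i, line in enumerate(lines):
    if line.strip() and len(line.rstrip().split('\t')) < 2:
        print(f'line {i}: no tab-separated label -> {line[:60]!r}')

if lines and lines[-1].strip():
    print('file does not end with a blank line; the two-line look-ahead may overrun')
```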
Yes, I used BERT-base-cased. What is the size of your vec_train.pkl? It should be around 15 GB. When you ran bert-as-service, did you use pooling strategy NONE?
Mine is 0.5 GB, although it should be around 15 GB. I did not use any special pooling strategy setting.
Are there any other settings, such as max_seq_len? Can you tell me the bert-serving command you used?
I am looking forward to your reply.
Sorry for the late reply. There is no max_seq_len setting and no other settings. I will upload detailed data processing steps tonight.
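To summarize the settings discussed above (an interpretation of this thread, not an official command from the repository): token-level embeddings require starting bert-as-service with pooling disabled, which is what makes vec_train.pkl grow to roughly 15 GB.

```python
# Sketch of the implied setup. The model directory path is only an example;
# -pooling_strategy NONE is what this thread points to, and -max_seq_len NONE
# (dynamic sequence length) is an assumption based on "no max_seq_len" above.
#
#   bert-serving-start -model_dir /path/to/cased_L-12_H-768_A-12 \
#       -pooling_strategy NONE -max_seq_len NONE -num_worker 1
#
# With pooling disabled the client returns one vector per token rather than
# one per sentence, hence the much larger pickle.
from bert_serving.client import BertClient

bc = BertClient()
vec = bc.encode(['This is a quick sanity check .'])
print(vec.shape)  # roughly (1, seq_len, 768) with pooling NONE; (1, 768) if pooled
```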
OK. Thank you a lot.
I have updated the data processing details in the README.md in the data folder. Kindly let me know if there are any further questions.
Thank you. I will run it again.
using GPU... 0
Reading file: data/rr-passage/train.txt

FileNotFoundError                         Traceback (most recent call last)
/deepo_data/APE/trainer.py in <module>
    314
    315 if __name__ == "__main__":
--> 316     main()

/deepo_data/APE/trainer.py in main()
    283     set_seed(opt, conf.seed)
    284
--> 285     trains = reader.read_txt(conf.train_file, conf.train_num)
    286     devs = reader.read_txt(conf.dev_file, conf.dev_num)
    287     tests = reader.read_txt(conf.test_file, conf.test_num)

/deepo_data/APE/config/reader.py in read_txt(self, file, number)
     27
     28     # f_vec = open(file[:8]+'vec_test.pkl', 'rb')
---> 29     f_vec = open(file[:8] + 'vec_' + file[8:-3] + 'pkl', 'rb')
     30     all_vecs = pickle.load(f_vec)
     31     f_vec.close

FileNotFoundError: [Errno 2] No such file or directory: 'data/rr-vec_passage/train.pkl'
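For reference, the missing path in this last traceback is derived from the training-file path inside reader.py, so the generated embeddings presumably need to be saved as data/rr-vec_passage/train.pkl, mirroring data/rr-passage/train.txt. A quick check of the expected location, reproducing the expression from reader.py line 29:

```python
# Reproduce the path derivation from reader.py to see where the pickle must live.
file = 'data/rr-passage/train.txt'
print(file[:8] + 'vec_' + file[8:-3] + 'pkl')  # -> data/rr-vec_passage/train.pkl
```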