question about data process

wjczf123 commented 3 years ago

using GPU... 0
Reading file: data/rr-passage/train.txt
FileNotFoundError                         Traceback (most recent call last)
/deepo_data/APE/trainer.py in <module>
    314
    315 if __name__ == "__main__":
--> 316     main()

/deepo_data/APE/trainer.py in main()
    283     set_seed(opt, conf.seed)
    284
--> 285     trains = reader.read_txt(conf.train_file, conf.train_num)
    286     devs = reader.read_txt(conf.dev_file, conf.dev_num)
    287     tests = reader.read_txt(conf.test_file, conf.test_num)

/deepo_data/APE/config/reader.py in read_txt(self, file, number)
     27
     28         # f_vec = open(file[:8]+'vec_test.pkl', 'rb')
---> 29         f_vec = open(file[:8] + 'vec_' + file[8:-3] + 'pkl', 'rb')
     30         all_vecs = pickle.load(f_vec)
     31         f_vec.close

FileNotFoundError: [Errno 2] No such file or directory: 'data/rr-vec_passage/train.pkl'
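
The missing path in the error message follows from the slicing on line 29 of reader.py. The sketch below reproduces that slicing in isolation, assuming the inserted literal is 'vec_' (which is what the path in the error message implies). The function name vec_path is hypothetical and is not part of the repository.

```python
def vec_path(txt_path: str) -> str:
    """Mirror the slicing seen in the traceback: keep the first 8 characters,
    insert 'vec_', keep the rest up to the extension, and swap 'txt' for 'pkl'."""
    return txt_path[:8] + "vec_" + txt_path[8:-3] + "pkl"


# The training file named in the log maps to the path the error reports.
print(vec_path("data/rr-passage/train.txt"))
# data/rr-vec_passage/train.pkl
```

Because the prefix length is hard-coded to 8, the pickle is looked up relative to whatever the first 8 characters of the configured text path are, so the embeddings file has to sit at exactly that derived location.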

wjczf123 commented 3 years ago

I ran into this data processing problem while running the code. Do I need to process the data first? If so, can you tell me how? Where can I find the data_processing files? I am looking forward to your help. Thank you.

LiyingCheng95 commented 3 years ago

Thanks for your question. The pkl files are too large to upload. As mentioned in Section 5.1 of the paper, we used bert-as-service to process the data. The data processing script is in the data_processing folder. You may use ReviewRebuttalnew2.txt for pre-processing. Kindly let me know if you have any questions. I will write up more detailed data processing steps in a few days.

wjczf123 commented 3 years ago

OK. Thanks a lot. I am looking forward to the data processing steps.

wjczf123 commented 3 years ago

Can I use dataProcessing.py to process ReviewRebuttalnew2.txt and get train.pkl? Then I can run this code.

LiyingCheng95 commented 3 years ago

Yes, you can. You may need https://github.com/hanxiao/bert-as-service as well.

wjczf123 commented 3 years ago

OK. I will run it. Thank you!

wjczf123 commented 3 years ago

Thanks a lot. I got vec_train.pkl. Did you use BERT-base to get the embeddings?

After I got vec_train.pkl, I hit this error:

IndexError                                Traceback (most recent call last)
/deepo_data/CZF/APE/trainer.py in <module>
    314
    315 if __name__ == "__main__":
--> 316     main()

/deepo_data/CZF/APE/trainer.py in main()
    283     set_seed(opt, conf.seed)
    284
--> 285     trains = reader.read_txt(conf.train_file, conf.train_num)
    286     devs = reader.read_txt(conf.dev_file, conf.dev_num)
    287     tests = reader.read_txt(conf.test_file, conf.test_num)

/deepo_data/CZF/APE/config/reader.py in read_txt(self, file, number)
     92                     (new_index>1 and f[line_idx-2].rstrip().split('\t')[1][0] != 'O') or
     93                     (f[line_idx+1]!='' and f[line_idx+1].rstrip().split('\t')[1][0] != 'O') or
---> 94                     (f[line_idx+1]!='' and f[line_idx+2]!='' and f[line_idx+2].rstrip().split('\t')[1][0] != 'O' ))):
     95                 reply_idx.append(sent_idx)
     96             elif line_idx==len(f)-2:

IndexError: list index out of range

There seems to be a problem with the data processing.
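
The expression on line 94 of reader.py can raise "list index out of range" in two ways: f[line_idx+2] may be past the end of the list, or split('\t')[1] may not exist because the line has no tab-separated tag field. The thread does not say which one occurred here. A minimal sketch of a bounds-checked lookup follows. It is not the repository's fix, and the helper name tag_initial, the fallback tag "O", and the sample tag "B-Reply" are all assumptions made for illustration.

```python
def tag_initial(lines, idx, default="O"):
    """Return the first character of the tab-separated tag on lines[idx].

    Falls back to `default` when idx is outside the list, or when the
    line is blank or has no second tab-separated field.
    """
    if idx < 0 or idx >= len(lines):
        return default
    fields = lines[idx].rstrip().split("\t")
    if len(fields) < 2 or not fields[1]:
        return default
    return fields[1][0]


# Hypothetical sample in a "sentence<TAB>tag" layout.
sample = ["Thanks for the review.\tO", "We added the ablation.\tB-Reply", "\n"]
print(tag_initial(sample, 1))  # B
print(tag_initial(sample, 2))  # O (blank line, no tag field)
print(tag_initial(sample, 5))  # O (index past the end)
```

Treating a missing neighbour as "O" is a design choice of this sketch. Whether that matches the intended labelling logic would need to be checked against the repository's data format.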

LiyingCheng95 commented 3 years ago

Yes, I used BERT-base-cased. What's the size of your vec_train.pkl? It should be around 15G. When you used bert-as-service, did you set the pooling strategy to NONE?

wjczf123 commented 3 years ago

0.5G. That should be the problem. I did not set a pooling strategy.
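
One plausible reading of the size gap is that a pooled encoding stores one vector per sentence, while pooling strategy NONE stores one vector per token position. The back-of-the-envelope sketch below rests on several assumptions: a hidden size of 768 for BERT-base, float32 values, and a sequence length of 25, which to my understanding is the bert-as-service default when max_seq_len is not set (worth verifying against the installed version). The function name is hypothetical.

```python
HIDDEN_SIZE = 768      # BERT-base hidden width
BYTES_PER_FLOAT = 4    # assumes float32 embeddings


def bytes_per_sentence(pooled: bool, max_seq_len: int = 25) -> int:
    """Size of one sentence's embedding: a single vector if pooled,
    otherwise one vector per token position."""
    vectors = 1 if pooled else max_seq_len
    return vectors * HIDDEN_SIZE * BYTES_PER_FLOAT


pooled = bytes_per_sentence(pooled=True)
per_token = bytes_per_sentence(pooled=False)
print(per_token // pooled)       # 25
print(0.5 * per_token / pooled)  # 12.5 (GB, scaled from a 0.5 GB pooled file)
```

Under these assumptions a 0.5G pooled file would grow to roughly 12.5G of raw vectors, the same order of magnitude as the 15G mentioned above. This is only an estimate and ignores pickle overhead.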

wjczf123 commented 3 years ago

Are there any other settings, such as max_seq_len? Can you share the bert-serving command you used?

wjczf123 commented 3 years ago

I am looking forward to your reply.

LiyingCheng95 commented 3 years ago

Sorry for the late reply. No max_seq_len. No other settings. I will upload detailed data processing steps tonight.

wjczf123 commented 3 years ago

OK. Thank you a lot.

LiyingCheng95 commented 3 years ago

I have updated the data processing details in README.md in the data folder. Kindly let me know if you have any further questions.

wjczf123 commented 3 years ago

Thanks a lot. I will run it again.

LiyingCheng95 / ArgumentPairExtraction

question about data process #2
