PlusLabNLP / DEGREE

Code for our NAACL-2022 paper DEGREE: A Data-Efficient Generation-Based Event Extraction Model.
Apache License 2.0

Preprocessing ERE #16

Status: Open. zhou6140919 opened this issue 1 year ago.

zhou6140919 commented 1 year ago

I ran the script preprocessing/process_ere.py and found that the number of sentences in train.w1.oneie.json (12977) does not match the number reported in the paper (14736). As a result, I cannot reproduce the F1 score reported for the ERE-EN dataset.
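To compare counts like these reliably, it helps to count sentence records the same way every time. The sketch below assumes train.w1.oneie.json is a JSON Lines file with one sentence object per non-empty line (OneIE-style output); that layout is an assumption, and the function name count_sentences is hypothetical, not part of the repository.

```python
import json
import sys


def count_sentences(path):
    """Count sentence records in a JSON Lines file.

    Assumption: each non-empty line holds one JSON object describing one
    sentence; blank lines are skipped.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            json.loads(line)  # raise if a line is not valid JSON
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_sentences.py train.w1.oneie.json
    print(count_sentences(sys.argv[1]))
```

If the file turns out to be a single JSON array instead, json.load on the whole file and len() of the result would be the equivalent check.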

Looking into the script, I found that line 1336 ignores all the data in the 'normal' dataset. However, when I changed it to os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt'), an error occurs at entity.char_offsets_to_token_offsets(tokens) for a few documents. After ignoring those errors, I got 18895 sentences, which still does not match.
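Silently ignoring errors makes the final count hard to compare, because it depends on which documents were dropped. A minimal sketch of one alternative, assuming the per-document conversion can be wrapped in a callable: skip documents that raise, but record which ones failed and why. Both convert_all and the convert callable are hypothetical stand-ins, not functions from process_ere.py.

```python
def convert_all(docs, convert):
    """Apply `convert` to every document, skipping ones that raise.

    Returns (results, failures), where failures is a list of
    (doc_index, "ExceptionType: message") pairs, so skipped documents
    are reported instead of silently dropped.
    """
    results = []
    failures = []
    for i, doc in enumerate(docs):
        try:
            results.append(convert(doc))
        except Exception as exc:  # broad on purpose: log and continue
            failures.append((i, f"{type(exc).__name__}: {exc}"))
    return results, failures


def fake_convert(doc):
    # Hypothetical converter: negative values simulate bad char offsets.
    if doc < 0:
        raise ValueError("bad offsets")
    return doc * 2


results, failures = convert_all([1, -1, 3], fake_convert)
print(results)   # [2, 6]
print(failures)  # [(1, 'ValueError: bad offsets')]
```

Reporting the failing document IDs alongside the sentence count makes it easier to tell whether a mismatch comes from skipped documents or from a different input set.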

ej0cl6 commented 7 months ago

Note that package versions can sometimes matter as well, so please check whether your package versions match ours.
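One way to check this is to print the installed version of each relevant package and compare it with the versions the repository pins. The sketch below uses importlib.metadata from the Python standard library (3.8+); the package names in the example are placeholders, not the project's actual dependency list.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Placeholder names only: replace with the dependencies the repository pins.
    for name, ver in installed_versions(["torch", "transformers", "nltk"]).items():
        print(f"{name}: {ver or 'not installed'}")
```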