PlusLabNLP / DEGREE

Code for our NAACL-2022 paper DEGREE: A Data-Efficient Generation-Based Event Extraction Model.
Apache License 2.0

About preprocessing ERE-EN dataset #13

Closed: rt18 closed this issue 1 year ago.

rt18 commented 1 year ago

This may be a silly question, but when I run the preprocessing script for the ERE-EN dataset, the terminal prints:

```
Skip: 0
Skip: 0
Skip: 0
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
missing!! 635bde2afdaaf20a0bcdc3b5f79578c9
Processed 109 number of instances
Processed 228 number of instances
Processed 419 number of instances
Processed 701 number of instances
Processed 1536 number of instances
Processed 2842 number of instances
Processed 4377 number of instances
Processed 7686 number of instances
Processed 11200 number of instances
```

It seems that this doc_id is not listed in train/dev/test.doc.txt. Should I ignore it or add it to the document lists?

ihungalexhsu commented 1 year ago

Hi, please just ignore it. We do not include all documents, in order to match the document usage of OneIE.
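For anyone seeing the same `missing!!` message, one way to confirm that a reported ID is simply outside the OneIE document splits is to look it up in the split lists directly. This is a minimal sketch, assuming `train.doc.txt`, `dev.doc.txt` and `test.doc.txt` sit in one directory and each lists one document ID per line; `load_split_ids` and `find_split` are hypothetical helpers, not part of the DEGREE repository.

```python
from pathlib import Path


def load_split_ids(split_dir, splits=("train", "dev", "test")):
    """Map each split name to the set of document IDs in <split>.doc.txt.

    Assumption (hypothetical format): one document ID per line; blank
    lines and surrounding whitespace are ignored.
    """
    split_ids = {}
    for name in splits:
        path = Path(split_dir) / f"{name}.doc.txt"
        lines = path.read_text(encoding="utf-8").splitlines()
        split_ids[name] = {line.strip() for line in lines if line.strip()}
    return split_ids


def find_split(doc_id, split_ids):
    """Return the name of the split that lists doc_id, or None if none does."""
    for name, ids in split_ids.items():
        if doc_id in ids:
            return name
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_split.py <split_dir> <doc_id>
    split_dir, doc_id = sys.argv[1], sys.argv[2]
    split = find_split(doc_id, load_split_ids(split_dir))
    if split is None:
        print(f"{doc_id}: not listed in any split file")
    else:
        print(f"{doc_id}: listed in the {split} split")
```

An ID that comes back as not listed in any split corresponds to the case discussed here, which the maintainer says can be ignored.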

rt18 commented 1 year ago


I got it! Thanks 🙏