I ran the script `preprocessing/process_ere.py` and discovered that the number of sentences in `train.w1.oneie.json` (12977) is not the same as the paper claims (14736). Consequently, I cannot reproduce the F1 score on the ERE-EN dataset.

Looking into the script, line 1336 simply ignores all the data in the 'normal' dataset. However, if I change the glob to `os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt')`, an error occurs at `entity.char_offsets_to_token_offsets(tokens)` for a few docs. Ignoring all the errors, I get 18895 sentences, which still does not match.
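For context on why that call might fail on only a few docs: char-to-token offset conversion typically breaks when an annotated character span does not line up with token boundaries (e.g. after tokenization shifts or XML markup removal). This is a minimal sketch of such a conversion, with hypothetical names, not the repo's actual implementation:

```python
def char_to_token_offsets(start, end, tokens):
    """Map a character span [start, end) to token indices [tok_start, tok_end).

    tokens: list of (token_text, char_start, char_end) triples.
    Raises ValueError when the span does not align with any token
    boundary -- the likely cause of errors on a handful of docs.
    """
    tok_start = tok_end = None
    for i, (_, s, e) in enumerate(tokens):
        if s == start:
            tok_start = i
        if e == end:
            tok_end = i + 1
    if tok_start is None or tok_end is None:
        raise ValueError(f"span ({start}, {end}) not aligned with token boundaries")
    return tok_start, tok_end


# Example: the span "New York" in "I love New York."
tokens = [("I", 0, 1), ("love", 2, 6), ("New", 7, 10), ("York", 11, 15), (".", 15, 16)]
print(char_to_token_offsets(7, 15, tokens))  # (2, 4)
```

A span like (7, 14), which ends mid-token, would raise here; skipping such documents (as I did by ignoring the errors) silently changes the sentence count, which may explain part of the discrepancy.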