ShannonAI / mrc-for-flat-nested-ner

Code for ACL 2020 paper `A Unified MRC Framework for Named Entity Recognition`

Reproducing the CoNLL2003 Results #77

Open EmanuelaBoros opened 3 years ago

EmanuelaBoros commented 3 years ago

I was not able to reproduce the results reported in the ACL paper for CoNLL 2003. Would it be possible to share the reproduction script for this dataset as well? Thanks.

EmanuelaBoros commented 3 years ago

My assumption is that the performance is computed and reported on the dev set instead of the test set. Any updates on the script? Thanks!

xiaoya-li commented 3 years ago

We achieved a 96.5+ F1 score on the dev set and 93.30 F1 on the test set. Please use our released data files link for CoNLL2003. Many thanks!

lin-whale commented 3 years ago

> We achieved 96.5+ F1 score on the dev set and 93.30 F1 on the test set. Please use our released data files link for CoNLL2003. Many thanks !

I could not reach the reported F1 score using the CoNLL2003 dataset you released. Instead, the best result I got is a 91.71 F1 score. I think the reason is an incorrect hyperparameter setting. Because of limited computing power, I cannot reach the best result in the paper. Could you please share the reproduction script for CoNLL2003 as well? Thanks very much.

EmanuelaBoros commented 3 years ago

Hello @xiaoya-li, my results with your released data files for CoNLL2003 are similar to those of @Lilin-whale. Any updates on the script?

shizhediao commented 3 years ago

I found that the dataset is different from the original CoNLL data. Did you do extra preprocessing? There are some modifications, such as lowercasing some letters.

xiaoya-li commented 3 years ago

Please use `./scripts/mrc_ner/reproduce/conll03.sh` for reproducing our experimental results. MRC-NER format datasets for CoNLL03 are available at link. Thanks.

xuuuluuu commented 2 years ago

Hi @xiaoya-li, I found an issue similar to the one @shizhediao reported. The CoNLL2003 dataset in the provided link is different from the originally released version. The differences are not about formatting, but about the total number of tags and lowercased letters. May I ask which version of CoNLL2003 you used?
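
For anyone who wants to check this themselves, here is a minimal sketch of such a comparison. It assumes both versions have first been converted to plain column format (token first, NER tag last, one token per line) and that tokenization is aligned between them. The function names `read_conll`, `tag_counts`, and `casing_diffs` are hypothetical helpers for illustration, not part of this repository.

```python
from collections import Counter


def read_conll(lines):
    """Return (token, tag) pairs from column-format lines.

    Assumption: the token is the first field and the NER tag is the last.
    Blank lines and -DOCSTART- markers are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            continue
        fields = line.split()
        pairs.append((fields[0], fields[-1]))
    return pairs


def tag_counts(pairs):
    """Count every non-O tag."""
    return Counter(tag for _, tag in pairs if tag != "O")


def casing_diffs(original, released):
    """List token pairs that differ only in letter case.

    Assumption: both inputs have the same tokenization, so tokens
    can be compared position by position.
    """
    return [
        (a, b)
        for (a, _), (b, _) in zip(original, released)
        if a != b and a.lower() == b.lower()
    ]
```

With file contents loaded via `open(path).read().splitlines()`, `tag_counts(original) - tag_counts(released)` lists the tags that are missing from the released version, and `casing_diffs(original, released)` lists the tokens whose case was changed.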

Senwang98 commented 2 years ago

@EmanuelaBoros I have run into the same problem. Did you solve it later? I found that the CoNLL03 dataset has some problems.