Open EmanuelaBoros opened 3 years ago
One assumption that I have is that the performance is computed and reported on dev, instead of test. Any updates on the script? Thanks!
We achieved a 96.5+ F1 score on the dev set and 93.30 F1 on the test set. Please use our released data files link for CoNLL2003. Many thanks!
I could not reach the best F1 score using the CoNLL2003 dataset you released. Instead, the best result I got is a 91.71 F1 score. I think the reason is an incorrect hyperparameter setting. Because of limited computing power, I cannot search for the settings that give the best result in the paper. Could you please share the reproduction script for CoNLL2003 as well? Thanks very much.
Hello @xiaoya-li, my results with your released data files for CoNLL2003 are similar to @Lilin-whale's. Any updates on the script?
I found that the dataset is different from the original CoNLL data. Did you do extra preprocessing? I found some modifications, such as lowercased letters.
Please use ./scripts/mrc_ner/reproduce/conll03.sh for reproducing our experimental results. MRC-NER format datasets for CoNLL03 are available at link. Thanks.
Hi @xiaoya-li, I found an issue similar to the one @shizhediao reported. The CoNLL2003 dataset at the provided link is different from the originally released version. The differences are not in the formatting but in the total number of tags and in lowercased letters. May I ask which version of CoNLL2003 you used?
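For anyone who wants to check the two differences mentioned above (tag totals and lowercased tokens), here is a minimal sketch. It assumes both versions have first been converted to the plain CoNLL column format, with the token in the first column and the NER tag in the last. The helper names `read_conll`, `tag_counts`, and `case_only_diffs` are made up for this example, and the two small strings are invented toy data, not excerpts from either dataset.

```python
from collections import Counter


def read_conll(text):
    """Parse CoNLL-style column text into a list of (token, tag) pairs.

    Assumption: the token is the first column and the NER tag is the last.
    Blank lines and -DOCSTART- markers are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            continue
        cols = line.split()
        pairs.append((cols[0], cols[-1]))
    return pairs


def tag_counts(pairs):
    """Count every tag except the outside tag 'O'."""
    return Counter(tag for _, tag in pairs if tag != "O")


def case_only_diffs(pairs_a, pairs_b):
    """Return token pairs that differ only by letter case.

    Tokens are compared position by position, so this is only
    meaningful when both versions share the same tokenization.
    """
    return [
        (a, b)
        for (a, _), (b, _) in zip(pairs_a, pairs_b)
        if a != b and a.lower() == b.lower()
    ]


# Invented toy data standing in for the two versions of the dataset.
original_text = (
    "-DOCSTART- -X- -X- O\n\n"
    "EU NNP B-NP B-ORG\n"
    "rejects VBZ B-VP O\n"
    "German JJ B-NP B-MISC\n"
    "call NN I-NP O\n"
)
released_text = "EU B-ORG\nrejects O\ngerman O\ncall O\n"

original = read_conll(original_text)
released = read_conll(released_text)

print("original tags:", dict(tag_counts(original)))
print("released tags:", dict(tag_counts(released)))
print("case-only differences:", case_only_diffs(original, released))
```

Running the same functions on the real train, dev, and test files would show whether the tag totals and casing actually diverge, and by how much.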
@EmanuelaBoros I have run into the same problem. Did you solve it later? I found that the CoNLL03 dataset has some problems.
I was not able to reproduce the results reported in the ACL paper for CoNLL 2003. Would it be possible to share the reproduction script for this dataset as well? Thanks.