ShannonAI / mrc-for-flat-nested-ner

Code for ACL 2020 paper `A Unified MRC Framework for Named Entity Recognition`

Problems regarding MSRA dataset #80

Closed smiles724 closed 3 years ago

smiles724 commented 3 years ago

Hi, thanks again for releasing the code. However, I am confused about the MSRA dataset you provided at https://drive.google.com/file/d/1bAoSJfT1IBdpbQWSrZPjQPPbAsDGlN2D/view?usp=sharing. I downloaded the most popular version of the MSRA NER dataset from https://github.com/InsaneLife/ChineseNLPCorpus/tree/master/NER/MSRA. Surprisingly, I found that some samples in the version I downloaded do not appear in your preprocessed dataset.

For instance, "一条百里江堤逶迤长江下游南岸,横贯沿江11个乡镇、1个国营场圃,成为张家港市一道新的风景线。" and "四、龙宫洞在九江市彭泽县境内,距庐山约200公里,亦不属庐山管理局管辖范围。". I did not check every sample carefully, but this discovery was really shocking to me. Are there several different versions of the MSRA dataset? Or is there perhaps a mistake somewhere?
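
For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of check described above. It assumes both datasets have already been loaded into plain Python lists of strings; the helper names `normalize` and `missing_sentences` are hypothetical, and the actual file formats of the two releases are not covered here. Whitespace is removed and full-width/half-width variants are folded before comparing, since differences in tokenization or punctuation width could otherwise look like missing samples.

```python
import unicodedata


def normalize(sentence: str) -> str:
    """Fold full-width/half-width variants (NFKC) and drop all whitespace."""
    folded = unicodedata.normalize("NFKC", sentence)
    return "".join(folded.split())


def missing_sentences(reference, candidate):
    """Return the sentences in `reference` that occur in no `candidate` text.

    `reference` and `candidate` are lists of strings (an assumption; loading
    the real files is left to the reader). A substring match is used, so a
    reference sentence counts as present if it appears inside a longer
    candidate context.
    """
    # Join with newlines so a match cannot span two candidate entries.
    candidate_blob = "\n".join(normalize(c) for c in candidate)
    return [s for s in reference if normalize(s) not in candidate_blob]


if __name__ == "__main__":
    # Toy data for illustration only, not taken from either dataset release.
    reference = ["四、龙宫洞在九江市", "一条百里江堤"]
    candidate = ["四 、 龙 宫 洞 在 九 江 市 彭 泽 县"]
    for sentence in missing_sentences(reference, candidate):
        print(sentence)
```

If a check like this still reports missing sentences after normalization, that would suggest the two releases really do contain different samples rather than just differently formatted ones.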