Hi, thanks again for releasing the code. However, I am confused about the MSRA dataset you provided at https://drive.google.com/file/d/1bAoSJfT1IBdpbQWSrZPjQPPbAsDGlN2D/view?usp=sharing. I downloaded the most popular version of the MSRA NER dataset from https://github.com/InsaneLife/ChineseNLPCorpus/tree/master/NER/MSRA. Surprisingly, I found that some samples in my downloaded copy do not appear in your preprocessed dataset.
For instance, "一条百里江堤逶迤长江下游南岸,横贯沿江11个乡镇、1个国营场圃,成为张家港市一道新的风景线。" and "四、龙宫洞在九江市彭泽县境内,距庐山约200公里,亦不属庐山管理局管辖范围。" are both missing. I have not checked every sample carefully, but this discovery really surprised me. Are there several different versions of the MSRA dataset? Or is there perhaps a mistake somewhere?
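
In case it helps to reproduce the comparison, here is a minimal sketch of how the two copies could be checked against each other. It assumes both files store one character and its tag per line, separated by whitespace, with a blank line between sentences; I have not confirmed that either release uses exactly this layout. The function names `read_sentences` and `missing_sentences` are my own, not from the repository. Text is NFKC-normalized by default so that full-width and half-width punctuation are not counted as different samples.

```python
import unicodedata
from typing import Iterable, List


def read_sentences(lines: Iterable[str], normalize: bool = True) -> List[str]:
    """Rebuild raw sentences from character-per-line NER data.

    Assumed format (not confirmed for either release): one "char tag" pair
    per line, whitespace-separated, with a blank line between sentences.
    """
    sentences, chars = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if chars:
                sentences.append("".join(chars))
                chars = []
            continue
        chars.append(line.split()[0])  # keep the character, drop the tag
    if chars:
        sentences.append("".join(chars))
    if normalize:
        # NFKC maps full-width forms such as "," to their ASCII equivalents.
        sentences = [unicodedata.normalize("NFKC", s) for s in sentences]
    return sentences


def missing_sentences(reference: List[str], candidate: List[str]) -> List[str]:
    """Return sentences in `reference` that never occur in `candidate`."""
    seen = set(candidate)
    return [s for s in reference if s not in seen]
```

With the two training files opened as `ref_file` and `cand_file` (hypothetical handles), `missing_sentences(read_sentences(ref_file), read_sentences(cand_file))` would list every sample from the first copy that is absent from the second.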