wujsAct closed this issue 7 years ago
@wujsAct, thank you for pointing out this issue. We found that the library used in our previous data processing pipeline failed to handle some special characters, which caused this problem. It affected only a very small part of the dataset. In the previous datasets, gold-standard mentions with negative indexes were not included in the evaluation; with the issue fixed, they are included. We have updated the dataset and its download link.
Why does the bbn test.json have a negative index for an entity mention? The following is an example:
{"tokens": ["In", "1973", ",", "Wells", "Fargo", "&", "amp", ";", "Co.", "of", "San", "Francisco", "launched", "the", "Gold", "Account", ",", "which", "included", "free", "checking", ",", "a", "credit", "card", ",", "safe-deposit", "box", "and", "travelers", "checks", "for", "a", "$", "3", "monthly", "fee", "."], "senid": 24, "mentions": [{"start": -1, "labels": ["/ORGANIZATION/CORPORATION", "/ORGANIZATION"], "end": -1}, {"start": 10, "labels": ["/GPE/CITY", "/GPE"], "end": 12}], "fileid": "WSJ0085"}
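For anyone who still has an older copy of the data, here is a minimal sketch of how the affected mentions could be listed. It assumes the file stores one JSON object per line with the same keys as the example above (`tokens`, `senid`, `mentions`, `fileid`); the function name `find_negative_mentions` and the shortened sample record are illustrative and not part of the dataset tooling.

```python
import json


def find_negative_mentions(lines):
    """Yield (fileid, senid, mention) for mentions with a negative start or end.

    Assumes each non-empty line is one JSON object shaped like the example
    in this thread.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sentence = json.loads(line)
        for mention in sentence.get("mentions", []):
            if mention["start"] < 0 or mention["end"] < 0:
                yield sentence.get("fileid"), sentence.get("senid"), mention


# Shortened version of the record quoted above (tokens and labels trimmed).
sample = (
    '{"tokens": ["Wells", "Fargo"], "senid": 24, "mentions": ['
    '{"start": -1, "labels": ["/ORGANIZATION"], "end": -1}, '
    '{"start": 10, "labels": ["/GPE"], "end": 12}], "fileid": "WSJ0085"}'
)

bad = list(find_negative_mentions([sample]))
for fileid, senid, mention in bad:
    print(fileid, senid, mention)
```

To scan a whole file, an open file handle can be passed in place of the list, since the function only iterates over lines. If the file turns out to be a single JSON array rather than one object per line, it would need to be loaded with `json.load` and iterated instead.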