INK-USC / PLE

Label Noise Reduction in Entity Typing (KDD'16)
GNU General Public License v3.0

bbn entity mention has negative index #1

Closed wujsAct closed 7 years ago

wujsAct commented 7 years ago

Why does the BBN test.json contain negative indices for some entity mentions? The following record is an example; its first mention has "start": -1 and "end": -1.
{"tokens": ["In", "1973", ",", "Wells", "Fargo", "&", "amp", ";", "Co.", "of", "San", "Francisco", "launched", "the", "Gold", "Account", ",", "which", "included", "free", "checking", ",", "a", "credit", "card", ",", "safe-deposit", "box", "and", "travelers", "checks", "for", "a", "$", "3", "monthly", "fee", "."], "senid": 24, "mentions": [{"start": -1, "labels": ["/ORGANIZATION/CORPORATION", "/ORGANIZATION"], "end": -1}, {"start": 10, "labels": ["/GPE/CITY", "/GPE"], "end": 12}], "fileid": "WSJ0085"}

shanzhenren commented 7 years ago

@wujsAct, thank you for pointing out this issue. We found that a library used in the previous data processing pipeline failed to handle some special characters, which led to this problem. It affected only a very small part of the dataset. In the previous datasets, gold-standard mentions with negative indices are not included in the evaluation; they will be included now that the issue is fixed. We have updated the dataset and its download link.
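For anyone still holding an older copy of the data, the following is a minimal sketch of how affected mentions could be detected. It is not part of the PLE code base. It assumes the file stores one JSON object per line with the "tokens" and "mentions" fields shown in the example above, and that "end" is exclusive (inferred from the example, where start 10 and end 12 cover "San" and "Francisco"). The function names and the file path argument are hypothetical.

```python
import json


def split_mentions(sentence):
    """Split a sentence's mentions into (valid, invalid) lists.

    A mention counts as valid when 0 <= start < end <= number of tokens.
    The exclusive "end" convention is an assumption inferred from the
    example record in this thread.
    """
    n_tokens = len(sentence["tokens"])
    valid, invalid = [], []
    for mention in sentence["mentions"]:
        start, end = mention["start"], mention["end"]
        if 0 <= start < end <= n_tokens:
            valid.append(mention)
        else:
            invalid.append(mention)
    return valid, invalid


def report_invalid_mentions(path):
    """Return (fileid, senid, count) for each sentence with invalid mentions.

    Assumes one JSON object per line; `path` is whatever local file
    you want to check.
    """
    bad = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            sentence = json.loads(line)
            _, invalid = split_mentions(sentence)
            if invalid:
                bad.append(
                    (sentence.get("fileid"), sentence.get("senid"), len(invalid))
                )
    return bad
```

Calling `report_invalid_mentions` on a local copy of test.json lists the sentences that still carry mentions such as the `-1`/`-1` span above; an empty list suggests the copy is already the updated version.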