Can you point to an example? Ideally, it should be handled.
In terms of the implementation, if it encounters an example without annotations, it treats it as if the system should have predicted nothing.
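As a rough sketch of that policy (my paraphrase, not the actual evaluation code): an example with an empty gold set scores 1.0 only if the prediction is also empty, and 0.0 otherwise.

```python
def example_f1(gold, predicted):
    """Per-example F1 over sets of entity IDs (sketch only).

    If the gold set is empty, the system is expected to predict nothing:
    an empty prediction counts as fully correct, anything else as wrong.
    """
    gold, predicted = set(gold), set(predicted)
    if not gold:
        return 1.0 if not predicted else 0.0
    if not predicted:
        return 0.0
    tp = len(gold & predicted)
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```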
For example, in the WebQuestions test set we have the following:
{"question_id": "WebQTest-3", "utterance": "who plays ken barlow in coronation street?", "entities_fb": ["m.015lwh", "m.01_2n"], "entities": [null, "Q945030"], "main_entity_text": "Coronation Street", "main_entity": "Q945030", "main_entity_fb": "m.01_2n", "main_entity_tokens": "coronation street", "main_entity_pos": [24, 41], "entity_classes": [null, "product"]}
Note the null in entities. I think such entries should be skipped during the F1 calculation, with no penalty whatsoever.
And then we also have {"question_id": "WebQTest-521", "utterance": "who was anakin skywalker?", "entities_fb": [], "entities": []
which is a different case, but you already explained how you handle that one.
I also wanted to know: what kind of text search do you use for the n-grams? Do you use edit distance? And how many candidates do you consider per n-gram? I ask because I am getting low recall when keeping the top 30 from Elasticsearch.
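To make my setup concrete, this is roughly the kind of per-n-gram query I mean (the host, index, and field names here are just placeholders):

```python
import requests

# Placeholder host/index/field names; adjust to your own Elasticsearch setup.
ES_SEARCH_URL = "http://localhost:9200/entity_labels/_search"

def candidates_for_ngram(ngram, size=30):
    """Fetch the top `size` label matches for one n-gram."""
    query = {
        "size": size,
        "query": {
            "match": {
                "label": {
                    "query": ngram,
                    "fuzziness": "AUTO",  # allow a small edit distance
                }
            }
        },
    }
    resp = requests.post(ES_SEARCH_URL, json=query).json()
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```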
The first case is really a problem of the mapping from Freebase (FB) to Wikidata: for some FB entities there was simply no corresponding information, so we kept them as null. For the F1 calculation I think we actually included them, because we compared against some systems that use FB instead of Wikidata. This did put our system at a disadvantage, of course.
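As a sketch of the two options (not the actual evaluation code): dropping the nulls before scoring versus keeping them as gold items that can never be matched, which is what lowers recall for a Wikidata-based system.

```python
def gold_entities(example, skip_nulls=False):
    """Collect the gold Wikidata IDs from a dataset entry (sketch only).

    skip_nulls=True drops the unmapped FB entities before scoring;
    skip_nulls=False keeps them in the gold set, where a system that
    predicts Wikidata IDs can never match them, so its recall drops.
    """
    entities = example.get("entities", [])
    return [e for e in entities if e is not None] if skip_nulls else entities
```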
For the text search we used the CONTAINS method of Virtuoso, which is arguably not the best option. It checks whether there is an entity label in the database that contains the search query. I experimented with edit distance at some point, but it introduced more noise than useful information. We kept 50 candidates.
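Roughly, the lookup is of this form (a sketch only; the endpoint URL and the exact query in the project may differ):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed local Virtuoso endpoint; the real endpoint is deployment-specific.
ENDPOINT = "http://localhost:8890/sparql"

def label_candidates(ngram, limit=50):
    """Retrieve up to `limit` entities whose label contains the n-gram."""
    # bif:contains expects the phrase wrapped in single quotes inside the literal.
    phrase = "'" + ngram.replace("'", " ") + "'"
    query = f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?e ?label WHERE {{
        ?e rdfs:label ?label .
        ?label bif:contains "{phrase}" .
        FILTER(lang(?label) = "en")
    }} LIMIT {limit}
    """
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [(b["e"]["value"], b["label"]["value"]) for b in bindings]
```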
Ok thanks a lot, closing this issue.
Hi,
In the test datasets there are some sentences which have no entities, or null entities. How do you handle this during evaluation?