Task 2 : Types Evaluation

lara-hdr commented 8 years ago

For the second task, "the system answer will be counted correct as long as at least one of the possibility is returned". I would like to know how the types will be evaluated if the system returns more than one type. Is there a penalty if one of the returned types does not match any of the gold standard types?

Thank you, Lara

anuzzolese commented 8 years ago

An entity can have multiple types in the gold standard. A system should provide at least one correct type for an entity. However, all wrong types returned by a system for an entity will count as errors.

rtroncy commented 8 years ago

Let's take a concrete example: imagine that an entity has been annotated with three types in the gold standard. Will the scorer give a different result if system A provides just 1 correct type (and no incorrect type) versus system B providing the 3 correct types (+1 incorrect type)?

MichaelRoeder commented 8 years ago

Hi all,

Task 2 can be divided into 2 subtasks. The subtasks are evaluated independently, and the F1-score of the complete task is the average of the F1-scores of the two subtasks.

1. Find the string that describes the entity type. The example in the task description shows that it is not always possible to clearly judge whether a word (especially an adjective) is part of the entity type or not. Thus, there are different possibilities for which string could be marked. It is sufficient to mark one of these possibilities to get a true positive. However, no approach should mark more than one possible entity type per document.

2. Map the type that has been found to the subset of DOLCE+DnS Ultra Lite classes. For evaluating the entity typing subtask, the hierarchical F-measure is used. It is important that all types mentioned in the gold standard are present. Missing types as well as additional types are counted as errors. Regarding the concrete example, it is not easy to say which scores the two answers will get, because the hierarchical F-measure depends on the type hierarchy as well as on the positions of the individual types inside the hierarchy. However, since in most cases the types will be leaf nodes of the hierarchy, we can assume this for the example. In this case the hierarchical F-measure behaves like the normal F-measure. Thus, the first annotator (system A) would have 1 true positive and 2 false negatives, while the second annotator (system B) would have 3 true positives and 1 false positive.
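To make the counts for the concrete example tangible, here is a minimal sketch of the plain (non-hierarchical) F-measure, under the leaf-node assumption stated above. It is not the scorer's actual code: the function name `precision_recall_f1` and the type labels `T1`, `T2`, `T3` and `X` are hypothetical placeholders, and the real evaluation uses the hierarchical F-measure, which may give different numbers when types are not leaf nodes.

```python
def precision_recall_f1(gold, predicted):
    """Plain set-based precision, recall and F1 for one entity's types."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # types that are both predicted and in the gold standard
    fp = len(predicted - gold)   # additional (wrong) types
    fn = len(gold - predicted)   # missing gold standard types
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical type labels: three gold types, X is an incorrect type.
gold = {"T1", "T2", "T3"}
system_a = {"T1"}                    # 1 correct type, nothing else
system_b = {"T1", "T2", "T3", "X"}   # 3 correct types plus 1 incorrect type

result_a = precision_recall_f1(gold, system_a)
result_b = precision_recall_f1(gold, system_b)

print("System A: P=%.3f R=%.3f F1=%.3f" % result_a)
print("System B: P=%.3f R=%.3f F1=%.3f" % result_b)
```

Under these assumptions, system A gets perfect precision but low recall (F1 = 0.5), while system B gets perfect recall and slightly reduced precision (F1 of about 0.857), so returning all three correct types plus one wrong type would score higher than returning a single correct type.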

rtroncy commented 8 years ago

Thanks for the clarifications.

lara-hdr commented 8 years ago

Thanks!

anuzzolese / oke-challenge-2016

Task 2 : Types Evaluation #7