dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

Strange RT2KB and Typing scores #226

Open jplu opened 6 years ago

jplu commented 6 years ago

Hello,

The RT2KB and Typing processes give strange scores compared to other scorers. Every time I run an RT2KB process on a NIF dataset, I always get the exact same score for Precision, Recall and F1, which is quite odd (see this example). If I evaluate the same output with two other scorers (neleval and conlleval), I get the same results with both scorers, and these are much higher than what RT2KB gives me (P = 0.717, R = 0.765, F1 = 0.740).

The description of RT2KB says "the annotator gets a text and shall recognize the entities inside and their types", so I'm curious to know how the three measures can be equal for Typing when they are different for Recognition.

Any light on this would be welcomed :)

Thanks!

MichaelRoeder commented 6 years ago

Thanks for that question. I can only give a general answer since you have uploaded a larger dataset. I think uploading an example with a single document for which the evaluation results differ would give us an easier way of comparing the evaluations :wink:

In general, RT2KB does the following:

  1. it identifies entities that have been recognized correctly (Recognition step)
  2. from these correctly identified entities it takes the types and calculates the hierarchical F-measure for the type. (Errors in the recognition will lead to lower precision/recall in this calculation as well, since expected type information won't be available, etc.)

From the results of these two single steps, you can see that the benchmarked system got a 0.76 F1-measure for each step. So the combination of both cannot get more than that and will most probably have a lower F1-measure, since correctly identified entities might have got a (partly) wrong type.
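The two steps can be sketched roughly as follows (a simplified illustration of the description above, not GERBIL's actual implementation; the span/type tuples and the plain type-equality check are assumptions for brevity):

```python
# Simplified sketch of the two RT2KB steps described above (illustration
# only, not GERBIL's code). Mentions are (start, end, type) tuples.

def recognition(gold, system):
    """Step 1: match mentions by exact character span."""
    gold_spans = {(s, e) for (s, e, _) in gold}
    sys_spans = {(s, e) for (s, e, _) in system}
    tp = gold_spans & sys_spans
    fp = sys_spans - gold_spans
    fn = gold_spans - sys_spans
    return tp, fp, fn

def typing_on_matches(gold, system):
    """Step 2: compare types only for correctly recognized spans
    (here with plain equality instead of the hierarchical F-measure)."""
    gold_types = {(s, e): t for (s, e, t) in gold}
    sys_types = {(s, e): t for (s, e, t) in system}
    tp_spans, _, _ = recognition(gold, system)
    return sum(1 for span in tp_spans if gold_types[span] == sys_types[span])

gold = [(0, 14, "Person"), (33, 38, "Place"), (162, 168, "Role")]
system = [(0, 14, "Person"), (33, 39, "Place"), (162, 168, "Role")]
tp, fp, fn = recognition(gold, system)
print(len(tp), len(fp), len(fn), typing_on_matches(gold, system))  # 2 1 1 2
```

The near-miss span (33, 39) vs. (33, 38) counts against recognition and simply drops out of the typing step, which is why typing errors can only lower the combined score further.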

However, I would be happy to dig into this when you can provide a single example with the results from the other scorers :smiley:

jplu commented 6 years ago

Thanks @MichaelRoeder! I will check with a single document, with more specific details on how to reproduce this with GERBIL and the two other scorers, ASAP, and share it in this thread :)

jplu commented 6 years ago

The result with the conlleval scorer was a happy coincidence: it does not evaluate by "offset" but by "token", so the way it evaluates the recognition is different. Sorry for that.

However, the neleval scorer has a behavior similar to RT2KB and still produces a different result over this single document. Here are the GERBIL results and here is the TAC output (understood by the neleval scorer):

Gold Standard in TAC:

document-75 0   14  NIL0    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 33  38  http://dbpedia.org/resource/Paris   0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 74  77  NIL0    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 92  94  NIL0    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 116 132 NIL1    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75 136 156 http://dbpedia.org/resource/Thessaloniki    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 158 161 NIL0    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 162 168 http://dbpedia.org/resource/Mother  0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 170 184 NIL2    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 205 212 http://dbpedia.org/resource/Actor   0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 227 241 NIL3    0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person

The equivalent in NIF:

@prefix nif:        <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf:     <http://www.w3.org/2005/11/its/rdf#> .
@prefix dul:        <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix xsd:        <http://www.w3.org/2001/XMLSchema#> .
@prefix dbpedia:    <http://dbpedia.org/resource/> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix oke:        <http://aksw.org/notInWiki/> .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242>
        a               nif:String, nif:RFC5147String, nif:Context ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "242"^^xsd:nonNegativeInteger ;
        nif:isString    "Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,14>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Albert Modiano"@en ;
        nif:beginIndex          "0"^^xsd:nonNegativeInteger ;
        nif:endIndex            "14"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Albert_Modiano ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,38>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Paris"@en ;
        nif:beginIndex          "33"^^xsd:nonNegativeInteger ;
        nif:endIndex            "38"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       dbpedia:Paris ;
        itsrdf:taClassRef       dul:Place ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=74,77>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "his"@en ;
        nif:beginIndex          "74"^^xsd:nonNegativeInteger ;
        nif:endIndex            "77"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Albert_Modiano ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=92,94>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "he"@en ;
        nif:beginIndex          "92"^^xsd:nonNegativeInteger ;
        nif:endIndex            "94"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Albert_Modiano ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=116,132>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Sephardic family"@en ;
        nif:beginIndex          "116"^^xsd:nonNegativeInteger ;
        nif:endIndex            "132"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Sephardi_family ;
        itsrdf:taClassRef       dul:Organization ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=136,156>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Thessaloniki, Greece"@en ;
        nif:beginIndex          "136"^^xsd:nonNegativeInteger ;
        nif:endIndex            "156"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       dbpedia:Thessaloniki ;
        itsrdf:taClassRef       dul:Place ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=158,161>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "His"@en ;
        nif:beginIndex          "158"^^xsd:nonNegativeInteger ;
        nif:endIndex            "161"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Albert_Modiano ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=162,168>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "mother"@en ;
        nif:beginIndex          "162"^^xsd:nonNegativeInteger ;
        nif:endIndex            "168"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       dbpedia:Mother ;
        itsrdf:taClassRef       dul:Role ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=170,184>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Louisa Colpijn"@en ;
        nif:beginIndex          "170"^^xsd:nonNegativeInteger ;
        nif:endIndex            "184"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Louisa_Colpijn ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=205,212>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "actress"@en ;
        nif:beginIndex          "205"^^xsd:nonNegativeInteger ;
        nif:endIndex            "212"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       dbpedia:Actor ;
        itsrdf:taClassRef       dul:Role ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=227,241>
        a                       nif:String, nif:RFC5147String, nif:Phrase ;
        nif:anchorOf            "Louisa Colpeyn"@en ;
        nif:beginIndex          "227"^^xsd:nonNegativeInteger ;
        nif:endIndex            "241"^^xsd:nonNegativeInteger ;
        nif:referenceContext    <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taIdentRef       oke:Louisa_Colpeyn ;
        itsrdf:taClassRef       dul:Person ;
        itsrdf:taSource         "DBpedia 2014"^^xsd:string .

System output in TAC:

document-75 170 184 http://dbpedia.org/resource/National_Register_of_Historic_Places_listings_in_Iowa   5.4756873E-7    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 136 156 http://dbpedia.org/resource/Greece  1.4326925E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 0   14  http://dbpedia.org/resource/University_of_Chicago   5.789066E-6 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 116 135 http://dbpedia.org/resource/Family_(biology)    3.2394513E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75 205 212 http://dbpedia.org/resource/Actor   2.6748134E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 33  39  http://dbpedia.org/resource/Paris   4.2364663E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75 158 161 http://dbpedia.org/resource/Hit_(baseball)  2.1313697E-6    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 92  94  http://dbpedia.org/resource/Netherlands 1.5448735E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 74  86  http://dbpedia.org/resource/Rhineland-Palatinate    4.3240807E-6    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 227 233 http://dbpedia.org/resource/List_of_Animaniacs_characters   4.727223E-7 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75 234 241 NILfbc8560d-e7b1-4207-8856-0de7b142075f 0.0 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75 162 168 http://dbpedia.org/resource/Scotland    2.1532596E-5    http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role

Here is the equivalent NIF output:

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dul:   <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=170,184>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Louisa Colpijn" ;
        nif:beginIndex        "170"^^xsd:nonNegativeInteger ;
        nif:endIndex          "184"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=92,94>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "he" ;
        nif:beginIndex        "92"^^xsd:nonNegativeInteger ;
        nif:endIndex          "94"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,14>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Albert Modiano" ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "14"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=136,156>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Thessaloniki, Greece" ;
        nif:beginIndex        "136"^^xsd:nonNegativeInteger ;
        nif:endIndex          "156"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=234,241>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Colpeyn" ;
        nif:beginIndex        "234"^^xsd:nonNegativeInteger ;
        nif:endIndex          "241"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=158,161>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "His" ;
        nif:beginIndex        "158"^^xsd:nonNegativeInteger ;
        nif:endIndex          "161"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=205,212>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "actress" ;
        nif:beginIndex        "205"^^xsd:nonNegativeInteger ;
        nif:endIndex          "212"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242>
        a               nif:String , nif:RFC5147String , nif:Context ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "242"^^xsd:nonNegativeInteger ;
        nif:isString    "Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=74,86>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "his paternal" ;
        nif:beginIndex        "74"^^xsd:nonNegativeInteger ;
        nif:endIndex          "86"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=227,233>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Louisa" ;
        nif:beginIndex        "227"^^xsd:nonNegativeInteger ;
        nif:endIndex          "233"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=162,168>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "mother" ;
        nif:beginIndex        "162"^^xsd:nonNegativeInteger ;
        nif:endIndex          "168"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=116,135>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Sephardic family of" ;
        nif:beginIndex        "116"^^xsd:nonNegativeInteger ;
        nif:endIndex          "135"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Organization .

The scorer is available here and the command line to run the evaluation is:

./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac

And here is the output I get:

ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match
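For reference, the precision, recall and F-score above follow directly from the reported counts; a minimal sketch (the `prf` helper is mine for illustration, not part of neleval):

```python
# Deriving precision/recall/F1 from the tp/fp/fn counts reported above
# (illustrative helper; neleval computes these internally).

def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Counts reported by neleval for document-75: ptp=7, fp=5, fn=4.
p, r, f = prf(7, 5, 4)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.583 0.636 0.609
```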
rtroncy commented 6 years ago

It seems to me that the neleval output:

ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match

corresponds to the "Entity Recognition" score provided by GERBIL at http://gerbil.aksw.org/gerbil/experiment?id=201711270022

However, the strong_typed_mention_match SHOULD correspond to the "Entity Typing". Is this the issue?

jplu commented 6 years ago

No, it should correspond to the first line, where 0.4375 | 0.4375 | 0.4375 is written. Entity Typing is something else.

Basically "strong_typed_mention_match" in neleval == "RT2KB" in GERBIL and "strong_mention_match" in neleval == "Entity Recognition" in GERBIL.

The example I gave is a case where the score of the extraction is equal to the score of the recognition, because all 7 correctly extracted mentions (out of 11 in total) have their proper type attached. Look at the "TP", "FN" and "FP" values - they are equal:

./nel evaluate -m strong_mention_match -f tab -g gold_standard.tac system_output.tac
ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_mention_match
./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac
ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match
MichaelRoeder commented 6 years ago

@jplu thanks for this example. Going through it manually, I have calculated the same result as the nel-eval script.

| GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec Matching | hier. Matching |
|---|---|---|---|---|---|---|---|---|
| 0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp |
| 33 | 5 | dbr:Paris | dul:Place | 33 | 6 | dul:Place | fp, fn | fp, fn |
| 74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fp, fn | fp, fn |
| 92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp |
| 116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fp, fn | fp, fn |
| 136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp |
| 158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp |
| 162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp |
| 170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp |
| 205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp |
| 227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fp, fn | fp, fn |
| --- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp |

These numbers lead to precision=0.583, recall=0.636 and F1-score=0.609.

So what I gathered so far is that GERBIL identifies the cases as they are described in the table above. However, the numbers that are calculated based on these counts are not correct. We will search for the problem and update GERBIL.

jplu commented 6 years ago

Thanks @MichaelRoeder! Let me know once the bug is fixed.

TortugaAttack commented 6 years ago

Hi,

sorry it took me so long. Much to do right now.

Is there an open endpoint, or could you provide me with the ADEL web service URL (here or via DM)? It would be much easier for me to check against the actual web service.

MichaelRoeder commented 6 years ago

@TortugaAttack I have reproduced the problem using the two NIF files listed above. You can use the FileBasedNIFDataset to load the data and the InstanceListBasedAnnotator to load the result file of the annotator and simulate the behaviour of an annotator (you have to make sure that the URIs of the documents in both files are the same - I think the annotator result NIF above has a different URI for the document, which needs to be replaced).

Based on that, you should add a JUnit test (you can copy and adapt the SingleRunTest for that).

TortugaAttack commented 6 years ago

Well, I found a problem in the hierarchical (Hier) evaluation:

If the annotator provides wrong results, e.g.:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

It will be counted as tp=0, fp=1, fn=0. By removing the part where annotations that are not in the gold standard are counted, the results match yours.

I guess it is debatable here whether ETyping should acknowledge Recognition too. I can remove it and everything would match your results, or leave it in there, in which case we should provide this information in the wiki. The unit test will be changed according to what it should be.

MichaelRoeder commented 6 years ago

I do not see how this solves the issue, since we have to count it as a false positive - as is done in the table above as well. However, if it solved the problem for you, it might be possible that we are counting it twice... right?

TortugaAttack commented 6 years ago

No, it is not done in the table above. In the table above you have the 11 entities which are in the gold standard (and one row with "---", which I am not sure what you mean by). In GERBIL we currently have 16: the 11 gold standard entities (which are counted correctly according to the table) + 5 from the annotator which are not in the gold standard.

Again, for example:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

This annotation is not counted in your table.
If we ignore those entities that are not in the gold standard, we get the results you calculated. If not, we get other results.
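To make the suspected double counting concrete, here is a hypothetical sketch (my illustration, not GERBIL's code) of how charging the same mismatch in both a gold-side pass and a system-side pass would inflate the fp count:

```python
# Hypothetical illustration of fp double counting (not GERBIL's code).
# Mentions are (start, end) spans; one near-miss: (33, 38) vs (33, 39).
gold = {(0, 14), (33, 38), (92, 94)}
system = {(0, 14), (33, 39), (92, 94)}

# Correct counting: one fp (spurious system span), one fn (missed gold span).
fp = len(system - gold)
fn = len(gold - system)
print(fp, fn)  # 1 1

# Buggy counting: the near-miss is charged as fp once while walking the
# gold annotations and again while walking the system annotations.
fp_buggy = 0
for g in gold:
    if g not in system:
        fp_buggy += 1  # should have been counted as fn
for s in system:
    if s not in gold:
        fp_buggy += 1
print(fp_buggy)  # 2 instead of 1
```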

MichaelRoeder commented 6 years ago

The table is structured by the gold standard entities (11) and the entities of the system answers (12) mapped to them. The last system answer does not match any gold standard entity (that is the reason for the ---). Apart from that, there are 4 entities from the system that do not exactly match the gold standard (like the "Paris)" example you described). So the table does contain 16 distinct entities :wink:

MichaelRoeder commented 6 years ago
  1. We fixed a bug in the hierarchical F1-measure counting that could lead to doubling the number of fp counts.

  2. Apart from that, there was a misunderstanding in the calculation of the hierarchical F-measure, and the table that I posted before shows exactly this misunderstanding: when evaluating the results of an annotation system, the evaluation cannot match "Paris" and "Paris)" as we did in the table above. A human would automatically put them in the same line, but for the evaluation these two entities are different and have to be handled separately. Thus, the updated table looks like the following.

| GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec Matching | hier. Matching | hier. prec | hier. recall | hier. F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
| 33 | 5 | dbr:Paris | dul:Place | --- | --- | --- | fn | fn | 0.0 | 0.0 | 0.0 |
| 74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fn | fn | 0.0 | 0.0 | 0.0 |
| 92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
| 116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fn | fn | 0.0 | 0.0 | 0.0 |
| 136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp | 1.0 | 1.0 | 1.0 |
| 158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
| 162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0 |
| 170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0 |
| 205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0 |
| 227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fn | fn | 0.0 | 0.0 | 0.0 |
| --- | --- | --- | --- | 33 | 6 | dul:Place | fp | fp | 0.0 | 0.0 | 0.0 |
| --- | --- | --- | --- | 74 | 12 | dul:Place | fp | fp | 0.0 | 0.0 | 0.0 |
| --- | --- | --- | --- | 116 | 19 | dul:Organization | fp | fp | 0.0 | 0.0 | 0.0 |
| --- | --- | --- | --- | 227 | 6 | dul:Role | fp | fp | 0.0 | 0.0 | 0.0 |
| --- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp | 0.0 | 0.0 | 0.0 |

For the recognition of entities, there is no difference, since we can simply sum up the tp, fp and fn counts. However, for the hierarchical F-measure, this is not possible. When evaluating the typing, we have to compare trees/hierarchies of types, which can lead to more than one tp, fp or fn per comparison. Since we want to treat the single entities equally, GERBIL calculates the precision, recall and F1-measure for every entity (as can be found in the table above). The averages of these values are the precision, recall and F1-measure scores for the complete document (for the example above, precision=7/16, recall=7/16 and F1-score=7/16).
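The calculation described above can be sketched as follows (my reading of the explanation, not GERBIL's actual code; the toy type hierarchy is an assumption for illustration):

```python
# Sketch of per-entity hierarchical scoring followed by macro-averaging.
# Toy hierarchy (hypothetical): each type maps to itself plus superclasses.
SUPER = {
    "Person": {"Person", "Agent"},
    "Organization": {"Organization", "Agent"},
    "Place": {"Place"},
    "Role": {"Role"},
}

def entity_prf(gold_type, sys_type):
    """Hierarchical P/R/F1 for one entity via its closed type sets;
    one comparison can yield several tp/fp/fn."""
    g, s = SUPER[gold_type], SUPER[sys_type]
    tp, fp, fn = len(g & s), len(s - g), len(g - s)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# gold Person vs. system Organization still shares the ancestor Agent:
print(entity_prf("Person", "Organization"))  # (0.5, 0.5, 0.5)

# For the document above, the table gives 7 entities scoring 1.0 and
# 9 entities (4 missed gold + 5 spurious system) scoring 0.0; the
# document score is the plain average of the per-entity values:
scores = [1.0] * 7 + [0.0] * 9
print(sum(scores) / len(scores))  # 0.4375
```

This macro-averaging over entities, rather than summing raw counts, is exactly why GERBIL reports 0.4375 for all three measures while neleval reports 0.583/0.636/0.609.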

@jplu @rtroncy I know it is not the most intuitive implementation :smiley:. It is arguable whether it is okay to count a "missed" entity not only as fn but with precision and recall = 0, and to count the (nearly matching) fp entity again with precision and recall = 0. The only alternative that I can think of is a complicated weighting of the hierarchical tp, fp and fn counts to ensure that entities with a complex type hierarchy don't have a larger influence on the result than entities with an "easy" set of types.

jplu commented 6 years ago

Thanks @MichaelRoeder and @TortugaAttack. I can perfectly understand your concerns about the scoring issue I raised, but my goal is more to be aligned with the well-known and popular neleval scorer.

Personally I think that the annotation:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

must be counted as "false positive" AND "false negative" (if the system does not propose nested entities), because the offsets do not match. Then the type, even if it is the good one, should not be taken as a true positive but also counted as "false positive" AND "false negative", as in the recognition step. This is how neleval works, and I'm OK with that because it seems logical to me.

Please, can you let me know once the fix is pushed to the public instance of GERBIL? I will rerun my scoring script and then compare GERBIL and neleval.

MichaelRoeder commented 6 years ago

Of course, we will let you know. However, I think we still have a small misunderstanding.

Let's focus on the "Paris" / "Paris)" example. I totally agree that the recognition step has to count this as fp AND fn. I think there is no discussion regarding this point :wink: I want to underline that the typing step is not able to see "Paris)" as an attempt to match "Paris". It will handle them as two single entities and calculate precision, recall and F1-measure for each of them (for the reasons explained above). Therefore, it will count this two times with precision, recall, F1-score = 0 (not 1x fp and 1x fn), which leads to the overall evaluation scores of precision, recall, F1-score = 0.4375, which might be lower than expected.