freme-project / freme-ner

input is overwritten #155

Closed m1ci closed 7 years ago

m1ci commented 7 years ago

@borriellom reported that FREME NER overwrites the input. For example, if you use the documentation page https://freme-project.github.io/api-doc/full.html#!/e-Entity/executeFremeNer and submit

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,42>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "42"^^xsd:int ;
        nif:isString    "Welcome to Berlin, the capital of Germany!"^^xsd:string .

<http://freme-project.eu/#char=11,222>
        a                     nif:RFC5147String , nif:Phrase , nif:Word , nif:String ;
        nif:anchorOf          "Berlin"^^xsd:string ;
        nif:beginIndex        "11"^^xsd:int ;
        nif:endIndex          "17"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> .

the result should contain the http://freme-project.eu/#char=11,222 resource, the context resource, and two resources for the Berlin and the Germany entities.

However, the result is

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,42>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "42"^^xsd:int ;
        nif:isString    "Welcome to Berlin, the capital of Germany!"^^xsd:string .

<http://freme-project.eu/#char=34,41>
        a                     nif:RFC5147String , nif:Word , nif:Phrase , nif:String ;
        nif:anchorOf          "Germany"^^xsd:string ;
        nif:beginIndex        "34"^^xsd:int ;
        nif:endIndex          "41"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Country> , <http://nerd.eurecom.fr/ontology#Location> , <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/Place> ;
        itsrdf:taConfidence   "0.9205730934452003"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Germany> .

<http://freme-project.eu/#char=11,17>
        a                     nif:RFC5147String , nif:Phrase , nif:Word , nif:String ;
        nif:anchorOf          "Berlin"^^xsd:string ;
        nif:beginIndex        "11"^^xsd:int ;
        nif:endIndex          "17"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/Region> , <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Place> , <http://dbpedia.org/ontology/AdministrativeRegion> , <http://nerd.eurecom.fr/ontology#Location> ;
        itsrdf:taConfidence   "0.789254744282841"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Berlin> .

and the http://freme-project.eu/#char=11,222 resource is missing.

sandroacoelho commented 7 years ago

Hi @m1ci, @borriellom ,

This happens because the current output describes only the main string, "Welcome to Berlin, the capital of Germany!", and ignores entities that were already annotated.

I will fix it by appending the ignored content, so the output will be


@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,42>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "42"^^xsd:int ;
        nif:isString    "Welcome to Berlin, the capital of Germany!"^^xsd:string .

<http://freme-project.eu/#char=34,41>
        a                     nif:RFC5147String , nif:Word , nif:Phrase , nif:String ;
        nif:anchorOf          "Germany"^^xsd:string ;
        nif:beginIndex        "34"^^xsd:int ;
        nif:endIndex          "41"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Country> , <http://nerd.eurecom.fr/ontology#Location> , <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/Place> ;
        itsrdf:taConfidence   "0.9205730934452003"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Germany> .

<http://freme-project.eu/#char=11,17>
        a                     nif:RFC5147String , nif:Phrase , nif:Word , nif:String ;
        nif:anchorOf          "Berlin"^^xsd:string ;
        nif:beginIndex        "11"^^xsd:int ;
        nif:endIndex          "17"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/Region> , <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Place> , <http://dbpedia.org/ontology/AdministrativeRegion> , <http://nerd.eurecom.fr/ontology#Location> ;
        itsrdf:taConfidence   "0.789254744282841"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Berlin> .

<http://freme-project.eu/#char=11,222>
        a                     nif:RFC5147String , nif:Phrase , nif:Word , nif:String ;
        nif:anchorOf          "Berlin"^^xsd:string ;
        nif:beginIndex        "11"^^xsd:int ;
        nif:endIndex          "17"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,42> .

By the way, http://freme-project.eu/#char=11,222 is not a valid URL for the current nif:referenceContext ("Welcome to Berlin, the capital of Germany!").
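
The mismatch pointed out above can be checked mechanically. The following is only a minimal sketch, assuming the `#char=begin,end` fragment form used in this thread; the function name `check_char_fragment` is hypothetical and is not part of FREME or the NIF library.

```python
import re

def check_char_fragment(uri, context_text, anchor):
    """Return True if the #char=begin,end fragment of `uri` selects `anchor` in `context_text`."""
    match = re.search(r"#char=(\d+),(\d+)$", uri)
    if match is None:
        return False  # no character-range fragment at all
    begin, end = int(match.group(1)), int(match.group(2))
    # The range must lie inside the context string.
    if not (0 <= begin <= end <= len(context_text)):
        return False
    # The selected substring must equal the declared anchor text.
    return context_text[begin:end] == anchor

context = "Welcome to Berlin, the capital of Germany!"
print(check_char_fragment("http://freme-project.eu/#char=11,17", context, "Berlin"))   # True
print(check_char_fragment("http://freme-project.eu/#char=11,222", context, "Berlin"))  # False
```

The second call fails because 222 exceeds the 42-character context, which is the inconsistency described above.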

m1ci commented 7 years ago

By the way, http://freme-project.eu/#char=11,222 is not a valid URL for the current nif:referenceContext ("Welcome to Berlin, the capital of Germany!").

True, I just wanted to explain what the correct behaviour is. In general, any triple in the input should also be present in the output. That's all. There is a method in Jena that can merge two models (the input model and the results model): results.add(inputModel);
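
To illustrate the requirement that every input triple survives, here is a minimal sketch that models an RDF graph as a plain Python set of (subject, predicate, object) tuples. It is not Jena; `merge_models` is a hypothetical helper that only mimics the effect of adding the input model to the results model as a set union.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

def merge_models(result_triples, input_triples):
    """Return the union of both triple sets, so nothing from the input is lost."""
    return set(result_triples) | set(input_triples)

# One triple from the submitted document (the pre-annotated resource).
input_model = {
    ("http://freme-project.eu/#char=11,222", NIF + "anchorOf", "Berlin"),
}
# Two triples standing in for what the NER step produced.
result_model = {
    ("http://freme-project.eu/#char=11,17", NIF + "anchorOf", "Berlin"),
    ("http://freme-project.eu/#char=34,41", NIF + "anchorOf", "Germany"),
}

merged = merge_models(result_model, input_model)
print(input_model <= merged)  # True: every input triple is in the output
print(len(merged))            # 3
```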

sandroacoelho commented 7 years ago

I sent a NIF Library (org.nlp2rdf.NIF) update to Sonatype. Once it is available on the main server, I will send a pull request that fixes this issue. This usually takes 2 to 3 hours.

jnehring commented 7 years ago

Is this fixed now?

sandroacoelho commented 7 years ago

Hi @jnehring. Yes.

@m1ci : Could you please test it?

Thanks

borriellom commented 7 years ago

It still doesn't work.

When using an input NIF file containing term annotations from e-Terminology, all triples related to terms are deleted.

Input NIF

@prefix cc:    <http://creativecommons.org/ns#> .
@prefix :      <https://term.tilde.com/terms/> .
@prefix void:  <http://rdfs.org/ns/void#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix its:   <http://www.w3.org/2005/11/its> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix tbx:   <http://tbx2rdf.lider-project.eu/tbx#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix ldr:   <http://purl.oclc.org/NET/ldr/ns#> .
@prefix odrl:  <http://www.w3.org/ns/odrl/2/> .
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix prov:  <http://www.w3.org/ns/prov#> .

<http://freme-project.eu/#char=0,17>
        a               nif:String , nif:RFC5147String , nif:Context ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "17"^^xsd:nonNegativeInteger ;
        nif:isString    "Welcome to Berlin" .

:Berlin-en  a                  ontolex:LexicalEntry ;
        ontolex:canonicalForm  <https://term.tilde.com/terms/Berlin-en#CanonicalForm> ;
        ontolex:language       <http://www.lexvo.org/page/iso639-3/eng> ;
        ontolex:sense          <https://term.tilde.com/terms/Berlin-en#Sense> .

<https://term.tilde.com/terms/Berlin-de#CanonicalForm>
        ontolex:writtenRep  "Berlin"@de .

:de     a                 ontolex:Lexicon ;
        ontolex:entry     <https://term.tilde.com/terms/das+Land+Berlin-de> , :Berlin-de ;
        ontolex:language  <http://www.lexvo.org/page/iso639-3/ger/deu> .

<https://term.tilde.com/terms/Berlin-en#Sense>
        ontolex:reference  :828860 , :618766 , :345292 .

<https://term.tilde.com/terms/das+Land+Berlin-de#Sense>
        ontolex:reference  :828860 .

:828860  a                skos:Concept ;
        rdfs:comment      "international trade"@en , "European organisations"@en , "Community law"@en ;
        tbx:subjectField  <https://term.tilde.com/domains/TaaS-0304> , <https://term.tilde.com/domains/TaaS-0107> , <https://term.tilde.com/domains/7611> , <https://term.tilde.com/domains/2021> , <https://term.tilde.com/domains/1011> .

:618766  a                skos:Concept ;
        rdfs:comment      "regions of EU Member States"@en , "information and information processing"@en ;
        tbx:subjectField  <https://term.tilde.com/domains/TaaS-2105> , <https://term.tilde.com/domains/TaaS-2000> , <https://term.tilde.com/domains/7211> , <https://term.tilde.com/domains/3231> .

:345292  a                skos:Concept ;
        rdfs:comment      "POLITICS"@en , "SCIENCE"@en , "LAW"@en ;
        tbx:subjectField  <https://term.tilde.com/domains/TaaS-0200> , <https://term.tilde.com/domains/TaaS-2100> , <https://term.tilde.com/domains/TaaS-0100> , <https://term.tilde.com/domains/12> , <https://term.tilde.com/domains/unknown> , <https://term.tilde.com/domains/36> .

<https://term.tilde.com/terms/Berlin-de#Sense>
        ontolex:reference  :618766 , :345292 .

<https://term.tilde.com/terms/das+Land+Berlin-de#CanonicalForm>
        ontolex:writtenRep  "das Land Berlin"@de .

<https://term.tilde.com/terms/das+Land+Berlin-de>
        a                      ontolex:LexicalEntry ;
        ontolex:canonicalForm  <https://term.tilde.com/terms/das+Land+Berlin-de#CanonicalForm> ;
        ontolex:language       <http://www.lexvo.org/page/iso639-3/ger/deu> ;
        ontolex:sense          <https://term.tilde.com/terms/das+Land+Berlin-de#Sense> .

:en     a                 ontolex:Lexicon ;
        ontolex:entry     :Berlin-en ;
        ontolex:language  <http://www.lexvo.org/page/iso639-3/eng> .

:Berlin-de  a                  ontolex:LexicalEntry ;
        ontolex:canonicalForm  <https://term.tilde.com/terms/Berlin-de#CanonicalForm> ;
        ontolex:language       <http://www.lexvo.org/page/iso639-3/ger/deu> ;
        ontolex:sense          <https://term.tilde.com/terms/Berlin-de#Sense> .

<http://freme-project.eu/#char=11,17>
        a                     nif:RFC5147String ;
        nif:anchorOf          "Berlin"@en ;
        nif:annotationUnit    [ rdfs:label           "Berlin"@en ;
                                itsrdf:taConfidence  1
                              ] ;
        nif:beginIndex        "11"^^xsd:nonNegativeInteger ;
        nif:endIndex          "17"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#char=0,17> ;
        itsrdf:term           "yes" ;
        itsrdf:termInfoRef    :618766 , :828860 , <http://aims.fao.org/aos/agrovoc/c_8357> , :345292 .

:       a                 dcat:Dataset , tbx:MartifHeader ;
        <http://purl.org/dc/elements/1.1/source>
                "" ;
        dct:type          "TBX" ;
        tbx:encodingDesc  "<p type=\"XCSURI\">http://www.ttt.org/oscarstandards/tbx/TBXXCS.xcs</p>"^^rdf:XMLLiteral ;
        tbx:sourceDesc    "<sourceDesc><p/></sourceDesc>"^^rdf:XMLLiteral .

<https://term.tilde.com/terms/Berlin-en#CanonicalForm>
        ontolex:writtenRep  "Berlin"@en .

Response from FREME NER

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=11,17>
        a                     nif:RFC5147String , nif:Phrase , nif:Word , nif:String ;
        nif:anchorOf          "Berlin"^^xsd:string ;
        nif:beginIndex        "11"^^xsd:int ;
        nif:endIndex          "17"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,17> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/Region> , <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Place> , <http://dbpedia.org/ontology/AdministrativeRegion> , <http://nerd.eurecom.fr/ontology#Location> ;
        itsrdf:taConfidence   "0.7224316213090576"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Berlin> .

<http://freme-project.eu/#char=0,7>
        a                     nif:Phrase , nif:Word , nif:String , nif:RFC5147String ;
        nif:anchorOf          "Welcome"^^xsd:string ;
        nif:beginIndex        "0"^^xsd:int ;
        nif:endIndex          "7"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,17> ;
        itsrdf:taClassRef     <http://dbpedia.org/ontology/Film> , <http://dbpedia.org/ontology/Work> , <http://dbpedia.org/ontology/Wikidata:Q11424> , <http://www.w3.org/2002/07/owl#Thing> ;
        itsrdf:taConfidence   "0.33602748377957836"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Welcome_(2007_film)> .

<http://freme-project.eu/#char=0,17>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "17"^^xsd:int ;
        nif:isString    "Welcome to Berlin"^^xsd:string .

sandroacoelho commented 7 years ago

Hi @borriellom

Could you please test again?

Thanks

jnehring commented 7 years ago

I think this issue is fixed. @borriellom can you please confirm?

jnehring commented 7 years ago

There is still a problem, and I guess this issue is the cause.

This curl command

curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: 6cdffc4c-ab46-f413-545e-5c3741b743c1" -d 'Welcome to Berlin!' "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&informat=text"

produces

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,18>
        a                     nif:Context , nif:Word , nif:Phrase , nif:String , nif:RFC5147String ;
        nif:anchorOf          "Welcome to Berlin !"^^xsd:string ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger , "0"^^xsd:int ;
        nif:endIndex          "18"^^xsd:nonNegativeInteger , "18"^^xsd:int ;
        nif:isString          "Welcome to Berlin!" , "Welcome to Berlin!"^^xsd:string ;
        nif:referenceContext  <http://freme-project.eu/#char=0,18> ;
        itsrdf:taClassRef     <http://www.w3.org/2002/07/owl#Thing> ;
        itsrdf:taConfidence   "0.8882260450843884"^^xsd:double .

Note that the nif:isString property has the same value twice.

sandroacoelho commented 7 years ago

Hi @jnehring ,

You are right. Jena merges all statements, but if two statements differ in any way, such as the ^^xsd:string datatype on "Welcome to Berlin!", both are kept and the value appears twice in the output.

I am fixing the NIF Lib right now.
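
A small sketch of why the merge keeps both values, assuming literals are compared as (lexical form, datatype) pairs and a plain literal has no datatype, which matches the output shown above. This is plain Python for illustration, not Jena or the NIF library.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
IS_STRING = NIF + "isString"
context = "http://freme-project.eu/#char=0,18"

# A literal is modelled as (lexical form, datatype or None).
from_input = (context, IS_STRING, ("Welcome to Berlin!", None))        # plain literal
from_ner = (context, IS_STRING, ("Welcome to Berlin!", XSD_STRING))    # typed literal

# Same text, but the tuples differ in the datatype slot, so the union keeps both.
merged = {from_input} | {from_ner}
print(from_input == from_ner)  # False
print(len(merged))             # 2
```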

sandroacoelho commented 7 years ago

Hi @m1ci :

Do you know whether every nif:isString in NIF 2.0 has the ^^xsd:string datatype?

m1ci commented 7 years ago

There is no range defined for nif:isString, so the datatype (xsd:string) is optional. It can be present, but it does not have to be.

sandroacoelho commented 7 years ago

In NIF-Lib we always fill in the datatypes. We need to keep only one value to avoid duplicates. Which one should we remove: the one from the original file or the one from our annotation process?

m1ci commented 7 years ago

In NIF-Lib we always fill in the datatypes. We need to keep only one value to avoid duplicates. Which one should we remove: the one from the original file or the one from our annotation process?

Remove the nif:isString triple from the original document, and keep ours, which has the datatype.

Milan Dojchinovski http://dojchinovski.mk
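
The rule agreed on above can be sketched as follows. This is an illustration under stated assumptions, not the actual NIF library change: graphs are modelled as Python sets of (subject, predicate, object) tuples, literals as (lexical form, datatype) pairs, and `merge_keeping_our_is_string` is a hypothetical name.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IS_STRING = NIF + "isString"

def merge_keeping_our_is_string(input_triples, result_triples):
    """Merge both graphs, dropping the input's nif:isString wherever the result has its own."""
    # Subjects for which the annotation result already carries nif:isString.
    covered = {s for (s, p, o) in result_triples if p == IS_STRING}
    kept_input = {
        (s, p, o) for (s, p, o) in input_triples
        if not (p == IS_STRING and s in covered)
    }
    return kept_input | set(result_triples)

context = "http://freme-project.eu/#char=0,18"
input_doc = {
    (context, RDF_TYPE, NIF + "Context"),
    (context, IS_STRING, ("Welcome to Berlin!", None)),  # no datatype in the original
}
ner_result = {
    (context, IS_STRING, ("Welcome to Berlin!", XSD + "string")),  # typed value from the NER step
}

merged = merge_keeping_our_is_string(input_doc, ner_result)
is_string_values = [o for (s, p, o) in merged if p == IS_STRING]
print(is_string_values)  # a single value: the typed literal
```

All other input triples (here the rdf:type statement) pass through unchanged, so the earlier requirement that input triples are preserved still holds for everything except the replaced nif:isString.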

sandroacoelho commented 7 years ago

Hi @jnehring , @m1ci , @borriellom

Could you please test again?

Thanks

m1ci commented 7 years ago

works now

jnehring commented 7 years ago

OK, thanks for the fix. Please note that it is not part of the current release.