freme-project / freme-ner

Apache License 2.0

Missing annotations on NIF service response #170

Closed sandroacoelho closed 7 years ago

sandroacoelho commented 7 years ago

@Gustavo Publio says:

Hi all,

As you may already know, we are using the FREME NIF service API to annotate some Dutch texts.

But we found out that the service is missing some annotations.

For instance:

We are using the dataset sbr2, which I have already uploaded to the service. This dataset contains, among other terms, several references to the term "ankerloze spouwmuur":

gpublio (master) code $ cat sbrdataset.nt | grep -i "ankerloze spouwmuur"
<http://data.sbrcurnet/Referentiedetail/402.4.0.02> <http://www.w3.org/2004/02/skos/core#altLabel> "dakelementen op de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/402.4.0.02> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, dakelementen op de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/406.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "houten vloer, sporenkap, ankerloze spouwmuur, zakgoot op vloer"@nl .
<http://data.sbrcurnet/Referentiedetail/216.4.1.02> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, gesloten wandelementen"@nl .
<http://data.sbrcurnet/Referentiedetail/204.4.1.02> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, gesloten wandelementen"@nl .
<http://data.sbrcurnet/Referentiedetail/216.4.2.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, halfopen wandelementen"@nl .
<http://data.sbrcurnet/Referentiedetail/204.4.2.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/402.4.0.01> <http://www.w3.org/2004/02/skos/core#altLabel> "dakelementen naast de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/402.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spuwmuur, dakelementen naast de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/104.2.0.04.G3> <http://www.w3.org/2004/02/skos/core#prefLabel> "ribcassettevloer, ankerloze spouwmuur, verend opgelegde dekvloer"@nl .
<http://data.sbrcurnet/Referentiedetail/216.4.1.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, halfopen wandelementen"@nl .
<http://data.sbrcurnet/Referentiedetail/204.4.1.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/418.4.0.01> <http://www.w3.org/2004/02/skos/core#altLabel> "dakelementen op de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/418.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur, dakelementen op de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/402.1.0.01.G1> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/367.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "houten vloer, ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/104.2.0.04.G1> <http://www.w3.org/2004/02/skos/core#prefLabel> "ribcassettevloer, ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/104.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "ribcassettevloer, ankerloze spouwmuur, met stelregel"@nl .
<http://data.sbrcurnet/Referentiedetail/204.4.2.02.PH> <http://www.w3.org/2004/02/skos/core#prefLabel> "passiefhuis, ankerloze spouwmuur, HSB element met I-ligger en leidingspouw"@nl .
<http://data.sbrcurnet/Referentiedetail/204.1.1.01.G1> <http://www.w3.org/2004/02/skos/core#prefLabel> "ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/207.4.2.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "houten kozijn, ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/405.4.0.01> <http://www.w3.org/2004/02/skos/core#altLabel> "dakelementen naast de ankerloze spouwmuur"@nl .
<http://data.sbrcurnet/Referentiedetail/405.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "dakelementen naast de ankerloze spouwmuur, hout of plaatmateriaal als gevelbekleding"@nl .
<http://data.sbrcurnet/Referentiedetail/304.4.0.01> <http://www.w3.org/2004/02/skos/core#prefLabel> "houten vloer, ankerloze spouwmuur"@nl .

Even so, if we submit a text containing this term, it is simply ignored. For instance, here is a sample curl call:

gpublio (master *) NIF $ curl -X POST --header 'Content-Type: text/plain' --header 'Accept: application/ld+json' -d "<vet>Brandveiligheid</vet>\n                    <linebreak></linebreak>Voor de details waarbij een ankerloze spouwmuur aansluit op een plat dak is\n                    ter voorkoming van brandoverslag / branddoorslag een 15 mm vezelversterkte gipskartonplaat als\n                    plafond aangegeven." 'https://api.freme-project.eu/current/e-entity/freme-ner/documents?prefix='http://data.sbrcurnet.nl/Infobladen/008/nif'&language=nl&dataset=sbr2&mode=all&nif-version=2.1'
{
  "@graph" : [ {
    "@id" : "http://data.sbrcurnet.nl/Infobladen/008/nif#offset_0_303",
    "@type" : [ "nif:Context", "nif:OffsetBasedString" ],
    "beginIndex" : "0",
    "endIndex" : "303",
    "nif:isString" : "<vet>Brandveiligheid</vet>\\n                    <linebreak></linebreak>Voor de details waarbij een ankerloze spouwmuur aansluit op een plat dak is\\n                    ter voorkoming van brandoverslag / branddoorslag een 15 mm vezelversterkte gipskartonplaat als\\n                    plafond aangegeven."
  }, {
    "@id" : "http://data.sbrcurnet.nl/Infobladen/008/nif/#collection",
    "@type" : "nif:ContextCollection",
    "hasContext" : "http://data.sbrcurnet.nl/Infobladen/008/nif/#offset_0_303",
    "conformsTo" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1"
  }, {
    "@id" : "http://data.sbrcurnet.nl/Infobladen/008/nif/#offset_0_303",
    "@type" : [ "nif:Context", "nif:OffsetBasedString" ],
    "beginIndex" : "0",
    "endIndex" : "303",
    "nif:isString" : "<vet>Brandveiligheid</vet>\\n                    <linebreak></linebreak>Voor de details waarbij een ankerloze spouwmuur aansluit op een plat dak is\\n                    ter voorkoming van brandoverslag / branddoorslag een 15 mm vezelversterkte gipskartonplaat als\\n                    plafond aangegeven."
  } ],
  "@context" : {
    "conformsTo" : {
      "@id" : "http://purl.org/dc/terms/conformsTo",
      "@type" : "@id"
    },
    "hasContext" : {
      "@id" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#hasContext",
      "@type" : "@id"
    },
    "isString" : {
      "@id" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#isString",
      "@type" : "http://www.w3.org/2001/XMLSchema#string"
    },
    "endIndex" : {
      "@id" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#endIndex",
      "@type" : "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
    },
    "beginIndex" : {
      "@id" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex",
      "@type" : "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
    },
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "itsrdf" : "http://www.w3.org/2005/11/its/rdf#",
    "nif" : "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
  }
}

As we are relying on the data provided by the NIF service, could you please check what is going on? It could be something related to the language, or maybe to the way we are calling the service.
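To make "missing annotations" concrete, here is a minimal Python sketch that inspects a JSON-LD response like the one above and lists the nodes that look like entity annotations. It assumes that an annotation node would carry a key ending in `anchorOf` or `taIdentRef` (the NIF and ITS RDF properties usually attached to a spotted phrase); the helper name `annotation_nodes` and the trimmed `response` dict are illustrative, not part of the FREME API.

```python
from typing import Any, Dict, List


def annotation_nodes(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the @graph nodes that look like entity annotations.

    Assumption: an annotation carries a key ending in 'anchorOf' or
    'taIdentRef' (for example 'nif:anchorOf' or 'itsrdf:taIdentRef').
    Context and collection nodes carry neither.
    """
    found = []
    for node in response.get("@graph", []):
        if any(key.endswith(("anchorOf", "taIdentRef")) for key in node):
            found.append(node)
    return found


# Trimmed copy of the response shown above: only context/collection nodes.
response = {
    "@graph": [
        {
            "@id": "http://data.sbrcurnet.nl/Infobladen/008/nif#offset_0_303",
            "@type": ["nif:Context", "nif:OffsetBasedString"],
            "beginIndex": "0",
            "endIndex": "303",
        },
        {
            "@id": "http://data.sbrcurnet.nl/Infobladen/008/nif/#collection",
            "@type": "nif:ContextCollection",
        },
    ]
}

print(len(annotation_nodes(response)))  # number of annotation-like nodes
```

For the response above this check finds no annotation-like nodes: the graph holds only `nif:Context` and `nif:ContextCollection` entries.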

fsasaki commented 7 years ago

Hi all, this may be related to the move of the FREME framework installations to the ADAPT Centre. I assume that the move does not include datasets that are currently being uploaded. @jnehring can maybe say more (he will be back next week, I assume).

sandroacoelho commented 7 years ago

Hi @fsasaki, thank you. I will check with him.

jnehring commented 7 years ago

NER consists of spot, classify and link. The dataset you provided only adds the term "ankerloze spouwmuur" to the linking phase. Maybe the term is not recognized by entity spotting? Or it is wrongly classified?
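A toy Python sketch of why a dataset that only feeds the linking step cannot produce an annotation on its own. This is not FREME NER's actual code: `link_spans`, the `labels` dictionary and the hand-written spans are hypothetical and only illustrate that linking works on spans the spotter has already found.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def link_spans(text: str, spans: List[Span],
               label_to_uri: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Linking step only: look up the surface form of each spotted span."""
    links = []
    for begin, end in spans:
        surface = text[begin:end].lower()
        if surface in label_to_uri:
            links.append((begin, end, label_to_uri[surface]))
    return links


text = "Voor de details waarbij een ankerloze spouwmuur aansluit op een plat dak"
term = "ankerloze spouwmuur"

# One label taken from the dataset excerpt in the issue.
labels = {term: "http://data.sbrcurnet/Referentiedetail/204.4.2.01"}

# Case 1: the spotter found nothing, so the linker has nothing to look up.
print(link_spans(text, [], labels))

# Case 2: the spotter marked the term, so the dataset label can be linked.
begin = text.index(term)
print(link_spans(text, [(begin, begin + len(term))], labels))
```

In the first case the result is empty even though the label is in the dataset, which matches the behaviour described in the issue if the term is never spotted.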

Also, you specified plain text as the input format (Content-Type: text/plain), but the text you submitted does not look like plain text. You can experiment with different input formats; maybe XML is a better choice?
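One way to try the plain-text route is to strip the markup before submitting. The Python sketch below is only an illustration: `to_plain_text` is a hypothetical helper, and whether cleaner input actually makes the service return the annotation is not confirmed in this thread. It removes the XML-like tags and the literal backslash-n sequences that the curl call sends as two characters.

```python
import re


def to_plain_text(raw: str) -> str:
    """Turn the marked-up snippet into plain text.

    - replaces the literal two-character sequence backslash + 'n' with a space
    - replaces XML-like tags such as <vet> or </linebreak> with a space
    - collapses runs of whitespace into single spaces
    """
    text = raw.replace("\\n", " ")
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Beginning of the text from the curl call, with backslash-n kept literal.
raw = ("<vet>Brandveiligheid</vet>\\n                    "
       "<linebreak></linebreak>Voor de details waarbij een "
       "ankerloze spouwmuur aansluit op een plat dak is")

print(to_plain_text(raw))
# prints: Brandveiligheid Voor de details waarbij een ankerloze spouwmuur aansluit op een plat dak is
```

The cleaned string can then be sent with the same curl call in place of the original payload.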

If you like, we can discuss this issue on Wednesday's developers' call.

sandroacoelho commented 7 years ago

Hi @jnehring, I answered @Gustavo with the same explanation on Sunday. Thanks for your answer.

We can close it.