freme-project / freme-ner

Apache License 2.0

nif:taMsClassRef #160

Closed sandroacoelho closed 7 years ago

sandroacoelho commented 7 years ago

@m1ci and @koidl have requested that the property nif:taMsClassRef be filled with the most specific class of the entity, based on the DBpedia ontology.

E.g.:

<http://freme-project.eu/#offset_0_14>
        a                     nif:OffsetBasedString , nif:Phrase ;
        nif:anchorOf          "Diego Maradona"^^xsd:string ;
        nif:annotationUnit    [ a                       nif:EntityOccurrence ;
                                itsrdf:taAnnotatorsRef  <http://freme-project.eu/tools/freme-ner> ;
                                itsrdf:taClassRef       <http://nerd.eurecom.fr/ontology#Person> , <http://dbpedia.org/ontology/SoccerManager> , <http://dbpedia.org/ontology/Agent> , <http://dbpedia.org/ontology/SportsManager> , <http://dbpedia.org/ontology/Person> ;
                                nif:taMsClassRef <http://dbpedia.org/ontology/SoccerManager> ;
                                itsrdf:taConfidence     "0.9869992701528016"^^xsd:double ;
                                itsrdf:taIdentRef       <http://dbpedia.org/resource/Diego_Maradona>
                              ] ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "14"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_33> .
sandroacoelho commented 7 years ago

Basically, it can be retrieved with the following SPARQL query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type WHERE { 
<http://dbpedia.org/resource/Diego_Maradona> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type . 
FILTER NOT EXISTS { 
    ?subtype ^a <http://dbpedia.org/resource/Diego_Maradona> ;
             rdfs:subClassOf ?type .
  }
FILTER regex(str(?type), "dbpedia.org/ontology/")
}

sandroacoelho commented 7 years ago

Hi @m1ci: I ran the same query at http://rv2622.1blu.de:8890/sparql and did not get the same result as from the DBpedia SPARQL endpoint. I would bet that this problem is related to our indexed data. Just to make sure that I am on the right track, could you please check whether my SPARQL is correct?

Thank you

m1ci commented 7 years ago

Hi @sandroacoelho, the query at the FREME SPARQL endpoint did not work because the DBpedia ontology wasn't loaded, so the required subClassOf statements were missing. Now it should work. Try http://rv2622.1blu.de:8890/sparql

@sandroacoelho can you now implement this so that we have the nif:taMsClassRef property in the NIF output? Thanks!

sandroacoelho commented 7 years ago

Hi @m1ci. As promised, nif:taMsClassRef is now implemented. Could you please test it?

Best

m1ci commented 7 years ago

I don't see nif:taMsClassRef in the output; see https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=turtle&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.&nif-version=2.1

sandroacoelho commented 7 years ago

Hi @m1ci ,

Checking Jenkins, I saw that the main jar is not using SNAPSHOTS. The build is using our latest stable version, 0.11, which does not contain this new feature.

I forced it to give you a chance to test. (Note: this is wrong and should be reverted as soon as possible.)

In SPARQLProcessor.java, the address "http://www.freme-project.eu/datasets/types" is used as defaultGraph. Could you please load the DBpedia ontology into it?

Best,

m1ci commented 7 years ago

> In SPARQLProcessor.java, the address "http://www.freme-project.eu/datasets/types" is used as defaultGraph. Could you please load the DBpedia ontology into it?

Can you try without specifying the default graph? Just QueryExecution qexec = QueryExecutionFactory.sparqlService(this.endpoint, query);

sandroacoelho commented 7 years ago

Hi @m1ci , Done!

m1ci commented 7 years ago

Thanks. However, for https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=turtle&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.&nif-version=2.1 I get one type for Diego Maradona, which is fine, but two types for Argentina, which is a problem. Why are there two types for Argentina?

m1ci commented 7 years ago

I found out why: there is no subclass relation between http://dbpedia.org/ontology/Country and http://dbpedia.org/ontology/Location; also, Location is sameAs Place. We need to fix the SPARQL query.

sandroacoelho commented 7 years ago

By definition, our query retrieves leaves (types that have no subtypes) as the most specific types in the DBpedia ontology.

For Argentina, we have two "leaf" types, and this could happen with other resources.

If we want just one resource to fill in nif:taMsClassRef, we could:

1) Take the first one (I don't like this solution because it is not deterministic);

2) Define filters to decide which type we should select as the nif:taMsClassRef.

Best,

m1ci commented 7 years ago

No, there is one leaf, and it's Country. Location is a superclass.

1) This could happen; maybe LIMIT 1 is not a good idea.

2) Not sure about this.

Still, we need to fix the query so that it returns Country as the most specific type.

m1ci commented 7 years ago

Here is the solution:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?type WHERE { 
<http://dbpedia.org/resource/Argentina> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type . 
FILTER NOT EXISTS { 
    <http://dbpedia.org/resource/Argentina> a ?subtype .
    ?subtype rdfs:subClassOf|owl:equivalentClass ?type .
  }
FILTER regex(str(?type), "dbpedia.org/ontology/")
}

@sandroacoelho please update the query
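
For readers following along, the effect of following owl:equivalentClass in addition to rdfs:subClassOf can be reproduced on a toy in-memory graph. This is a minimal Python sketch, not the FREME implementation: the function name `most_specific_types` and the sample class relations are illustrative assumptions modelled on what is described in this thread (no subclass link between Country and Location, and Place declared equivalent to Location), not the actual DBpedia data.

```python
DBO = "http://dbpedia.org/ontology/"


def most_specific_types(entity_types, edges):
    """Mirror the FILTER NOT EXISTS logic of the query on in-memory data.

    entity_types: set of class IRIs asserted for one resource (rdf:type).
    edges: set of (sub, sup) pairs, each standing for
           sub rdfs:subClassOf sup  or  sub owl:equivalentClass sup.
    """
    return sorted(
        t for t in entity_types
        # keep only DBpedia ontology classes, like the regex FILTER
        if "dbpedia.org/ontology/" in t
        # drop t if an asserted type is declared below (or equivalent to) it
        and not any((s, t) in edges for s in entity_types)
    )


# Hypothetical sample data, for illustration only.
argentina_types = {DBO + "Country", DBO + "PopulatedPlace",
                   DBO + "Place", DBO + "Location"}
subclass_of = {(DBO + "Country", DBO + "PopulatedPlace"),
               (DBO + "PopulatedPlace", DBO + "Place")}
equivalent_class = {(DBO + "Place", DBO + "Location")}

# Only rdfs:subClassOf is followed: Location also looks like a leaf.
print(most_specific_types(argentina_types, subclass_of))
# owl:equivalentClass is followed too: only Country remains.
print(most_specific_types(argentina_types, subclass_of | equivalent_class))
```

As in the query, the equivalence is only followed in the direction in which it is stated, so a pair declared the other way round would need the inverse edge as well. A resource with a single asserted type gets that type back, and a resource with no types yields an empty list.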

sandroacoelho commented 7 years ago

@m1ci : done!

m1ci commented 7 years ago

thanks @sandroacoelho

@koidl @xFran please test, for example: https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=turtle&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.&nif-version=2.1

jnehring commented 7 years ago

> the query at the FREME SPARQL endpoint did not work because the DBpedia ontology wasn't loaded, so the required subClassOf statements were missing. Now it should work.

Do we need to update the dataset dumps for the docker installation because of that?

jnehring commented 7 years ago

Oh, forget my last comment. I just read https://github.com/freme-project/freme-docker/issues/17#issuecomment-258323248 which says we need to update the dataset.

m1ci commented 7 years ago

@sandroacoelho do we need to update the dataset? I think it is irrelevant which graph the datasets are loaded in, since we query all data in all graphs. Correct me if I'm wrong.

x-fran commented 7 years ago

@m1ci it is working with Turtle but not with JSON-LD:

https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=json-ld&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.

I can't see nif:taMsClassRef or something similar in the response.

sandroacoelho commented 7 years ago

Hi, @xFran: I will check our jsonld.

fsasaki commented 7 years ago

It works with NIF version 2.1, see http://tinyurl.com/jedem72

x-fran commented 7 years ago

Works with NIF version 2.1 indeed.

https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&outformat=json-ld&informat=text&input=Diego%20Maradona%20is%20from%20Argentina.&nif-version=2.1

Thank you @fsasaki
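
For reference, the working request can also be assembled programmatically. This is a small Python sketch using only the standard library; the variable names are illustrative, while the endpoint and parameters are copied from the URL above. It only builds the URL and does not call the service.

```python
from urllib.parse import quote, urlencode

BASE = "https://api-dev.freme-project.eu/current/e-entity/freme-ner/documents"

params = {
    "language": "en",
    "dataset": "dbpedia",
    "mode": "all",
    "outformat": "json-ld",  # "turtle" was used earlier in the thread
    "informat": "text",
    "input": "Diego Maradona is from Argentina.",
    # in this thread, the JSON-LD response lacked nif:taMsClassRef without it
    "nif-version": "2.1",
}

# quote_via=quote encodes spaces as %20 (the default would use "+")
url = BASE + "?" + urlencode(params, quote_via=quote)
print(url)
```

Dropping the `nif-version` entry gives the earlier request whose response did not contain nif:taMsClassRef.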

jnehring commented 7 years ago

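I have a few questions about this feature:

1) Would it be complicated to add a configuration option to switch off the feature? I guess it has a huge impact on the performance of NER.

2) What happens when the data for this feature is not contained in the triple store? Does it produce no MFS value? Or an error message?

3) Is there any documentation about the feature? We should add a (brief) documentation to https://freme-project.github.io/knowledge-base/freme-for-api-users/freme-ner.html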

m1ci commented 7 years ago

> Would it be complicated to add a configuration option to switch off the feature? I guess it has a huge impact on the performance of NER.

I would not complicate things, and would always provide this info. In other words, leave it as it is.

> What happens when the data for this feature is not contained in the triple store? Does it produce no MFS value? Or an error message?

If there is no data, then we don't provide the MFS value. If there is, then we provide it. Also, if there is just one type, then we list this type as the most specific type and also as itsrdf:taClassRef.

> Is there any documentation about the feature? We should add a (brief) documentation to https://freme-project.github.io/knowledge-base/freme-for-api-users/freme-ner.html

I'm afraid this is not documented. Please add a section to the doc called "NIF Output explained" and explain each piece of information (1-2 sentences).

jnehring commented 7 years ago

> Would it be complicated to add a configuration option to switch off the feature? I guess it has a huge impact on the performance of NER.

> I would not complicate things, and would always provide this info. In other words, leave it as it is.

Ok

thanks for the information

jnehring commented 7 years ago

I tested it, and this is also installed on freme-live. The documentation issue has been moved to https://github.com/freme-project/freme-project.github.io/issues/320 . So this issue is done.