dkt-projekt / technical-discussion

This repository is used for technical discussions.

Change from 'link' to 'mode' at /e-nlp/namedEntityRecognition #10

Open jox opened 8 years ago

jox commented 8 years ago

The 'link' parameter was dropped and a 'mode' parameter was introduced instead in the namedEntityRecognition endpoint.

I'm experiencing some strange behaviour with the 'mode' parameter. I'm feeding the following parameters:

Input: "I live in Berlin."
Analysis: ner
Models: ner-wikinerEn_LOC
Language: en
Informat: text
Outformat: turtle

Now depending on the mode, I get the following results:

I had a quick look, and the logic in EOpenNLPService seems OK. With mode 'spot', NameFinder.spotEntitiesNIF is invoked, and with mode 'link', NameFinder.linkEntitiesNIF is invoked. With mode 'all', both of them are invoked.
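For reference, the dispatch as I read it can be summarised in a few lines. This is a Python sketch of the logic only, not the project's Java code: spot_entities and link_entities are hypothetical stand-ins for NameFinder.spotEntitiesNIF and NameFinder.linkEntitiesNIF, and it assumes that 'all' simply means spotting followed by linking.

```python
from typing import Callable, List

def steps_for_mode(mode: str) -> List[str]:
    """Return the processing steps a mode triggers, in order."""
    steps = {
        "spot": ["spot"],
        "link": ["link"],
        "all": ["spot", "link"],  # assumption: spot first, then link
    }
    if mode not in steps:
        raise ValueError(f"unknown mode: {mode!r}")
    return steps[mode]

def run(mode: str, nif: str,
        spot_entities: Callable[[str], str],
        link_entities: Callable[[str], str]) -> str:
    """Apply the handlers for the given mode to the input, one after another."""
    handlers = {"spot": spot_entities, "link": link_entities}
    for step in steps_for_mode(mode):
        nif = handlers[step](nif)
    return nif
```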

Not sure if that's some local issue here or a bug. Can anybody reproduce that?

PS. I found another code issue that I will address with a pull request, but it is not the cause of this problem.

PeterBourgonje commented 8 years ago

Great that you're trying out this new feature already, thanks for the feedback! Since this is only in the dev version now, it's not yet documented. Hope the following explains it a bit more and also solves your issues. The general idea of course is to separate the spotting of entities from the linking of entities to increase modularity.

Using mode=all, as you mention already, we get links for entities (provided that they exist in the DBpedia ontology and DBpedia didn't time out), and coordinates for locations (again, if available). So using the following params:

http://localhost:8092/e-nlp/namedEntityRecognition?analysis=ner&informat=text&models=ner-wikinerEn_LOC;ner-wikinerEn_PER;ner-wikinerEn_ORG&language=en&mode=all

and putting "I live in Berlin." in the body, the result is:

@prefix dktnif: <http://dkt.dfki.de/ontologies/nif#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://dkt.dfki.de/documents/#char=0,17>
    a nif:String , nif:Context , nif:RFC5147String ;
    dktnif:averageLatitude "52.516666666666666"^^xsd:double ;
    dktnif:averageLongitude "13.383333333333333"^^xsd:double ;
    dktnif:standardDeviationLatitude "0.0"^^xsd:double ;
    dktnif:standardDeviationLongitude "0.0"^^xsd:double ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "17"^^xsd:nonNegativeInteger ;
    nif:isString "I live in Berlin."^^xsd:string .

<http://dkt.dfki.de/documents/#char=10,16>
    a nif:String , nif:RFC5147String ;
    nif:anchorOf "Berlin"^^xsd:string ;
    nif:beginIndex "10"^^xsd:nonNegativeInteger ;
    nif:endIndex "16"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://dkt.dfki.de/documents/#char=0,17> ;
    geo:lat "52.516666666666666"^^xsd:double ;
    geo:long "13.383333333333333"^^xsd:double ;
    itsrdf:taClassRef dbo:Location ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .
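If you would rather script the request than paste the URL, something like the following should build the same call. It is only a sketch using Python's standard library: the POST method and the Content-Type header are my assumptions (the thread only says the text goes in the body), and build_ner_request is a made-up helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8092/e-nlp/namedEntityRecognition"
MODELS = ["ner-wikinerEn_LOC", "ner-wikinerEn_PER", "ner-wikinerEn_ORG"]

def build_ner_request(body: str, mode: str, informat: str = "text") -> Request:
    """Build (but do not send) a request against the local dev endpoint."""
    params = {
        "analysis": "ner",
        "informat": informat,
        "models": ";".join(MODELS),
        "language": "en",
        "mode": mode,
    }
    # safe=";" leaves the model separator unescaped, as in the URL above.
    url = BASE_URL + "?" + urlencode(params, safe=";")
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumption, not confirmed here
        method="POST",                           # assumption, not confirmed here
    )

req = build_ner_request("I live in Berlin.", mode="all")
print(req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen with the service running locally.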

Using mode=spot, e.g. the following params:

http://localhost:8092/e-nlp/namedEntityRecognition?analysis=ner&informat=text&models=ner-wikinerEn_LOC;ner-wikinerEn_PER;ner-wikinerEn_ORG&language=en&mode=spot

and "I live in Berlin." in the body, the result is:

@prefix dktnif: <http://dkt.dfki.de/ontologies/nif#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://dkt.dfki.de/documents/#char=0,17>
    a nif:RFC5147String , nif:String , nif:Context ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "17"^^xsd:nonNegativeInteger ;
    nif:isString "I live in Berlin."^^xsd:string .

<http://dkt.dfki.de/documents/#char=10,16>
    a nif:RFC5147String , nif:String ;
    nif:anchorOf "Berlin"^^xsd:string ;
    nif:beginIndex "10"^^xsd:nonNegativeInteger ;
    nif:endIndex "16"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://dkt.dfki.de/documents/#char=0,17> ;
    itsrdf:taClassRef <http://dbpedia.org/ontology/Location> .
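Comparing the two results: the spot output carries itsrdf:taClassRef but no itsrdf:taIdentRef and no coordinates. A crude way to tell them apart in a script could look like this. It is a plain substring check, not a Turtle parser, and it assumes the output uses the itsrdf: prefix as in the results shown here; annotation_level is just an illustrative name.

```python
def annotation_level(turtle: str) -> str:
    """Classify a Turtle response by the ITS annotations it contains."""
    if "itsrdf:taIdentRef" in turtle:
        return "linked"   # entity has an identity reference (e.g. a DBpedia resource)
    if "itsrdf:taClassRef" in turtle:
        return "spotted"  # entity has a class but no identity reference
    return "none"         # no entity annotations found

spot_fragment = 'nif:anchorOf "Berlin"^^xsd:string ; itsrdf:taClassRef <http://dbpedia.org/ontology/Location> .'
all_fragment = 'itsrdf:taClassRef dbo:Location ; itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .'
print(annotation_level(spot_fragment), annotation_level(all_fragment))
```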

Using mode=link doesn't make much sense with plain text as input: this mode assumes that you provide NIF as input, and if that NIF contains any entities, it tries to get a DBpedia link for each of them (if there are no entities, it does nothing and just returns the same NIF). I had already written a check for this (the line if (rMode.contains("link") && (!rMode.contains("spot") && informat == "text")) { in EOpenNLPServiceStandAlone), but because of a misplaced parenthesis the scope of my negation was wrong. So thanks for flagging this, as it allowed me to fix the bug. Now the behaviour is as intended. E.g. with the following:

http://localhost:8092/e-nlp/namedEntityRecognition?analysis=ner&informat=text&models=ner-wikinerEn_LOC;ner-wikinerEn_PER;ner-wikinerEn_ORG&language=en&mode=link

and with "I live in Berlin." in the body, the result is:

{
  "exception": "eu.freme.common.exception.BadRequestException",
  "path": "/e-nlp/namedEntityRecognition",
  "message": "Unsupported mode combination: Either provide NIF input or use link in combination with spot.",
  "error": "Bad Request",
  "status": 400,
  "timestamp": 1463127386028
}
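To make the intended rule concrete, here is a small Python rendering of the check (not the Java code itself). It assumes the mode is available as a set of parts, with 'all' expanded to both 'spot' and 'link'. The second function is a purely hypothetical illustration of how a negation that covers too much changes the result; the thread does not show the line as it was before the fix.

```python
from typing import Set

def is_unsupported(mode_parts: Set[str], informat: str) -> bool:
    """Intended rule: reject linking without spotting when the input is plain text."""
    return ("link" in mode_parts
            and "spot" not in mode_parts
            and informat == "text")

def is_unsupported_wide_negation(mode_parts: Set[str], informat: str) -> bool:
    """Hypothetical variant: the negation wraps both the spot and the informat test."""
    return ("link" in mode_parts
            and not ("spot" in mode_parts and informat == "text"))

# Link-only on NIF input should be accepted by the intended rule.
print(is_unsupported({"link"}, "turtle"))                # False
print(is_unsupported_wide_negation({"link"}, "turtle"))  # True
```

With the wide negation, a link-only request on Turtle input would be rejected even though it is exactly the supported case.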

If, however, you use NIF as input, like so:

http://localhost:8092/e-nlp/namedEntityRecognition?analysis=ner&informat=turtle&models=ner-wikinerEn_LOC;ner-wikinerEn_PER;ner-wikinerEn_ORG&language=en&mode=link

and put the result of the mode=spot request from above in the body, you get the same result as with mode=all (see above). I will commit the fix in a minute. If, after the commit, you cannot reproduce these results on your system, please let me know!
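The two-step flow (spot on plain text, then link on the resulting NIF) can be sketched as below. The send argument is a hypothetical callable that performs the HTTP request and returns the response body; it is injected so the chaining can be shown without assuming any particular HTTP client.

```python
from typing import Callable

# (body, mode, informat) -> response body; hypothetical transport function
Send = Callable[[str, str, str], str]

def spot_then_link(text: str, send: Send) -> str:
    """Run mode=spot on plain text, then feed the Turtle result to mode=link."""
    spotted_nif = send(text, "spot", "text")
    return send(spotted_nif, "link", "turtle")
```

If everything works as described above, the final result should match what a single mode=all request returns.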

jox commented 8 years ago

Peter, thanks for your explanation. I think I wasn't quite sure about the meaning of the mode "spot". So it spots entities in plain text. I guess I was expecting it to return the coordinates of a location, and I wasn't aware that the linking step is what adds the coordinates. So, sorry for my lack of knowledge. But glad it helped to spot another issue :-)