freme-project / pipelines

Apache License 2.0

NER for ORCID not working #21

Closed stoitsis closed 8 years ago

stoitsis commented 8 years ago

I am testing the FREME NER service with the ORCID dataset, but I cannot get a result that is linked to an ORCID record.

For Giannis Stoitsis http://api.freme-project.eu/0.3/e-entity/freme-ner/documents?input=Giannis%20Stoitsis&informat=text&outformat=json-ld&language=en&dataset=orcid

For John Stoitsis http://api.freme-project.eu/0.3/e-entity/freme-ner/documents?input=Stoitsis&informat=text&outformat=json-ld&language=en&dataset=orcid

Am I using a wrong version of the service?

m1ci commented 8 years ago

Am I using a wrong version of the service?

No, you are using it correctly. The problem is that no entity was spotted in your text documents "Giannis Stoitsis" and "John Stoitsis". If you submit a slightly longer text, your mention will be recognized. Try, for example, "Giannis Stoitsis is a person.":

http://api.freme-project.eu/0.3/e-entity/freme-ner/documents?input=Giannis%20Stoitsis%20is%20a%20person.&informat=text&outformat=turtle&language=en&dataset=orcid

Your mention will be recognized and linked to your ORCID record.
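The request URLs in this thread can also be assembled programmatically. The following is only a minimal sketch: the host, path, and query parameters are copied from the URLs above, while the helper name `build_ner_url` and its defaults are assumptions made for illustration. It only builds the URL and does not call the service.

```python
from urllib.parse import quote, urlencode

# Base address taken from the URLs quoted in this thread.
API_ROOT = "http://api.freme-project.eu"


def build_ner_url(text, dataset="orcid", version="0.3", outformat="turtle"):
    """Build a FREME NER request URL (hypothetical helper, not part of FREME)."""
    params = {
        "input": text,
        "informat": "text",
        "outformat": outformat,
        "language": "en",
        "dataset": dataset,
    }
    # quote_via=quote encodes spaces as %20, matching the URLs in the thread.
    query = urlencode(params, quote_via=quote)
    return f"{API_ROOT}/{version}/e-entity/freme-ner/documents?{query}"


# The short input from the original report and the longer one suggested above.
short_url = build_ner_url("Giannis Stoitsis", outformat="json-ld")
long_url = build_ner_url("Giannis Stoitsis is a person.")
```

Here `long_url` reproduces the suggested request, so the two inputs can be compared side by side.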

stoitsis commented 8 years ago

Yes, you are right. I forgot the issue of small text. Many thanks.

stoitsis commented 8 years ago

One more comment. I see that the NER identifies Giannis Stoitsis but not John Stoitsis, which is also listed as an alternative label in ORCID; see http://orcid.org/0000-0003-3347-8265. Is this missing from the ORCID dataset that you have used?

m1ci commented 8 years ago

Is this missing from the ORCID dataset that you have used?

We use the latest dump from 2014: http://support.orcid.org/knowledgebase/articles/223698-how-do-i-get-a-public-data-file- So, either this info (John Stoitsis) is not present in the dump, or we just missed that in the conversion process.

The dataset is very large, but I will try to look into it and check whether the alternative name "John Stoitsis" is present there. BTW, didn't you add this to your ORCID profile recently? If so, this might be the reason.

stoitsis commented 8 years ago

It was added at the beginning, if I remember correctly. I will also try to find other examples with alternative names.

m1ci commented 8 years ago

OK, let me investigate on our side why this alternative label is not considered.

stoitsis commented 8 years ago

Hi Milan, I just remembered that I enriched my profile at some point. No need to check; I think this is the reason. Apologies for the false alarm.

m1ci commented 8 years ago

OK, no problem. Please check whether other "alternative labels" match; if they do, this is solved.

ghsnd commented 8 years ago

@stoitsis, @m1ci: can this issue be closed?

stoitsis commented 8 years ago

Hi,

From my side, yes. One problem that still remains is that NER with ORCID does not work for very short texts. I think that you will discuss this with Giorgos.

Many thanks,

Giannis.

On Tue, Nov 10, 2015 at 11:36 AM, Gerald H wrote:

@stoitsis, @m1ci: can this issue be closed?


m1ci commented 8 years ago

This works now, try:

http://api.freme-project.eu/0.5/e-entity/freme-ner/documents?input=Giannis%20Stoitsis&informat=text&outformat=turtle&language=en&dataset=orcid&mode=link

Note the mode=link parameter: it means that the input text is treated as a single entity, and e-entity will try to link this entity to the specified dataset. In other words, spotting is not performed.

If you set a different value for the mode parameter, entity spotting is applied first, then entity linking, and so on. Let me know if this is clear.
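To illustrate the mode parameter, here is a small sketch that adds or replaces `mode` on an existing request URL using only the Python standard library. The helper name `with_mode` is an assumption made for this example; the only mode value used is `link`, since that is the only one named in this thread.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_mode(url, mode):
    """Return `url` with its `mode` query parameter set to `mode` (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop any existing mode parameter, keep the others in their original order.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "mode"]
    params.append(("mode", mode))
    # quote_via=quote keeps spaces encoded as %20, as in the URLs in this thread.
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))


base_url = (
    "http://api.freme-project.eu/0.5/e-entity/freme-ner/documents"
    "?input=Giannis%20Stoitsis&informat=text&outformat=turtle"
    "&language=en&dataset=orcid"
)
link_url = with_mode(base_url, "link")
```

With `mode=link` appended, the whole input string is submitted as one entity to be linked, so the short-text spotting problem discussed earlier should not apply.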

jnehring commented 8 years ago

The call that Milan said works now:

http://api.freme-project.eu/0.6/e-entity/freme-ner/documents?input=Giannis%20Stoitsis&informat=text&outformat=turtle&language=en&dataset=orcid&mode=link

produces

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,16>
        a                     nif:RFC5147String , nif:Context , nif:Word , nif:Phrase , nif:String ;
        nif:anchorOf          "Giannis Stoitsis"^^xsd:string ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger , "0"^^xsd:int ;
        nif:endIndex          "16"^^xsd:int , "16"^^xsd:nonNegativeInteger ;
        nif:isString          "Giannis Stoitsis"^^xsd:string , "Giannis Stoitsis" ;
        nif:referenceContext  <http://freme-project.eu/#char=0,16> ;
        itsrdf:taConfidence   "0.9768781898645936"^^xsd:double .

There is no link.
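A response like this can be checked for a link automatically. The sketch below assumes that a linked entity would carry an `itsrdf:taIdentRef` triple pointing at the target resource, which is the ITS 2.0 property commonly used for entity links in NIF output; this thread does not show a successful response, so that property and the "linked" sample are assumptions. The function name `has_entity_link` is hypothetical, and a plain regular expression stands in for a real Turtle parser.

```python
import re

ITSRDF_NS = "http://www.w3.org/2005/11/its/rdf#"

# Matches either the prefixed or the full-IRI form of taIdentRef followed by an IRI.
_LINK_PATTERN = re.compile(
    r"(itsrdf:taIdentRef|<" + re.escape(ITSRDF_NS) + r"taIdentRef>)\s+<[^>]+>"
)


def has_entity_link(turtle_text):
    """Return True if the Turtle text contains a taIdentRef link (assumed convention)."""
    return _LINK_PATTERN.search(turtle_text) is not None


# Abridged version of the output reported above: no link present.
unlinked = """
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://freme-project.eu/#char=0,16>
        nif:anchorOf          "Giannis Stoitsis"^^xsd:string ;
        itsrdf:taConfidence   "0.9768781898645936"^^xsd:double .
"""

# Hypothetical linked variant, using the ORCID URL mentioned earlier in the thread.
linked = unlinked.replace(
    "itsrdf:taConfidence",
    "itsrdf:taIdentRef     <http://orcid.org/0000-0003-3347-8265> ;\n"
    "        itsrdf:taConfidence",
)

print(has_entity_link(unlinked), has_entity_link(linked))
```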

m1ci commented 8 years ago

@sandroacoelho can you please check why this is not working?

sandroacoelho commented 8 years ago

sure

sandroacoelho commented 8 years ago

Hi all,

I created a test case and could no longer reproduce the problem, so I am closing this issue.

If any of the bugs persist, I will reopen it and add more tests.

Thanks,