freme-project / e-Internationalization

Apache License 2.0

[test] confidence information returned by e-Entity is not included in the HTML output #15

Closed m1ci closed 8 years ago

m1ci commented 8 years ago
curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" -d "<p>Welcome to Berlin, the capital of Germany.</p>" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia" -v

e-Entity also returns confidence scores for the entities; however, this information is not included in the HTML/ITS output.

The result is:

<html><head><body><p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Berlin" its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location">Berlin</span>, the capital of <span its-ta-ident-ref="http://dbpedia.org/resource/Germany" its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location">Germany</span>.</p></body></html>

fsasaki commented 8 years ago

I agree with the comment from @m1ci. Also, one may want to add information about the tool that produced the annotation. See http://www.w3.org/TR/its20/#its-tool-annotation and example 24 in that section.

borriellom commented 8 years ago

I committed the change. Please check whether it is working now.

m1ci commented 8 years ago

Yes, well done! Posting the new result:

<html>
   <head>
   <body>
      <p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Berlin" its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" its-ta-confidence="0.7909971166830088">Berlin</span>, the capital of <span its-ta-ident-ref="http://dbpedia.org/resource/Germany" its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" its-ta-confidence="0.983524284693178">Germany</span>.</p>
   </body>
</html>
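
To check a response like the one above programmatically, the confidence values can be read back from the `its-ta-confidence` attributes. This is only a minimal sketch using Python's standard `html.parser`; the class and function names (`ItsEntityParser`, `extract_entities`) are made up for illustration and are not part of FREME, and the sample string is a shortened copy of the output posted above. It assumes annotated spans are not nested.

```python
from html.parser import HTMLParser


class ItsEntityParser(HTMLParser):
    """Collect <span> elements that carry ITS 2.0 text-analysis attributes.

    Hypothetical helper for illustration; assumes annotated spans are not nested.
    """

    def __init__(self):
        super().__init__()
        self.entities = []
        self._current = None  # entity record for the span being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span" and "its-ta-ident-ref" in attr_map:
            conf = attr_map.get("its-ta-confidence")
            self._current = {
                "text": "",
                "ident_ref": attr_map["its-ta-ident-ref"],
                "class_ref": attr_map.get("its-ta-class-ref"),
                # None when the service did not emit a confidence value
                "confidence": float(conf) if conf is not None else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.entities.append(self._current)
            self._current = None


def extract_entities(html):
    """Return one dict per annotated span found in the HTML string."""
    parser = ItsEntityParser()
    parser.feed(html)
    parser.close()
    return parser.entities


SAMPLE = (
    '<html><head><body><p>Welcome to '
    '<span its-ta-ident-ref="http://dbpedia.org/resource/Berlin" '
    'its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" '
    'its-ta-confidence="0.7909971166830088">Berlin</span>, the capital of '
    '<span its-ta-ident-ref="http://dbpedia.org/resource/Germany" '
    'its-ta-class-ref="http://nerd.eurecom.fr/ontology#Location" '
    'its-ta-confidence="0.983524284693178">Germany</span>.</p></body></html>'
)

for entity in extract_entities(SAMPLE):
    print(entity["text"], entity["confidence"])
```

Run against the first (pre-fix) output, the same sketch would report `None` for every confidence, which is the behaviour this issue describes.
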
fsasaki commented 8 years ago

Sorry to be picky ... but would it be possible to also add the tool information? The output would then look like this: <html its-annotators-ref="text-analysis|http://www.freme-project.eu/freme-ner"> ... (the rest stays as is)

jnehring commented 8 years ago

I think we would need to add this information to the output of FREME NER before we can put it in the HTML. And if we add it to the NIF produced by FREME NER, we should also add it to the NIF produced by all other e-Services. Should we create a new issue for that?

borriellom commented 8 years ago

I agree with Jan. That information should be included in the NIF file produced by FREME NER. Then it can be converted to HTML.

fsasaki commented 8 years ago

ok - in which repo should this issue be created? the broker?

m1ci commented 8 years ago

1) IMO, this issue should go to the e-Entity service, but a similar issue should also be opened for e-Translation and e-Terminology.

2) the "its-annotators-ref" should be specified at the annotation level (<span>), and not at the "sentence" level. Reason: a single piece of content can be processed by multiple e-Services, and with a higher-level attribute it might become difficult to decode the provenance of the resulting annotations.

3) in the future we might include confidence scores not only for spotting, but also for linking, classification, etc. This is still an open issue.
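
To illustrate point 2, here is a rough sketch of how a consumer might work out which tool produced each annotation when `its-annotators-ref` sits on the individual `<span>` elements. It is an assumption-laden illustration, not FREME code: the names `parse_annotators_ref` and `ProvenanceParser` are invented, the terminology tool IRI is a placeholder, and the inheritance handling (a value on an element applies to its descendants and is overridden per data category) is a simplified reading of ITS 2.0.

```python
from html.parser import HTMLParser


def parse_annotators_ref(value):
    """Split 'category|IRI category|IRI' into a {category: IRI} dict."""
    pairs = {}
    for item in value.split():
        category, _, iri = item.partition("|")
        pairs[category] = iri
    return pairs


class ProvenanceParser(HTMLParser):
    """Record, for every <span>, the annotators in effect at that span."""

    def __init__(self):
        super().__init__()
        self._stack = []       # (tag, effective annotators, span record or None)
        self.annotations = []  # one record per <span>, in document order

    def handle_starttag(self, tag, attrs):
        inherited = self._stack[-1][1] if self._stack else {}
        own = parse_annotators_ref(dict(attrs).get("its-annotators-ref") or "")
        effective = {**inherited, **own}  # the element's own values win
        record = None
        if tag == "span":
            record = {"text": "", "annotators": effective}
            self.annotations.append(record)
        self._stack.append((tag, effective, record))

    def handle_data(self, data):
        for _, _, record in self._stack:
            if record is not None:
                record["text"] += data

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


# The second span and its tool IRI are hypothetical, standing in for
# an annotation added by a different e-Service.
SPAN_LEVEL = (
    '<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Berlin" '
    'its-annotators-ref="text-analysis|http://www.freme-project.eu/freme-ner">'
    'Berlin</span>, the <span its-term="yes" '
    'its-annotators-ref="terminology|http://example.org/hypothetical-e-terminology">'
    'capital</span> of Germany.</p>'
)

parser = ProvenanceParser()
parser.feed(SPAN_LEVEL)
parser.close()
for annotation in parser.annotations:
    print(annotation["text"], annotation["annotators"])
```

With the attribute on each span, every annotation carries its own tool reference, so output that has passed through several e-Services does not depend on a single value set higher up in the document.
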

jnehring commented 8 years ago

I created a new issue, https://github.com/freme-project/technical-discussion/issues/83, to cover the its-annotators-ref property for all e-Services. I added @m1ci's comments to the new issue.

This issue started with the missing confidence values, and that is solved, so I am closing it.