freme-project / freme-ner

Wrong <span> tag attributes when using HTML as in- and output #137

Closed ghsnd closed 8 years ago

ghsnd commented 8 years ago

Hi, this is a specific case. Issuing this command:

curl -X POST --header 'Content-Type: text/html' --header 'Accept: text/html' -d '<!DOCTYPE HTML>
<html>
    <body>
        <figure><img src="media/File_Leipzig_Fockeberg_Zentrum.jpg" alt="Leipzig Fockeberg Zentrum"></figure>
        <main>
        <p>Leipzig later played a significant role in instigating the fall of communism in Eastern Europe, through events which took place in and around St. Nicholas Church. Since the reunification of Germany, Leipzig has undergone significant change with the restoration of some historical buildings, the demolition of others, and the development of a modern transport infrastructure. Leipzig today is an economic center and the most livable city in Germany, according to the GfK marketing research institution. Oper Leipzig is one of the most prominent opera houses in Germany, and Leipzig Zoological Garden is one of the most modern zoos in Europe and ranks first in Germany and second in Europe according to Anthony Sheridan. Leipzig is currently listed as Gamma World City and Germany\u0027s "Boomtown". Outside of Leipzig the Neuseenland district forms a huge lake area by approx 116 square miles (300 square kilometres).</p>
        </main>
    </body>
</html>
' 'http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink'

produces:

<!doctype html>
<html>
 <head></head> 
 <body> 
  <figure> 
   <img alt="Leipzig Fockeberg Zentrum" src="media/File_Leipzig_Fockeberg_Zentrum.jpg"> 
  </figure> 
  <main> 
   <p><span its-ta-confidence="0.5372637600615782" its-ta-ident-ref="http://dbpedia.org/resource/Leipzig">Leipzig</span> later played a significant role in instigating the fall of communism in <span its-ta-confidence="0.7598841653151605" its-ta-ident-ref="http://dbpedia.org/resource/Europe" 0.9318870576403856">Europe</span>"&gt;Eastern Europe, through events which took place in and around <span its-ta-confidence="0.9793861476410285" its-ta-ident-ref="http://dbpedia.org/resource/St._Nicholas%252C_Vale_of_Glamorgan">St. Nicholas Church</span>. Since the reunification of <span its-ta-confidence="0.91815406124852" its-ta-ident-ref="http://dbpedia.org/resource/Germany">Germany</span>, <span its-ta-confidence="0.9286518798806868" its-ta-ident-ref="http://dbpedia.org/resource/Leipzig">Leipzig</span> has undergone significant change with the restoration of some historical buildings, the demolition of others, and the development of a modern transport infrastructure. <span its-ta-confidence="0.9475422944046541" its-ta-ident-ref="http://dbpedia.org/resource/Leipzig">Leipzig</span> today is an economic center and the most livable city in <span its-ta-confidence="0.9893666422060133" its-ta-ident-ref="http://dbpedia.org/resource/Germany">Germany</span>, according to the <span its-ta-confidence="0.8123442734231748" its-ta-ident-ref="http://dbpedia.org/resource/GfK">GfK</span> marketing research institution. 
<span its-ta-confidence="0.7597714697099338" its-ta-ident-ref="http://dbpedia.org/resource/Leipzig_Opera">Oper Leipzig</span> is one of the most prominent opera houses in <span its-ta-confidence="0.9166502283624632" its-ta-ident-ref="http://dbpedia.org/resource/Germany">Germany</span>, and <span its-ta-confidence="0.9285446982965856" its-ta-ident-ref="http://dbpedia.org/resource/Leipzig_Zoological_Garden">Leipzig Zoological Garden</span> is one of the most modern zoos in Europe and ranks first in <span its-ta-confidence="0.8592893957892673" its-ta-ident-ref="http://dbpedia.org/resource/Germany">Germany</span> and second in Europe according to <span its-ta-confidence="0.9992979187461637" its-ta-ident-ref="http://dbpedia.org/resource/Tony_Sheridan">Anthony Sheridan</span>. Leipzig is currently listed as <span its-ta-confidence="0.5005679494704696" its-ta-ident-ref="http://dbpedia.org/resource/Global_city">Gamma World City</span> and <span its-ta-confidence="0.8483773635994547" its-ta-ident-ref="http://dbpedia.org/resource/Germany">Germany</span>'s <span its-ta-confidence="0.5726971568947093" its-ta-ident-ref="http://dbpedia.org/resource/Boomtown">Boomtown</span>. Outside of Leipzig the <span its-ta-confidence="0.9930554582006074" its-ta-ident-ref="http://dbpedia.org/resource/Neuseenland">Neuseenland</span> district forms a huge lake area by approx 116 square miles (300 square kilometres).</p> 
  </main>  
 </body>
</html>

Of interest is the sentence ... instigating the fall of communism in Eastern Europe, ...

which becomes:

... instigating the fall of communism in <span its-ta-confidence="0.7598841653151605" its-ta-ident-ref="http://dbpedia.org/resource/Europe" 0.9318870576403856">Europe</span>"&gt;Eastern Europe, ...

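which is not correct: the opening <span> tag contains a stray 0.9318870576403856" fragment, and "&gt; leaks into the text before "Eastern Europe".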
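A quick way to spot this kind of breakage in a larger response is to scan every opening <span> tag and flag those whose attribute list is not made up purely of name="value" pairs. The following is a minimal sketch, not part of FREME: the function name malformed_spans is hypothetical, and the regex assumes double-quoted attribute values that contain no ">" character, which holds for the output shown above but not for HTML in general.

```python
import re
from typing import List

# Opening <span ...> tags; group 1 is the raw attribute string.
# Assumption: attribute values never contain ">".
SPAN_OPEN = re.compile(r"<span\b([^>]*)>")

# An attribute string is accepted only if it is a sequence of
# name="value" pairs separated by whitespace, and nothing else.
WELL_FORMED_ATTRS = re.compile(r'(?:\s+[A-Za-z][\w-]*="[^"]*")*\s*')


def malformed_spans(html_text: str) -> List[str]:
    """Return every opening <span> tag whose attributes look broken."""
    bad = []
    for match in SPAN_OPEN.finditer(html_text):
        if not WELL_FORMED_ATTRS.fullmatch(match.group(1)):
            bad.append(match.group(0))
    return bad
```

Run against the response above, this should flag only the <span> around "Europe", because the trailing 0.9318870576403856" is not a name="value" pair.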

m1ci commented 8 years ago

I think the issue is related to the e-Internationalization service. Processing the sentence

Leipzig later played a significant role in instigating the fall of communism in Eastern Europe, through events which took place in and around St. Nicholas Church.

with FREME NER returns Eastern Europe correctly spotted and linked (http://dbpedia.org/resource/Eastern_Europe). Adding @jnehring and @katia-vistatec to the loop.
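The thread does not state the root cause. One possible reading of the stray 0.9318870576403856" fragment is that two annotations over overlapping text ("Europe" and "Eastern Europe") were both written into the HTML. Purely as an illustration under that assumption, the sketch below shows one way to serialize annotations so that overlapping ones can never produce nested or broken tags. The Annotation class, drop_overlaps, and annotate are hypothetical names, pairing the confidence 0.9318870576403856 with "Eastern Europe" is a guess, and this is not the fix that was applied to e-Internationalization.

```python
import html
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    start: int         # inclusive character offset into the plain text
    end: int           # exclusive character offset
    ref: str           # entity URI, written to its-ta-ident-ref
    confidence: float  # written to its-ta-confidence


def drop_overlaps(annotations: List[Annotation]) -> List[Annotation]:
    """Greedily keep the highest-confidence annotations that do not overlap."""
    kept: List[Annotation] = []
    for ann in sorted(annotations, key=lambda a: a.confidence, reverse=True):
        if all(ann.end <= k.start or ann.start >= k.end for k in kept):
            kept.append(ann)
    return sorted(kept, key=lambda a: a.start)


def annotate(text: str, annotations: List[Annotation]) -> str:
    """Wrap each surviving annotation in exactly one <span>."""
    out, pos = [], 0
    for ann in drop_overlaps(annotations):
        # Escape the untouched text between annotations.
        out.append(html.escape(text[pos:ann.start], quote=False))
        out.append(
            '<span its-ta-confidence="{}" its-ta-ident-ref="{}">{}</span>'.format(
                ann.confidence,
                html.escape(ann.ref, quote=True),
                html.escape(text[ann.start:ann.end], quote=False),
            )
        )
        pos = ann.end
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)
```

Because the text is written in a single left-to-right pass over non-overlapping ranges, a later annotation cannot be inserted into the markup of an earlier one.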

katia-vistatec commented 8 years ago

Hi. Using broker Local, which service can I use to debug it locally?

jnehring commented 8 years ago

You can use broker-dev. The example uses FREME NER and, I assume, the DBpedia dataset. broker-dev is configured to proxy the request to freme-ner, so you do not need to install the heavyweight freme-ner on your local machine.

ghsnd commented 8 years ago

Note: to reproduce the bug, you have to use all the sentences (exactly as in the given command), not just the one where it goes wrong. If you send only that sentence, the output is correct.

katia-vistatec commented 8 years ago

Ok. I have found a way to debug it and I can reproduce the issue.

katia-vistatec commented 8 years ago

@ghsnd you can check if it is ok now.

ghsnd commented 8 years ago

All <span> tags in my tutorial contents are OK now. Thanks!