freme-project / e-Entity

Generating HTML: entities inside attributes #61

Closed ghsnd closed 8 years ago

ghsnd commented 8 years ago

I send this HTML file to e-Entity and ask for HTML back. The curl command looks like this:

curl -vX POST --header "Content-Type: text/html" --header "Accept: text/html" -d '@administration.txt' "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all"

(The .txt extension is only there so that GitHub accepts the file as an attachment.)

The entities are inserted, but the generated HTML is not always valid: nested quotes occur. This happens because entities are also recognised inside attribute values, so the annotation markup is written into the attribute. Here is an example where it goes wrong in a tag:

Input (attributes are put on separate lines for clarity):

<link 
    rel="alternate" 
    type="application/x-wiki" 
    title="Edit this page" 
    href="https://en.wikipedia.org/w/index.php?title=Athens&amp;action=edit"
>

Output (note the title attribute, where the word Edit has been wrapped in a span):

<link
    rel="alternate"
    type="application/x-wiki"
    title="<span its-ta-ident-ref="http://dbpedia.org/resource/Edit" its-ta-class-ref="http://www.w3.org/2002/07/owl#Thing" its-ta-confidence="0.8372968819134853">Edit</span> this page" 
    href="https://en.wikipedia.org/w/index.php?title=Athens&amp;action=edit"
>
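
A check like the following could flag affected documents by comparing the attributes before and after annotation. It is only a sketch using Python's standard `html.parser`; the helper names (`AttributeCollector`, `collect_attributes`, `attributes_preserved`) are hypothetical and not part of FREME, and it assumes the annotation only ever adds `its-ta-*` attributes.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Record (tag, attribute name, attribute value) for every start tag."""

    def __init__(self):
        super().__init__()
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: its-ta-* attributes are added by the annotation step,
            # so they are left out of the comparison.
            if not name.startswith("its-ta-"):
                self.attributes.append((tag, name, value))


def collect_attributes(source):
    """Parse an HTML string and return its non-annotation attributes in order."""
    collector = AttributeCollector()
    collector.feed(source)
    collector.close()
    return collector.attributes


def attributes_preserved(original, annotated):
    """Return True if annotation left every original attribute untouched."""
    return collect_attributes(original) == collect_attributes(annotated)
```

For the example above, `attributes_preserved(input_html, output_html)` should come out `False`, because the `title` value is cut off at the first nested quote when the output is parsed.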

m1ci commented 8 years ago

Processing HTML is a task for the e-Internationalization service, so this is a question for Marta or Jan.

jnehring commented 8 years ago

@borriellom can you fix this problem?

borriellom commented 8 years ago

I opened an issue about this problem: freme-project/e-Internationalization#11. I'm afraid we have to wait for the Okapi development team to make the change mentioned by Yves. At the moment, when I receive a text unit from Okapi, I cannot tell whether it is actual text or an attribute value.
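
To illustrate the distinction that is missing here, below is a rough sketch of annotating text nodes only. It does not use Okapi; Python's standard `html.parser` stands in for it because it reports attribute values and text through separate callbacks. The `ENTITIES` lookup and the names `annotate_text`, `TextOnlyAnnotator` and `annotate_html` are hypothetical placeholders for the real NER step, and only the `its-ta-ident-ref` attribute is emitted to keep it short.

```python
import html
import re
from html.parser import HTMLParser

# Hypothetical lookup standing in for the NER result; not FREME's actual API.
ENTITIES = {"Edit": "http://dbpedia.org/resource/Edit"}
ENTITY_RE = re.compile(r"\b(" + "|".join(map(re.escape, ENTITIES)) + r")\b")


def annotate_text(text):
    """Wrap known entity mentions in ITS spans and escape everything else."""
    out = []
    for i, part in enumerate(ENTITY_RE.split(text)):
        if i % 2:  # odd indices hold the captured entity mentions
            ref = html.escape(ENTITIES[part], quote=True)
            out.append(f'<span its-ta-ident-ref="{ref}">{html.escape(part)}</span>')
        else:
            out.append(html.escape(part, quote=False))
    return "".join(out)


class TextOnlyAnnotator(HTMLParser):
    """Re-emit tags verbatim and annotate only the text between them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The original start tag is copied as-is, so attribute values
        # never reach the annotation step.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(annotate_text(data))


def annotate_html(source):
    parser = TextOnlyAnnotator()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

With this, a `title="Edit this page"` attribute passes through unchanged, while the same words inside a `<p>` element get the span. The sketch drops comments and the doctype and does not treat `<script>` or `<style>` content specially, so it is not a replacement for the real filter.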

jnehring commented 8 years ago

I think this is a duplicate of freme-project/e-Internationalization#11, so I am closing this issue.