freme-project / basic-services

Apache License 2.0

[e-Internationalization] add itsrdf:taAnnotatorsRef to output HTML #111

Closed jnehring closed 8 years ago

jnehring commented 8 years ago

When e-Internationalization performs roundtripping and an annotation carries the property itsrdf:taAnnotatorsRef, that property should be added to the output HTML when it is created. itsrdf:taAnnotatorsRef is currently produced by FREME NER, Tilde e-Translation and Tilde e-Terminology when nif-version is set to 2.1.

Information about the output HTML can be found here: https://www.w3.org/TR/its20/#its-tool-annotation

So an example of the output HTML looks like

 <p its:annotatorsRef="http://freme-project.eu/tools/freme-ner">bla bla bla</p>

@fsasaki Can you please check that this issue formulates the task correctly?

fsasaki commented 8 years ago

Thanks, Jan, this is mostly correct. Small nit: the output should be

<p its-annotators-ref="http://freme-project.eu/tools/freme-ner">bla bla bla</p>

The reason to have its-annotators-ref is that HTML attributes are case insensitive and HTML does not deal with XML namespaces (its:).
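The naming rule behind this can be sketched in a few lines. ITS 2.0 derives the HTML attribute name from the XML local name by prefixing `its-` and replacing each uppercase letter with a hyphen plus its lowercase form. The function name below is hypothetical and not part of FREME; it only illustrates the rule.

```python
import re


def its_html_attribute_name(local_name: str) -> str:
    """Map an ITS local attribute name (camelCase) to its HTML form.

    Hypothetical helper: prefix "its-" and turn every uppercase letter
    into "-" followed by the lowercase letter.
    """
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name)
    return "its-" + hyphenated


if __name__ == "__main__":
    for name in ("annotatorsRef", "termInfoRef", "taAnnotatorsRef"):
        print(name, "->", its_html_attribute_name(name))
```

Applied to `taAnnotatorsRef`, the same rule yields the `its-ta-annotators-ref` attribute that appears in the outputs later in this thread.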

jnehring commented 8 years ago

OK, good that you mention it. I did not read the documentation carefully enough.

fsasaki commented 8 years ago

One more thing: if a whole HTML document is processed, it is sufficient to output the its-annotators-ref attribute on the outmost element.

katia-vistatec commented 8 years ago

I tried this curl call (locally), but I can't find annotatorsRef in the NIF output:

curl -X POST --header "Content-Type: text/html" --data "@example.txt" --header "Accept: text/turtle" "http://localhost:8080/e-terminology/tilde?informat=text/html&outformat=text/turtle&source-lang=en&target-lang=nl&nif-version=2.1" > example-out.txt
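For scripted tests, the same request could be built in Python. This is only a sketch: the function name and its defaults are hypothetical, while the path, query parameters and headers are copied from the curl call above. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_terminology_request(base_url, html_bytes, source_lang="en", target_lang="nl"):
    """Build (but do not send) a POST request like the curl call above."""
    query = urlencode({
        "informat": "text/html",
        "outformat": "text/turtle",
        "source-lang": source_lang,
        "target-lang": target_lang,
        "nif-version": "2.1",  # annotatorsRef is only expected with NIF 2.1
    })
    return Request(
        f"{base_url}/e-terminology/tilde?{query}",
        data=html_bytes,
        headers={"Content-Type": "text/html", "Accept": "text/turtle"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_terminology_request("http://localhost:8080", b"<p>bla bla bla</p>")
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. Note that curl's `--data "@file"` strips newlines from the file, whereas this sketch sends the bytes unchanged (closer to `--data-binary`).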

katia-vistatec commented 8 years ago

With this input: example.txt

katia-vistatec commented 8 years ago

The same with this curl:

curl -X POST --header "Content-Type: text/html" --data "@example.txt" --header "Accept: text/turtle" "http://api-dev.freme-project.eu/current/e-terminology/tilde?informat=text/html&outformat=text/turtle&source-lang=en&target-lang=nl&nif-version=2.1" > example1-out.txt

katia-vistatec commented 8 years ago

Does the e-Translation service work? I tried this curl call and it does not work:

curl -X POST --header "Content-Type: text/html" --data "@example.txt" --header "Accept: text/turtle" "http://api-dev.freme-project.eu/current/e-translation/tilde?informat=text/html&outformat=text/turtle&source-lang=en&target-lang=de&nif-version=2.1" > example-tra-out.txt

katia-vistatec commented 8 years ago

{ "exception": "eu.freme.common.exception.ExternalServiceFailedException", "path": "/e-translation/tilde", "message": "External service failed: {\"Message\":\"System not available for requested language pair\"}", "error": "Not Found", "status": 404, "timestamp": 1475249555211 }
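The actual cause is nested inside the `message` field as a second, escaped JSON object. A small sketch of how a test script might pull it out follows. The function name is hypothetical, and the `External service failed: ` prefix and the `Message` key are assumed only from the single response shown above.

```python
import json

PREFIX = "External service failed: "


def upstream_message(error_body: str) -> str:
    """Return the external service's own message if it can be extracted,
    otherwise the unmodified "message" field of the error response."""
    message = json.loads(error_body)["message"]
    if message.startswith(PREFIX):
        try:
            return json.loads(message[len(PREFIX):])["Message"]
        except (ValueError, KeyError, TypeError):
            pass  # payload after the prefix was not the expected JSON object
    return message


if __name__ == "__main__":
    # Raw string so the backslash escapes reach the JSON parser unchanged.
    body = r'''{ "exception": "eu.freme.common.exception.ExternalServiceFailedException",
      "path": "/e-translation/tilde",
      "message": "External service failed: {\"Message\":\"System not available for requested language pair\"}",
      "error": "Not Found", "status": 404, "timestamp": 1475249555211 }'''
    print(upstream_message(body))
```

Read this way, the 404 comes from the external translation service rejecting the en to de language pair, not from a missing FREME endpoint.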

katia-vistatec commented 8 years ago

Should this be implemented only for NIF 2.1?

katia-vistatec commented 8 years ago

Ok. Now I can see the annotation in the Tilde Terminology Service output.

jnehring commented 8 years ago

Yes, itsrdf:taAnnotatorsRef is produced only when nif-version=2.1 is set.

katia-vistatec commented 8 years ago

I pushed the changes to the repository. You can test it.

jnehring commented 8 years ago

Thank you. It looks good. I enriched an HTML:

curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: 801956f7-7843-6557-06fb-7936c2cc387d" -d '<div>
    <p>Show me the source of knowledge</p>
    <p>Show me the source of knowledge</p>
</div>' "http://api-dev.freme-project.eu/current/e-terminology/tilde?informat=html&outformat=html&source-lang=en&target-lang=de&nif-version=2.1"

and it produced

<html>
    <head></head>
    <body>
        <div>
            <p its-ta-annotators-ref="https://services.tilde.com/terminology">
                <span its-term-info-ref="http://freme-project.eu/#offset_2_6" its-term="yes">Show</span> me the
                <span its-term-info-ref="http://freme-project.eu/#offset_14_20" its-term="yes">source</span> of
                <span its-term-info-ref="http://freme-project.eu/#offset_24_33" its-term="yes">knowledge</span>
            </p>
            <p its-ta-annotators-ref="https://services.tilde.com/terminology">
                <span its-term-info-ref="http://freme-project.eu/#offset_36_40" its-term="yes">Show</span> me the
                <span its-term-info-ref="http://freme-project.eu/#offset_48_54" its-term="yes">source</span> of
                <span its-term-info-ref="http://freme-project.eu/#offset_58_67" its-term="yes">knowledge</span>
            </p>
        </div>
    </body>
</html>

@fsasaki you mentioned that it is sufficient to put the annotation on the outmost element. Is the current output ok?

fsasaki commented 8 years ago

Hi Jan, sure, the output is OK. With your input, one could even put the annotators-ref attribute on the div element:

<div its-ta-annotators-ref="https://services.tilde.com/terminology">

that is, instead of having the attributes on the p elements, one attribute at div would be sufficient.
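A rough sketch of that hoisting step, under simplifying assumptions: the snippet is well-formed enough to be parsed as XML, and the attribute is only moved when every occurrence carries the same value. The function name is hypothetical and this is not the FREME implementation.

```python
import xml.etree.ElementTree as ET

ATTR = "its-ta-annotators-ref"


def hoist_annotators_ref(root):
    """Move ATTR to the outmost element if all occurrences agree.

    If zero or several distinct values are found, the tree is left
    unchanged, because a single attribute on the root could not
    represent them.
    """
    values = {el.get(ATTR) for el in root.iter() if el.get(ATTR) is not None}
    if len(values) != 1:
        return root
    for el in root.iter():
        el.attrib.pop(ATTR, None)  # drop the per-element copies
    root.set(ATTR, values.pop())
    return root


if __name__ == "__main__":
    snippet = (
        '<div>'
        '<p its-ta-annotators-ref="https://services.tilde.com/terminology">Show me</p>'
        '<p its-ta-annotators-ref="https://services.tilde.com/terminology">Show me</p>'
        '</div>'
    )
    print(ET.tostring(hoist_annotators_ref(ET.fromstring(snippet)), encoding="unicode"))
```

The "all values agree" check is a deliberately conservative choice. With several tools in a pipeline the values may differ, and this sketch then leaves the attributes where they are.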

katia-vistatec commented 8 years ago

This is an HTML snippet. For an HTML document I have put the attribute inside the html tag. Note that there is the problem of the tags introduced by the parser (for the html snippet): body, head and html. I should fix this.
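One possible way to drop the parser-introduced wrapper for snippets is sketched below. It assumes the caller already knows the input was a snippet and that the output is well-formed enough to parse as XML. The function name is hypothetical and this is not how the service itself is implemented.

```python
import xml.etree.ElementTree as ET


def unwrap_snippet(output_html: str) -> str:
    """Return only the content of <body> when the output is wrapped in
    html/head/body; otherwise return the output unchanged."""
    root = ET.fromstring(output_html)
    body = root.find("body")
    if root.tag != "html" or body is None:
        return output_html
    parts = [body.text or ""]
    # method="html" avoids XML-style self-closing tags for empty elements.
    parts.extend(ET.tostring(child, encoding="unicode", method="html") for child in body)
    return "".join(parts).strip()


if __name__ == "__main__":
    wrapped = "<html><head></head><body><div><p>Show me the source</p></div></body></html>"
    print(unwrap_snippet(wrapped))
```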

fsasaki commented 8 years ago

> This is an HTML snippet.

Sure. In Jan's example call, the outmost element of the snippet was "div", so one can put the its-annotators-ref on "div". And you are right about a whole HTML document and the "html" tag.

katia-vistatec commented 8 years ago

Ok. So you mean it would always be better to apply the attribute to the outmost element. I applied this only to whole documents. For an HTML snippet I set the attribute on the tag directly containing the text, but I can change this.

fsasaki commented 8 years ago

> In the case of an HTML snippet I set the attribute to the tag directly containing the text but I can change this.

Yes, I would use the outmost element.

jnehring commented 8 years ago

This can get complicated in pipelines when there are multiple tools that produce different itsrdf:annotatorsRef. Finding the outmost element can be tricky in these cases.

I tried to test this but found a bug that hinders this test: #113

katia-vistatec commented 8 years ago

Is there a log of the error?

jnehring commented 8 years ago

Yes. I attached the error log to #113. So the problem seems to originate from within internationalization and not from within pipelines.

fsasaki commented 8 years ago

> This can get complicated in pipelines when there are multiple tools that produce different itsrdf:annotatorsRef. Finding the outmost element can be tricky in these cases.

that is true. On the other hand, having its-annotators-ref on each element may blow up large documents a lot.
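One possible way to keep a single attribute even with several tools is the value format from the ITS 2.0 tool annotation section linked at the top of this issue: a space-separated list of `datacategory|IRI` pairs on `its-annotators-ref`. The sketch below only illustrates that format. It is not what the service currently emits (the output in this thread uses `its-ta-annotators-ref` with a bare IRI), the function names are hypothetical, and "a later tool replaces an earlier one for the same data category" is an assumption.

```python
def merge_annotators_refs(refs):
    """Combine (data_category, tool_iri) pairs into one attribute value.

    Assumption: only one tool per data category is kept, and the last
    pair seen for a category wins.
    """
    merged = {}
    for category, iri in refs:
        merged[category] = iri
    return " ".join(f"{category}|{iri}" for category, iri in merged.items())


def parse_annotators_ref(value):
    """Split an attribute value back into {data_category: tool_iri}."""
    return dict(item.split("|", 1) for item in value.split())


if __name__ == "__main__":
    value = merge_annotators_refs([
        ("text-analysis", "http://freme-project.eu/tools/freme-ner"),
        ("terminology", "https://services.tilde.com/terminology"),
    ])
    print(f'<div its-annotators-ref="{value}">')
```

With this format, one attribute on the outmost element could name one tool per data category, which would also limit the document growth mentioned above.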

katia-vistatec commented 8 years ago

Can you test it again (for the HTML snippet case)?

jnehring commented 8 years ago

Now the output is

<html>
    <head></head>
    <body>
        <div its-ta-annotators-ref="https://services.tilde.com/terminology">
            <p>
                <span its-term-info-ref="http://freme-project.eu/#offset_2_6" its-term="yes">Show</span> me the
                <span its-term-info-ref="http://freme-project.eu/#offset_14_20" its-term="yes">source</span> of
                <span its-term-info-ref="http://freme-project.eu/#offset_24_33" its-term="yes">knowledge</span>
            </p>
            <p>
                <span its-term-info-ref="http://freme-project.eu/#offset_36_40" its-term="yes">Show</span> me the
                <span its-term-info-ref="http://freme-project.eu/#offset_48_54" its-term="yes">source</span> of
                <span its-term-info-ref="http://freme-project.eu/#offset_58_67" its-term="yes">knowledge</span>
            </p>
        </div>
    </body>
</html>

This looks good to me.

I will leave the issue open so we can see how the system deals with different its-ta-annotators-ref annotations. This waits for #113.

katia-vistatec commented 8 years ago

Isn't issue https://github.com/freme-project/basic-services/issues/113 solved?

jnehring commented 8 years ago

No, #113 is not solved.

jnehring commented 8 years ago

This issue is finished. The problem with the pipelines is not a problem of this task.