freme-project / basic-services

Apache License 2.0

[Pipelines] pipeline with XSLT step does not work #64

Closed fsasaki closed 8 years ago

fsasaki commented 8 years ago

I created this pipeline

http://api-dev.freme-project.eu/current/pipelining/templates/57

With the attached request I get an error, which seems to come from the first pipeline step. If I execute the first step on its own, outside a pipeline, everything works fine. Why is that?

curl-xslt-pipeline.txt

ArneBinder commented 8 years ago

I put your request body directly into the pipeline and sent it to http://api-dev.freme-project.eu/current/pipelining/chain (note that you have to JSON-encode the body):

curl -X POST -H "Content-Type: application/json" -d '[
    {
      "method": "POST",
      "endpoint": "http://api-dev.freme-project.eu/current/toolbox/xslt-converter/documents/xliff20-to-html",
      "parameters": {},
      "headers": {
        "content-type": "text/xml",
        "accept": "text/html"
      },
      "body": "<xliff version=\"2.0\" xmlns=\"urn:oasis:names:tc:xliff:document:2.0\" srcLang=\"en\" trgLang=\"fr\">\n <file id=\"f1\">\n  <unit id=\"u1\">\n   <segment>\n   <source>We very much welcome you in the city of Prague, a home of XML!<\/source>\n   <\/segment>\n  <\/unit>\n <\/file>\n<\/xliff>"
    },
    {
      "method": "POST",
      "endpoint": "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents",
      "parameters": {
        "language": "en",
        "dataset": "dbpedia"
      },
      "headers": {
        "content-type": "text/html",
        "accept": "text/html"
      },
      "body": null
    },
    {
      "method": "POST",
      "endpoint": "http://api-dev.freme-project.eu/current/e-link/documents/",
      "parameters": {
        "templateid": "3"
      },
      "headers": {
        "content-type": "text/turtle",
        "accept": "text/turtle"
      },
      "body": null
    },
    {
      "method": "POST",
      "endpoint": "http://api-dev.freme-project.eu/current/e-terminology/tilde",
      "parameters": {
        "source-lang": "en",
        "target-lang": "nl"
      },
      "headers": {
        "content-type": "text/turtle",
        "accept": "text/turtle"
      },
      "body": null
    },
    {
      "method": "POST",
      "endpoint": "http://api-dev.freme-project.eu/current/e-translation/tilde",
      "parameters": {
        "source-lang": "en",
        "target-lang": "nl"
      },
      "headers": {
        "content-type": "text/turtle",
        "accept": "text/turtle"
      },
      "body": null
    }
  ]' "http://api-dev.freme-project.eu/current/pipelining/chain"

The error result is:

{
  "exception": "eu.freme.eservices.elink.exceptions.InvalidNIFException",
  "path": "/e-link/documents/",
  "message": "[line: 3, col: 7 ] Triples not terminated by DOT",
  "error": "Bad Request",
  "status": 400,
  "timestamp": 1468944828341
}

@fsasaki does this help in any way?
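The JSON encoding of the body mentioned above can be sketched as follows. This is only an illustration: the helper name `make_step` is hypothetical, and the step layout is copied from the request above rather than from any FREME client library.

```python
import json

# XLIFF document from the request above, as a plain Python string.
XLIFF = (
    '<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0" '
    'srcLang="en" trgLang="fr">\n'
    ' <file id="f1">\n'
    '  <unit id="u1">\n'
    '   <segment>\n'
    '   <source>We very much welcome you in the city of Prague, a home of XML!</source>\n'
    '   </segment>\n'
    '  </unit>\n'
    ' </file>\n'
    '</xliff>'
)


def make_step(endpoint, body=None, parameters=None, headers=None):
    """Build one pipeline step in the shape used by the request above (hypothetical helper)."""
    return {
        "method": "POST",
        "endpoint": endpoint,
        "parameters": parameters or {},
        "headers": headers or {},
        "body": body,
    }


first_step = make_step(
    "http://api-dev.freme-project.eu/current/toolbox/xslt-converter/documents/xliff20-to-html",
    body=XLIFF,
    headers={"content-type": "text/xml", "accept": "text/html"},
)

# json.dumps escapes the double quotes and newlines inside the XML body,
# so the document can travel as a JSON string value.
payload = json.dumps([first_step], indent=2)
print(payload)
```

The printed payload can be saved to a file and posted with `curl -d @file`, which avoids escaping the XML by hand.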

ArneBinder commented 8 years ago

@fsasaki the pipeline contains an error: freme-ner is called with Accept: text/html, but e-Link doesn't understand text/html. The e-Link step declares Content-Type: text/turtle, hence the Turtle parsing error above.
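A mismatch like this can be spotted before submitting the pipeline. The following is a minimal sketch, not part of the pipeline service: `find_mismatches` is a hypothetical helper that compares each step's accept header with the content-type of the step after it.

```python
def find_mismatches(steps):
    """Return (index, accept, next_content_type) for each adjacent pair of
    steps whose declared output type differs from the next input type."""
    mismatches = []
    for i in range(len(steps) - 1):
        produced = steps[i]["headers"].get("accept")
        expected = steps[i + 1]["headers"].get("content-type")
        # Only compare when both sides declare a type explicitly.
        if produced and expected and produced != expected:
            mismatches.append((i, produced, expected))
    return mismatches


# Headers of the five steps in the pipeline above, in order.
pipeline = [
    {"headers": {"content-type": "text/xml", "accept": "text/html"}},       # xslt-converter
    {"headers": {"content-type": "text/html", "accept": "text/html"}},      # freme-ner
    {"headers": {"content-type": "text/turtle", "accept": "text/turtle"}},  # e-link
    {"headers": {"content-type": "text/turtle", "accept": "text/turtle"}},  # e-terminology
    {"headers": {"content-type": "text/turtle", "accept": "text/turtle"}},  # e-translation
]

print(find_mismatches(pipeline))
```

For the pipeline above this reports the freme-ner step (index 1), whose text/html output meets a step that expects text/turtle.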

fsasaki commented 8 years ago

Thanks. If I change the freme-ner call to Accept: text/turtle, it works with the pipeline that has the XLIFF document in the body of the pipeline. However, I can't get it to work when calling the pipeline separately; see the attached curl request.

curl-request-again.txt

fsasaki commented 8 years ago

I made another try; see the attached curl requests.

xml-as-part-of-submitted-pipeline.txt: Here the XML is part of the submitted pipeline body, and the XSLT transformation works.

xml-via-stored-pipeline: Here there is a pipeline with an ID, and the XML is in the submitted POST body. The transformation throws an error. Is there an error in the pipeline? See the pipeline template at http://api-dev.freme-project.eu/current/pipelining/templates/61

ArneBinder commented 8 years ago

I fixed the pipeline service: the internal mimeType (in and out) was set to turtle, so the XSLT converter could not work. Furthermore, it is now possible to leave the headers empty if you do not want to specify any. The Content-Type of the response is taken as the Content-Type for the next request.
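The forwarding rule described here can be sketched in a few lines. This is an illustration of the described behaviour only, not the actual PipelineService code; the function name is hypothetical.

```python
def headers_for_next_request(step_headers, previous_content_type):
    """Combine a step's (possibly empty) headers with the forwarded type."""
    headers = dict(step_headers or {})
    if previous_content_type is not None:
        # The previous response's Content-Type becomes this request's Content-Type.
        headers["content-type"] = previous_content_type
    return headers


# A step with empty headers simply inherits the type of the previous response.
print(headers_for_next_request({}, "text/html"))
# A step that only sets accept keeps it and gains the forwarded content-type.
print(headers_for_next_request({"accept": "text/turtle"}, "text/html"))
```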

I created pipeline 63. Try it:

curl -X POST -H "Content-Type: text/xml" -d '<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0" srcLang="en" trgLang="fr">
 <file id="f1">
  <unit id="u1">
   <segment>
   <source>We very much welcome you in the city of Prague, a home of XML!</source>
   </segment>
  </unit>
 </file>
</xliff>' "http://api-dev.freme-project.eu/current/pipelining/chain/63"

Pipeline 61 also seems to work now; I tried it with the same input as above.

Something to note: it looks like Postman sends the default Content-Type header text/plain, and curl sends application/x-www-form-urlencoded, if you do not set it. So I still get an error with pipeline 63 if Content-Type: text/xml is not set in the request, although it is set in the pipeline itself. This is because the pipeline service is configured to overwrite all headers/parameters of the pipeline with the submitted ones.

Furthermore, I see one problem with the pipeline service: PipelineService.chain implements some kind of roundtripping of its own, which is triggered if the input mimeType and output mimeType are both text/html. @jnehring I think this can cause problems for any pipeline using the XSLT converter. Should we remove this magic? It also modifies the internal in-/out-formats.
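Why the client default wins can be shown with a small sketch of the overwrite rule. `effective_headers` is a hypothetical name, and the case-insensitive merge is an assumption about how the headers are compared; the service itself may do this differently.

```python
def effective_headers(pipeline_headers, submitted_headers):
    """Submitted headers win over those stored in the pipeline template.
    Header names are lower-cased so the comparison is case-insensitive."""
    merged = {k.lower(): v for k, v in pipeline_headers.items()}
    merged.update({k.lower(): v for k, v in submitted_headers.items()})
    return merged


# Headers stored in the first step of the pipeline template.
stored = {"content-type": "text/xml", "accept": "text/html"}

# curl -d without an explicit -H "Content-Type: ..." sends this default.
curl_default = {"Content-Type": "application/x-www-form-urlencoded"}
print(effective_headers(stored, curl_default))

# Setting the header explicitly restores the type the XSLT converter needs.
explicit = {"Content-Type": "text/xml"}
print(effective_headers(stored, explicit))
```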

I added the pipelining endpoint to the e-internalization-blacklist of broker-dev and broker-local.

fsasaki commented 8 years ago

works, thanks!


fsasaki commented 8 years ago

In order to work with various XML formats I need to create a pipeline like this:

  1. XSLT XLIFF2HTML with input / output types text/xml
  2. e-Internationalisation with input / output text/html
  3. XSLT HTML2XLIFF with input / output text/xml

Is that possible or will the pipeline after step two try to set the input type to text/html? @ArneBinder and @jnehring, what do you think?

jnehring commented 8 years ago

> Is that possible or will the pipeline after step two try to set the input type to text/html?

I think it should be possible. When you explicitly set an input type then the pipeline should not override this setting.

fsasaki commented 8 years ago

Thanks, I'll give it a try and open another issue if there are problems.

ArneBinder commented 8 years ago

@fsasaki you have to set the output types (accept header). The input type will be taken from the response of the previous request.
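A three-step pipeline that declares only accept headers could be built as in the sketch below. Several things here are assumptions: the `step` helper is hypothetical, freme-ner stands in for the text/html processing step, the accept values follow the three-step plan listed earlier in the thread, and the html-to-xliff20 URL is formed by analogy with the xliff20-to-html converter rather than taken from the thread.

```python
import json

BASE = "http://api-dev.freme-project.eu/current"


def step(endpoint, accept, parameters=None):
    """One pipeline step that declares only its output type (accept)."""
    return {
        "method": "POST",
        "endpoint": endpoint,
        "parameters": parameters or {},
        "headers": {"accept": accept},  # no content-type: it is forwarded
        "body": None,
    }


pipeline = [
    step(BASE + "/toolbox/xslt-converter/documents/xliff20-to-html", "text/xml"),
    # Assumption: freme-ner is used as the text/html processing step.
    step(BASE + "/e-entity/freme-ner/documents", "text/html",
         {"language": "en", "dataset": "dbpedia"}),
    # Assumed URL, formed by analogy with the xliff20-to-html converter.
    step(BASE + "/toolbox/xslt-converter/documents/html-to-xliff20", "text/xml"),
]

print(json.dumps(pipeline, indent=2))
```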

ArneBinder commented 8 years ago

As mentioned above, at the moment it is simply not possible to have a pipeline that accepts and produces HTML but uses the xslt-converter to convert from HTML to XML and back, because of the pipeline's internal roundtripping implementation.

fsasaki commented 8 years ago

Thanks, Arne. When or in general could the pipeline internal roundtripping be changed to allow for this functionality?


ArneBinder commented 8 years ago

> When or in general could the pipeline internal roundtripping be changed to allow for this functionality?

We could remove the roundtrip functionality from the pipeline and parameterize the internalisation-filter: say we introduce a parameter roundtrip which can switch the default behaviour of the internalisation-filter, i.e. enable roundtripping with roundtrip=true also for blacklisted endpoints. @jnehring what do you say?
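The proposed switch might look like the sketch below. It is purely hypothetical: the function name, the blacklist entries, and the assumption that the filter roundtrips by default for every endpoint not on the blacklist are illustrative, not taken from the internalisation-filter code.

```python
def should_roundtrip(endpoint, blacklist, roundtrip=None):
    """Decide whether the filter should roundtrip a request.

    roundtrip=None means the parameter was not given, so the default applies:
    roundtrip unless the endpoint is blacklisted (assumed default).
    An explicit roundtrip=True/False overrides the default.
    """
    if roundtrip is not None:
        return roundtrip
    return endpoint not in blacklist


# Hypothetical blacklist entries, for illustration only.
blacklist = {"/pipelining/chain", "/toolbox/xslt-converter"}

print(should_roundtrip("/e-entity/freme-ner/documents", blacklist))    # default: not blacklisted
print(should_roundtrip("/pipelining/chain", blacklist))                # default: blacklisted
print(should_roundtrip("/pipelining/chain", blacklist, roundtrip=True))  # explicit override
```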

jnehring commented 8 years ago

Let's discuss when you are back in the office.

fsasaki commented 8 years ago

I will be off next week. If you have time to continue work on this, attached is some data.

e-internationalisation is on hold because of the bug that @katia-vistatec will work on. The pipeline itself is on hold because of the content type forwarding issue discussed here.

output-step2-borken.txt output-step2-ideal.txt input-step1.txt output-step1.txt output-step3.txt input.txt pipeline.txt

ArneBinder commented 8 years ago

> The pipeline itself is on hold because of the content type forwarding issue discussed here.

I think for this use case there is no problem with the pipeline service, but with the XSLT Converter (html-to-xliff20). I tried to change the HTML parser validation policy level, but it still reports parsing errors. Furthermore, there was an error in the internalization-filter that caused wrong response content types when roundtripping, so the wrong MIME type was sent to the last request (convert HTML to XLIFF).

I created some pipelines:

Outputs:

<html>
    <head>
        <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
        <title>@@@</title>
        <script type="application/xml">&lt;xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
 &lt;file id="f1">
  &lt;unit id="u1">
   &lt;segment>
   &lt;anchor xmlns="http://www.w3.org/1999/xhtml" id="n1">&lt;/anchor>
   &lt;/segment>
  &lt;/unit>
 &lt;/file>
&lt;/xliff></script>
    </head>
    <body>
        <div id="xyz1xyz">
            <p id="n1">We very much welcome you in the city of 
                <span data-its-ta-class-refs="http://dbpedia.org/ontology/City http://dbpedia.org/ontology/Location http://dbpedia.org/ontology/PopulatedPlace http://nerd.eurecom.fr/ontology#Location http://dbpedia.org/ontology/Place http://dbpedia.org/ontology/Settlement" its-ta-class-ref="http://dbpedia.org/ontology/City" its-ta-confidence="0.9990764748481397" its-ta-ident-ref="http://dbpedia.org/resource/Prague">Prague</span>, a home of
                <span its-ta-class-ref="http://www.w3.org/2002/07/owl#Thing" its-ta-confidence="0.9962033954052527" its-ta-ident-ref="http://dbpedia.org/resource/XML">XML</span>!
            </p>
        </div>
    </body>
</html>

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></meta>
        <title>@@@</title>
        <script type="application/xml">
            <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
                <file id="f1">
                    <unit id="u1">
                        <segment>
                            <anchor xmlns="http://www.w3.org/1999/xhtml" id="n1"></anchor>
                        </segment>
                    </unit>
                </file>
            </xliff>
        </script>
    </head>
    <body>
        <div id="xyz1xyz">
            <p id="n1">We very much welcome you in the city of Prague, a home of XML!</p>
        </div>
    </body>
</html>

ArneBinder commented 8 years ago

I opened #71 for the xslt-converter html parser bug.

fsasaki commented 8 years ago

Thanks, @ArneBinder. On

This looks a bit strange, see the not escaped ...

e.g.

<file id="f1">

That is ok, XSLT 3.0 can handle this.

On the issue with

{
  "method": "POST",
  "endpoint": "http://api-dev.freme-project.eu/current/toolbox/xslt-converter/documents/xliff20-to-html",
  "parameters": {},
  "headers": {
    "accept": "text/html"
  },
  "body": null
}

With

"accept": "text/html"

the e-internationalisation (I think) adds a meta tag that does not have a closing tag. If the accept header is text/xml, this does not happen. But if the accept header is text/xml, the pipeline does not work either, see http://api-dev.freme-project.eu/current/pipelining/templates/68

fsasaki commented 8 years ago

pipeline.zip

I got this to work, with e-internationalisation now producing the right output; see the attachment. In step one I am using the output type text/xml, so the parser does not create the <meta> tag that later causes the parsing error (in step 3).
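The effect of the unclosed <meta> tag can be reproduced with any strict XML parser. The sketch below uses Python's standard library parser as a stand-in; the assumption is that step 3 parses its input as XML, and the sample documents are shortened versions of the outputs shown earlier.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style output: <meta> is never closed, which is fine for HTML but not for XML.
unclosed = (
    '<html><head>'
    '<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">'
    '<title>@@@</title></head><body></body></html>'
)

# XML-style output: <meta> has an explicit closing tag.
closed = (
    '<html><head>'
    '<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></meta>'
    '<title>@@@</title></head><body></body></html>'
)

print(is_well_formed_xml(unclosed))
print(is_well_formed_xml(closed))
```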

If it would be possible to have a pipeline that takes as input text/xml and produces as output text/xml, and then in the next step processes that output as text/html, things may work.

ArneBinder commented 8 years ago

> If it would be possible to have a pipeline that takes as input text/xml and produces as output text/xml, and then in the next step processes that output as text/html, things may work.

OK, I implemented this workaround by relaxing the mimeType forwarding: it is now possible to overwrite it in the pipeline. Only a warning is logged if the MIME types are not the same.
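The relaxed rule can be sketched as follows. This is an illustration of the behaviour described here, not the service's code; the function and logger names are hypothetical.

```python
import logging

logger = logging.getLogger("pipeline-sketch")


def resolve_content_type(step_headers, previous_content_type):
    """Pick the Content-Type for the next request under the relaxed rule."""
    declared = (step_headers or {}).get("content-type")
    if declared is None:
        # Nothing declared in the step: forward the previous response's type.
        return previous_content_type
    if previous_content_type is not None and declared != previous_content_type:
        # A mismatch is tolerated, but logged.
        logger.warning(
            "step declares %s but previous response was %s",
            declared, previous_content_type,
        )
    return declared


# The previous step returned text/xml, but the next step treats it as text/html.
print(resolve_content_type({"content-type": "text/html"}, "text/xml"))
# Without a declared type the forwarded one is used.
print(resolve_content_type({}, "text/xml"))
```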

@fsasaki I created pipeline 69.

So can we close this?

fsasaki commented 8 years ago

Well done, thanks @ArneBinder ! Yes, we can close this.