freme-project / basic-services

Apache License 2.0

[xslt-converter] fix html parsing #71

Closed ArneBinder closed 8 years ago

ArneBinder commented 8 years ago

This call:

curl -X POST -H "Content-Type: text/html" -H "Accept: text/xml" -d '<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></meta>
        <title>@@@</title>
        <script type="application/xml">
            <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
                <file id="f1">
                    <unit id="u1">
                        <segment>
                            <anchor xmlns="http://www.w3.org/1999/xhtml" id="n1"></anchor>
                        </segment>
                    </unit>
                </file>
            </xliff>
        </script>
    </head>
    <body>
        <div id="xyz1xyz">
            <p id="n1">We very much welcome you in the city of Prague, a home of XML!</p>
        </div>
    </body>
</html>' "http://api-dev.freme-project.eu/current/toolbox/xslt-converter/documents/html-to-xliff20"

produces:

{
  "exception": "eu.freme.common.exception.FREMEHttpException",
  "path": "/toolbox/xslt-converter/documents/html-to-xliff20",
  "message": "The XML parser reported two validation errors",
  "error": "Internal Server Error",
  "status": 500,
  "timestamp": 1469715449796
}

with validator.nu validation policy level: ALLOW.
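
For scripted reproduction, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the project: the helper names `build_request`, `convert`, and `describe_error` are hypothetical, and the endpoint is copied from the call above and may no longer be reachable.

```python
import json
import urllib.error
import urllib.request

# Endpoint copied from the report above; it may no longer be reachable.
ENDPOINT = (
    "http://api-dev.freme-project.eu/current/toolbox/"
    "xslt-converter/documents/html-to-xliff20"
)


def build_request(html, endpoint=ENDPOINT):
    """Build the POST request from the report without sending it (hypothetical helper)."""
    return urllib.request.Request(
        endpoint,
        data=html.encode("utf-8"),
        # Same headers as the curl call: HTML in, XML out.
        headers={"Content-Type": "text/html", "Accept": "text/xml"},
        method="POST",
    )


def convert(html, endpoint=ENDPOINT):
    """Send the request and return (status, body).

    HTTP error responses (such as the 500 above) are returned, not raised.
    """
    try:
        with urllib.request.urlopen(build_request(html, endpoint), timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def describe_error(body):
    """Summarize a JSON error body shaped like the one shown in the report."""
    payload = json.loads(body)
    return f'{payload["status"]} {payload["error"]}: {payload["message"]}'
```

Calling `convert(html)` with the HTML document from the curl call as a string returns the status code and response body; if the status is 500, `describe_error(body)` condenses the JSON into a single line for logging.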

ArneBinder commented 8 years ago

I switched the HTML parser from nu.validator to TagSoup. Now html-to-xliff20 works as expected.