SWI-Prolog / packages-sgml

The SWI-Prolog SGML/XML/HTML parser

Improve warning when BOM and XML encoding attribute conflict #14

Closed: wouterbeek closed this issue 8 years ago

wouterbeek commented 8 years ago

I have an XML file with the following first line (notice the UTF-16 BOM):

<U+FEFF><?xml version="1.0" encoding="UTF-8" ?>

This is clearly a buggy file, but sgml_parse/2 gives a warning that is quite misleading:

Warning: <FILE>:16:
    SGML2PL(sgml): []:1: #PCDATA ("") not allowed here

Maybe the warning should be something like "Conflict between BOM and XML encoding attribute"?

JanWielemaker commented 8 years ago

The problem is that the parser doesn't get that far. On seeing the BOM it enables UCS-2 decoding (this should be UTF-16), and then the entire <?xml ... declaration is no longer readable. So it just sees CDATA, which is not allowed before the first element. At a higher level you could retry by seeking back to the start, changing the encoding to a superset of ASCII (e.g., UTF-8), and checking whether you get a sensible document opening.
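
The retry idea can be sketched outside the parser. The following is a minimal, library-neutral illustration in Python, not SWI-Prolog code and not how packages-sgml works internally. The function names `sniff_encoding` and `has_conflict` are hypothetical. It strips a leading BOM, reads the bytes that follow as plain ASCII, and checks whether a readable XML declaration with an encoding attribute appears.

```python
import codecs
import re

# BOM byte sequences; UTF-32 comes first so it is not mistaken for UTF-16.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# An XML declaration that is readable as plain ASCII bytes.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data):
    """Return (bom_encoding, declared_encoding); either may be None."""
    bom_enc, rest = None, data
    for bom, name in BOMS:
        if data.startswith(bom):
            bom_enc, rest = name, data[len(bom):]
            break
    # Assumption: if the declaration matches as ASCII, the content is
    # really in an ASCII superset, whatever the BOM claims.
    m = XML_DECL.match(rest)
    declared = m.group(1).decode("ascii").lower() if m else None
    return bom_enc, declared


def has_conflict(bom_enc, declared):
    """True when both are present and the BOM disagrees with the declaration."""
    if bom_enc is None or declared is None:
        return False
    # "utf-16-le" is compatible with a declared "utf-16", hence the prefix test.
    return not bom_enc.startswith(declared)


if __name__ == "__main__":
    buggy = codecs.BOM_UTF16_LE + b'<?xml version="1.0" encoding="UTF-8" ?>'
    bom_enc, declared = sniff_encoding(buggy)
    if has_conflict(bom_enc, declared):
        print("Conflict between BOM (%s) and XML encoding attribute (%s)"
              % (bom_enc, declared))
```

In a genuinely UTF-16 file the declaration is not readable as ASCII, so `declared` comes back as `None`, no conflict is reported, and the BOM can be trusted. The prefix comparison is deliberately crude: it does not normalise aliases such as `utf8` or `UTF-16LE`.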

wouterbeek commented 8 years ago

Ok, will try that. Thanks!