eerohele / exalt

A Sublime Text plugin for validating and formatting XML documents
MIT License

Entities in external DTD are neglected #15

Open (donum opened this issue 5 years ago)

donum commented 5 years ago

Hi,

thank you for that cool package. I don't feel misfortunate. :)

Issue #8 was reported and fixed, which I am very happy about. This one is related, though.

I noticed that entity declarations in external DTDs are ignored.

(1) This works:

messages.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalogue SYSTEM "catalogue.dtd" [
  <!ENTITY nbsp "&#160;">
  <!ENTITY shy "&#173;">
  <!ENTITY reg "&#174;">
  <!ENTITY trade "&#8482;">
  <!ENTITY ndash "&#8211;">
  <!ENTITY mdash "&#8212;">
  <!ENTITY rsquo "&#8217;">
]>
<catalogue xml:lang="de" name="messages">
  <message key="banner.title">Hello&shy;World</message>
</catalogue>

catalogue.dtd:

<!ELEMENT catalogue (message)>
<!ATTLIST catalogue name ID #REQUIRED>
<!ELEMENT message (#PCDATA)>
<!ATTLIST message key ID #REQUIRED>

(2) This, however, does not work:

messages.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalogue SYSTEM "catalogue.dtd">
<catalogue xml:lang="de" name="messages">
  <message key="banner.title">Hello&shy;World!</message>
</catalogue>

catalogue.dtd:

<!ELEMENT catalogue (message)>
<!ATTLIST catalogue name ID #REQUIRED>
<!ELEMENT message (#PCDATA)>
<!ATTLIST message key ID #REQUIRED>
<!ENTITY nbsp "&#160;">
<!ENTITY shy "&#173;">
<!ENTITY reg "&#174;">
<!ENTITY trade "&#8482;">
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
<!ENTITY rsquo "&#8217;">
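The only difference between (1) and (2) is where the entity declarations live. Declarations in the internal subset are always seen by the parser, but declarations in the external subset only exist once the parser actually fetches and parses catalogue.dtd. A minimal sketch of that step, using Python's standard-library expat bindings rather than lxml (which Exalt uses), and assuming the DTD is resolved relative to a directory passed in as `base_dir` (a hypothetical parameter):

```python
import os
from xml.parsers import expat


def parse_with_external_dtd(xml_bytes: bytes, base_dir: str) -> str:
    """Parse xml_bytes, reading the external DTD so its entities get defined.

    Returns the document's character data with surrounding whitespace stripped.
    """
    parser = expat.ParserCreate()
    # Process the external DTD subset (and parameter entities) unless the
    # document declares standalone="yes".
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

    text = []
    parser.CharacterDataHandler = text.append

    def load_external(context, base, system_id, public_id):
        # expat never opens files itself: a relative system ID such as
        # "catalogue.dtd" is resolved here, against base_dir (an assumption).
        path = os.path.join(base_dir, system_id)
        with open(path, "rb") as f:
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(f.read(), True)
        return 1  # non-zero tells expat the reference was handled

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(xml_bytes, True)
    return "".join(text).strip()
```

With catalogue.dtd from (2) in `base_dir`, `&shy;` expands to U+00AD instead of being reported as undefined. Note that the whole approach hinges on having a directory to resolve against.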

(3) This more advanced example doesn't work either:

messages.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE catalogue SYSTEM "catalogue.dtd">
<catalogue xml:lang="de" name="messages">
  <message key="banner.title">Hello&shy;World</message>
</catalogue>

catalogue.dtd:

<!ELEMENT catalogue (message)>
<!ATTLIST catalogue name ID #REQUIRED>
<!ELEMENT message (#PCDATA)>
<!ATTLIST message key ID #REQUIRED>
<!ENTITY % iso-lat1
    PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"
        "iso-lat1.ent">
%iso-lat1;
<!ENTITY % iso-lat2
    PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN//XML"
        "iso-lat2.ent">
%iso-lat2;

iso-lat1.ent:

...
<!ENTITY aacute "&#x00E1;"> <!-- LATIN SMALL LETTER A WITH ACUTE -->
<!ENTITY Aacute "&#x00C1;"> <!-- LATIN CAPITAL LETTER A WITH ACUTE -->
<!ENTITY acirc  "&#x00E2;"> <!-- LATIN SMALL LETTER A WITH CIRCUMFLEX -->
<!ENTITY Acirc  "&#x00C2;"> <!-- LATIN CAPITAL LETTER A WITH CIRCUMFLEX -->
<!ENTITY agrave "&#x00E0;"> <!-- LATIN SMALL LETTER A WITH GRAVE -->
<!ENTITY Agrave "&#x00C0;"> <!-- LATIN CAPITAL LETTER A WITH GRAVE -->
<!ENTITY aring  "&#x00E5;"> <!-- LATIN SMALL LETTER A WITH RING ABOVE -->
<!ENTITY Aring  "&#x00C5;"> <!-- LATIN CAPITAL LETTER A WITH RING ABOVE -->
<!ENTITY atilde "&#x00E3;"> <!-- LATIN SMALL LETTER A WITH TILDE -->
...
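Example (3) adds a second level: the external DTD itself pulls in iso-lat1.ent and iso-lat2.ent through parameter entities with PUBLIC identifiers, so the parser has to follow those references too. The sketch below again uses Python's expat rather than lxml. It resolves each reference by public identifier first and then by system identifier, from an in-memory dict that stands in for the files (the dict and the function name are hypothetical). Because child parsers inherit the handler, a stack tracks which parser is currently reporting a reference.

```python
from typing import Dict
from xml.parsers import expat


def parse_with_entity_map(xml_bytes: bytes, entities: Dict[str, bytes]) -> str:
    """Parse xml_bytes, resolving external entities from `entities`.

    Keys are public or system identifiers; values are the entity's content.
    Returns the document's character data with surrounding whitespace stripped.
    """
    root = expat.ParserCreate()
    root.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    text = []
    root.CharacterDataHandler = text.append
    # Nested references (DTD -> %iso-lat1;) must be parsed by a child of the
    # parser that reported them, not always by the root parser.
    active = [root]

    def load_external(context, base, system_id, public_id):
        # Prefer the public identifier, the way a catalog lookup might.
        key = public_id if public_id in entities else system_id
        if key not in entities:
            raise LookupError(f"cannot resolve {public_id!r} / {system_id!r}")
        child = active[-1].ExternalEntityParserCreate(context)
        active.append(child)
        try:
            child.Parse(entities[key], True)
        finally:
            active.pop()
        return 1

    root.ExternalEntityRefHandler = load_external
    root.Parse(xml_bytes, True)
    return "".join(text).strip()
```

Parameter-entity parsers share the root parser's DTD, so `aacute` declared in iso-lat1.ent becomes usable in the document body.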

The error message is always "Entity &shy; not defined".

It would be very nice if at least the second example worked.

Dan

eerohele commented 5 years ago

Thanks for the detailed bug report!

Entities in external DTDs should definitely be supported. I'll try to look into this as soon as I'm over this pesky flu.

eerohele commented 5 years ago

v0.3.5 should fix the problem. When it appears in Package Control (there's usually a slight delay), could you give it a try and let me know whether it fixes the issue for you?

Note, though, that the example you posted still won't work as is: you must either specify the absolute path to catalogue.dtd or (preferably) use an XML catalog.

Exalt always operates on the contents of a Sublime Text view, which means it doesn't (and indeed can't) know the path where the XML document is stored. Also, if the document is unsaved, it has no path at all.

That means Exalt can't resolve catalogue.dtd because it doesn't know where it's located relative to the document it's validating.
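Put differently, a relative system identifier like catalogue.dtd is resolved the same way a relative link is: against the location of the document that contains it. A parser that receives only text has nothing to resolve it against. A small stdlib illustration, where `document_url` is a hypothetical argument standing for that location (None for an unsaved buffer) and the /home/dan/xml/ paths are made up:

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urljoin, urlparse


def resolve_system_id(system_id: str, document_url: Optional[str]) -> Optional[str]:
    """Resolve a DOCTYPE system identifier against the document's location."""
    # Absolute URLs and (POSIX-style) absolute paths need no base at all.
    if urlparse(system_id).scheme or PurePosixPath(system_id).is_absolute():
        return system_id
    # Relative identifier with no known document location: unresolvable.
    if document_url is None:
        return None
    return urljoin(document_url, system_id)
```

This is why an absolute path works while a bare "catalogue.dtd" does not when the document's location is unknown.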

donum commented 5 years ago

Super, thank you eerohele!

It works like a charm. Thank you also for your hint regarding the absolute path requirement.

Using the absolute file system path, it works very nicely.

To avoid storing my local system path in the document, I tried this: <!DOCTYPE catalogue SYSTEM "http://127.0.0.1:1338/static/xml-catalogue/dtd/catalogue.dtd">

That doesn't work, though. Do you know why? The URL is accessible via the browser.

eerohele commented 5 years ago

I believe the issue is that lxml (which is what Exalt uses) doesn't load schemas over the network by default.

One option would be to enable network loading, but I'm not sure what vulnerabilities I'd be opening Exalt up to if I did that. Then again, I think Exalt already loads certain resources over the network in some scenarios, so I'll need to check.
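In lxml this default is the `no_network` option of `etree.XMLParser`, which is enabled unless you turn it off. The concern is that a document could then make the editor request arbitrary URLs. The policy itself is simple to state; here is a stdlib-only sketch of a check that refuses network system identifiers unless explicitly allowed. The function name, the scheme set, and the `allow_network` flag are hypothetical, not part of Exalt or lxml.

```python
from urllib.parse import urlparse

# Schemes treated as "network" in this sketch (an assumption, not exhaustive).
NETWORK_SCHEMES = {"http", "https", "ftp"}


def check_system_id(system_id: str, allow_network: bool = False) -> str:
    """Return system_id if it may be loaded, otherwise raise PermissionError."""
    scheme = urlparse(system_id).scheme.lower()
    if scheme in NETWORK_SCHEMES and not allow_network:
        # Mirrors a "no network by default" policy: remote DTDs are refused.
        raise PermissionError(f"network access disabled: {system_id}")
    return system_id
```

Local paths pass through unchanged; the http://127.0.0.1:1338/... URL from the previous comment is refused unless `allow_network=True`.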

I'll consider enabling network requests, but if you're looking to avoid absolute file system paths to DTDs in your XML documents, have you considered using an XML catalog? I believe that would be a better solution. Or are you trying to solve some other problem?
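An XML catalog (OASIS XML Catalogs) keeps the identifier in the document stable and maps it to a local file, so messages.xml could keep http://127.0.0.1:1338/static/xml-catalogue/dtd/catalogue.dtd as its system identifier while the parser never touches the network. libxml2, which lxml wraps, reads catalogs listed in the XML_CATALOG_FILES environment variable; how Exalt itself is pointed at a catalog is not shown here. To illustrate what the lookup does, this stdlib sketch reads only `system` and `public` entries and tries system entries before public ones; it ignores rewriteSystem, delegation, xml:base and URI absolutization. The file:///home/dan/xml/ targets are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog: the URL stays in the document, the file stays local.
EXAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://127.0.0.1:1338/static/xml-catalogue/dtd/catalogue.dtd"
          uri="file:///home/dan/xml/catalogue.dtd"/>
  <public publicId="ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"
          uri="file:///home/dan/xml/iso-lat1.ent"/>
</catalog>"""


def load_catalog(catalog_xml: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Collect <system> and <public> entries; the first match for an ID wins."""
    root = ET.fromstring(catalog_xml)
    systems: Dict[str, str] = {}
    publics: Dict[str, str] = {}
    for entry in root.iter(NS + "system"):
        systems.setdefault(entry.get("systemId"), entry.get("uri"))
    for entry in root.iter(NS + "public"):
        publics.setdefault(entry.get("publicId"), entry.get("uri"))
    return systems, publics


def resolve(systems: Dict[str, str], publics: Dict[str, str],
            system_id: Optional[str], public_id: Optional[str] = None) -> Optional[str]:
    """Map an external identifier to a local URI, or None if not cataloged."""
    if system_id in systems:
        return systems[system_id]
    if public_id in publics:
        return publics[public_id]
    return None
```

Either identifier form in the examples above could then be mapped to local files without hard-coding a path in the XML.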