Shinmera / plump

Practically Lenient and Unimpressive Markup Parser for Common Lisp
https://shinmera.github.io/plump
zlib License

<?xml attributes have wrong order in some lisp implementations (version has to be first) #34

Closed VaclavSynacek closed 3 years ago

VaclavSynacek commented 3 years ago

The XML declaration on the first line of an XML document must have version as its first attribute; the optional encoding and standalone attributes may only follow it. At least some XML parsers are sensitive to this and will not process XML that does not have version first.

plump stores the xml-header attributes in a hash table. The standard does not define the iteration order of a hash table, so, not too surprisingly, different CL implementations order the XML declaration attributes differently.

As an example, take this:

(plump:serialize
  (plump:parse
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
     <foo><bar this is=\"a thing\">baz</bar><span id=\"test\">oh my"))

SBCL and CCL returned version first (probably insertion order), so all is OK there, at least with this input.

ECL, CLISP and ABCL, on the other hand, returned the following (probably alphabetical order):

<?xml encoding="UTF-8" version="1.0"?>
     <foo><bar is="a thing" this="">baz</bar><span id="test">oh my</span></foo>

I have read parts of the plump source, but I am not sure at which level this should be fixed. It could probably be implemented similarly to doctype, which has a single slot of :type string, but that feels wrong. Or maybe all attributes could be changed to use an alist/plist instead of a hash table? This would also fix round-tripping for other tags (in the example, "is" and "this" are swapped as well), but probably nobody cares about that for normal tags. For most XML documents an alist/plist should not kill performance, but I don't know why you chose a hash table in the first place.
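To make the alist idea concrete, here is a minimal sketch of an order-preserving attribute store. It is only an illustration: attribute-get and attribute-add are hypothetical names, not part of plump, and the case-insensitive comparison is an assumption made for this sketch.

```lisp
;; Hypothetical sketch, not plump code: attributes kept as an alist
;; so that they come back out in the order they were first added.

(defun attribute-get (alist name)
  "Return the value stored under NAME in ALIST, or NIL if absent."
  (cdr (assoc name alist :test #'string-equal)))

(defun attribute-add (alist name value)
  "Return ALIST with NAME set to VALUE, keeping first-seen order.
An existing entry is updated in place; a new one is appended."
  (let ((cell (assoc name alist :test #'string-equal)))
    (cond (cell
           (setf (cdr cell) value)
           alist)
          (t
           (append alist (list (cons name value)))))))

;; Usage: the attributes of <bar this is="a thing"> keep their order.
(let ((attrs '()))
  (setf attrs (attribute-add attrs "this" ""))
  (setf attrs (attribute-add attrs "is" "a thing"))
  (mapcar #'car attrs))   ; => ("this" "is")
```

Lookup and insertion are linear in the number of attributes, which is the performance trade-off mentioned above; for elements with a handful of attributes that cost is likely small.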

Shinmera commented 3 years ago

At this point it's far too late to switch away from using hash tables. The only fix for this is special-casing the XML header and manually forcing the version to appear first in its printer method.
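A minimal sketch of what such special-casing could look like, independent of plump's actual classes: the attributes stay in a hash table, and only the printing step forces version to the front. The names ordered-header-attributes and write-xml-header are hypothetical, the sketch assumes string keys, and it does not escape attribute values.

```lisp
;; Hypothetical helpers, not part of plump's API.

(defun ordered-header-attributes (table)
  "Return the entries of TABLE as an alist with \"version\" first.
The remaining keys are sorted by name, so the result is the same on
every implementation regardless of hash table iteration order."
  (let ((version nil)
        (others '()))
    (maphash (lambda (key value)
               (if (string-equal key "version")
                   (setf version (cons key value))
                   (push (cons key value) others)))
             table)
    (setf others (sort others #'string< :key #'car))
    (if version
        (cons version others)
        others)))

(defun write-xml-header (table &optional (stream *standard-output*))
  "Write an XML declaration for the attributes in TABLE to STREAM."
  (write-string "<?xml" stream)
  (loop for (key . value) in (ordered-header-attributes table)
        do (format stream " ~a=\"~a\"" key value))
  (write-string "?>" stream)
  nil)

;; Usage: insertion order no longer matters.
(let ((attrs (make-hash-table :test 'equal)))
  (setf (gethash "encoding" attrs) "UTF-8"
        (gethash "version" attrs) "1.0")
  (write-xml-header attrs))
;; prints <?xml version="1.0" encoding="UTF-8"?>
```

Sorting the remaining keys by name happens to put encoding before standalone, which is also the order the XML declaration requires.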

VaclavSynacek commented 3 years ago

OK, I tried that. The pull request works on my inputs, but please check that I have not broken something else. I am especially unsure about specializing serialize-object for a single attribute using a cons. Thanks.