Closed MoSal closed 1 year ago
Can you provide an example?
I mean, just:
<xml></xml>
or
<xml1></xml1>
The standard states that:
Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
I thought this means such names shouldn't be allowed.
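For reference, the reservation quoted above can be checked in a few lines. This is only a sketch: `has_reserved_xml_prefix` is a hypothetical helper name, not part of xmlparser or roxmltree, and it tests nothing but the three-letter prefix pattern from the quoted sentence. It does not validate the rest of the name.

```python
import re

# Mirrors the pattern quoted from the spec: (('X'|'x') ('M'|'m') ('L'|'l')).
# re.match anchors at the start of the string, so only the prefix is checked.
_RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def has_reserved_xml_prefix(name: str) -> bool:
    """Return True if `name` begins with 'xml' in any ASCII letter casing."""
    return _RESERVED_PREFIX.match(name) is not None


if __name__ == "__main__":
    for candidate in ("xml", "xml1", "XmlFoo", "_xml", "xm"):
        print(candidate, has_reserved_xml_prefix(candidate))
```

A strict checker could reject names for which this returns true, while a lenient one could merely warn, since the quoted wording reserves such names rather than explicitly making them an error.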
W3Schools' own tutorial mentions:
- Element names are case-sensitive
- Element names must start with a letter or underscore
- Element names cannot start with the letters xml (or XML, or Xml, etc)
- Element names can contain letters, digits, hyphens, underscores, and periods
- Element names cannot contain spaces
But I tried Python's stdlib implementation, and also Firefox's DOM parser, and neither cares. So maybe I misunderstood the standard, or this part of it is just generally ignored!
Not sure what's right for xmlparser/roxmltree here, but I will just disallow this from my side to be extra strict.
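The Python stdlib observation can be reproduced with something like the following. It is a minimal sketch using `xml.etree.ElementTree`; `parsed_root_tags` is a hypothetical helper introduced here for illustration. It shows only what this one parser accepts, not what the spec requires.

```python
import xml.etree.ElementTree as ET


def parsed_root_tags(documents):
    """Parse each document and return its root tag.

    ET.fromstring raises xml.etree.ElementTree.ParseError on malformed
    input, so getting a tag back means the parser accepted the document.
    """
    return [ET.fromstring(doc).tag for doc in documents]


if __name__ == "__main__":
    samples = ["<xml></xml>", "<xml1></xml1>"]
    for doc, tag in zip(samples, parsed_root_tags(samples)):
        print(f"{doc} parsed, root tag: {tag}")
```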
lxml parses it just fine, so I guess this is not a bug.
If unsure, try using roxmltree/testing-tools/lxml-ast.py. I'm following its logic.
The XML spec is a convoluted mess which no one follows, so figuring out what is right or wrong is mostly impossible.
roxmltree/xmlparser simply mimics lxml/libxml2 behaviour.
Hello.
Names starting with xml (regardless of casing) do not produce a parsing error.
I don't know if this belongs here or in roxmltree, but I'm reporting here since XmlCharExt is a part of this crate.
I was writing an XML name checker for a custom derive crate to catch invalid names at compile time. But then I stumbled into this issue while testing.