m8pple closed 3 years ago
This was incorrect (https://github.com/POETSII/Orchestrator/issues/201#issuecomment-844425443):
Though the suggested fix doesn't deal with completely valid things like handlers which are not in a CDATA section,
or multiple CDATA sections that should be appended.
Updated: this was incorrect due to the restriction that there is only one CDATA section, see https://github.com/POETSII/Orchestrator/issues/201#issuecomment-844425443.
It does mean that certain C code cannot be expressed in a handler section but that's not a big problem.
The idea of all text being held in a CDATA seems baked deep into the custom XML parser, for example at:
Looking around, I'm not sure if the parser does entity decoding properly either, i.e. if you don't have a CDATA section, does it deal with &amp; and so on?
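For comparison, here is a minimal sketch of what a conforming parser does with entities, using Python's standard xml.etree.ElementTree rather than the Orchestrator's custom parser. The handler body is a made-up example, not taken from any POETS file.

```python
import xml.etree.ElementTree as ET

# Outside CDATA, a conforming parser decodes the predefined entities.
plain = ET.fromstring("<Handler>if (a &lt; b &amp;&amp; c &gt; d) x++;</Handler>")

# Inside CDATA nothing needs escaping, so the same code is written literally.
cdata = ET.fromstring("<Handler><![CDATA[if (a < b && c > d) x++;]]></Handler>")

# Inside CDATA an entity reference is NOT decoded; it is just text.
literal = ET.fromstring("<Handler><![CDATA[&amp;]]></Handler>")

print(plain.text)                # decoded handler code
print(plain.text == cdata.text)  # both spellings give the same text
print(literal.text)              # the five characters of the entity, undecoded
```

A parser that only ever looks inside CDATA sections never exercises the first case, which is why it is worth testing separately.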
An interesting test would be whether it would handle code like x=a[b[i]]>3; which is valid code, but would need to be weirdly escaped in a CDATA. So something like:
<Handler>
<![CDATA[
x=a[b[i]]
]]>
<![CDATA[
>3;
]]>
</Handler>
Normally this would be put outside a CDATA, something like:
<Handler>
x=a[b[i]] > 3;
</Handler>
Or a human would just put a space in, so ]]> is converted to ]] >.
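A generator that insists on CDATA can apply the same split mechanically. The sketch below assumes a hypothetical helper, to_cdata, which is not part of the Orchestrator or of any library. It ends the section in the middle of each ]]> and reopens a new one, then checks with Python's xml.etree.ElementTree that a conforming parser joins the pieces back into the original code.

```python
import xml.etree.ElementTree as ET

def to_cdata(code: str) -> str:
    """Wrap code in CDATA, splitting every ']]>' across two adjacent sections.

    Hypothetical helper for illustration only.
    """
    return "<![CDATA[" + code.replace("]]>", "]]]]><![CDATA[>") + "]]>"

code = "x=a[b[i]]>3;"
xml_text = "<Handler>" + to_cdata(code) + "</Handler>"

# A conforming parser concatenates adjacent CDATA sections into one text node.
handler = ET.fromstring(xml_text)

print(xml_text)
print(handler.text == code)
```

This only round-trips if the consumer appends multiple CDATA sections, which is exactly the case a one-CDATA-only parser rejects.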
This is partially fixed by 90ecbf7ebfca74c374beb1e8e352f79e2c3ef879
I was partially talking crap, as a detailed reading of the spec says that if there is source code then it is in a CDATA section:
I think this language was in there specifically because people wanted to roll their own XML parser.
However, the fix suggested above still applies: it is valid for an element with no source code to not have a CDATA section.
We will make CDATA optional and look at your fix for the other issue.
The proposed patch from last month for making CDATA optional is included in #264
resolved in #264
When parsing the v4 file one-dev.xml from the PEP20 repository, it complains about missing CDATA sections. It also thinks it took 1.2 hours to do so :)
Explanation (mainly aimed at ADB)
CDATA sections are almost never required as part of a grammar, as they are intended to be mostly invisible to the generators and consumers of XML. They are just there to make escaping easier, so they are the equivalent of writing
versus
in python.
Or writing:
versus
in C++11.
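The snippets being compared are not shown here. Assuming the comparison was between an escaped string literal and a raw string literal, a Python illustration might look like the following (the path is made up):

```python
# Two spellings of the same string: the escaping differs, the value does not.
escaped = "C:\\temp\\new\\file.txt"
raw = r"C:\temp\new\file.txt"

print(escaped == raw)
print(len(escaped), len(raw))
```

Code that receives the string cannot tell which spelling was used in the source, in the same way that an XML consumer should not need to know whether text came from a CDATA section.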
From a compliant XML parser's point of view, the following should all be treated as equivalent:
<Element />
<Element></Element>
<Element><![CDATA[]]></Element>
<Element><![CDATA[]]><![CDATA[]]></Element>
They are all just an element with no children (either text or element). Only if you use deeper parsing APIs does one care whether you have a CDATA section, multiple CDATA sections, or none.

If empty CDATA sections are required then it could become very tricky to generate valid files from languages like Python and JavaScript, as they try to hide as much as possible. All you do is append string content to an element, and then the generator decides how to output it.
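Both points can be checked with a short sketch using Python's standard xml.etree.ElementTree. This is an illustration of how an off-the-shelf library behaves, not of the Orchestrator's parser; the summary helper is made up for the example.

```python
import xml.etree.ElementTree as ET

forms = [
    "<Element />",
    "<Element></Element>",
    "<Element><![CDATA[]]></Element>",
    "<Element><![CDATA[]]><![CDATA[]]></Element>",
]

def summary(xml_text):
    """Return (tag, text, child count), treating missing text as empty text."""
    el = ET.fromstring(xml_text)
    return (el.tag, el.text or "", len(el))

summaries = [summary(f) for f in forms]
print(all(s == summaries[0] for s in summaries))

# Generation: assign string content and let the library choose the escaping.
handler = ET.Element("Handler")
handler.text = "x=a[b[i]]>3;"
generated = ET.tostring(handler, encoding="unicode")
print(generated)
```

The writer has no option to request a CDATA section here, so a grammar that requires one cannot be satisfied without post-processing the output.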
Fix
The easiest way of fixing this is to change Config/V4Grammar3.ocfg so that instead of:
it has:
No idea if this causes problems later on, but it gets things parsing.