Open traceflight opened 3 days ago
Hi @traceflight ,
We're still working on implementing conformance to the XML spec, and I don't think we have any plans in the immediate term to accept XML that is not well-formed.
That being said, according to the XSLT 4.0 spec, there will be a new parse-html function, so definitely at some stage we will need to implement a (likely separate) very forgiving parser that can handle those documents.
There is also the parse-xml-fragment() function, which will need to be implemented in future and may also suit your needs once done.
A document with multiple root tags (elements) is not well-formed, but is acceptable as an external general entity. This is specific to the XML parser; the transformation engine is able to use any tree, and other parsers (such as JSON, Markdown, etc.) may produce trees with multiple top-level elements.
Also, xrust needs to be able to produce a document that has multiple top-level elements so that it can be used as an external general entity.
Perhaps we need an alternate XML parser entry point that relaxes the well-formedness rule?
As described in the examples for parse-xml-fragment(), maybe we can create a new root node when parsing some kinds of non-well-formed documents?
Examples
The expression fn:parse-xml-fragment("<alpha>abcd</alpha><beta>abcd</beta>") returns a newly created document node, having two elements named alpha and beta as its children; each of these elements in turn is the parent of a text node.
The expression fn:parse-xml-fragment("He was <i>so</i> kind") returns a newly created document node having three children: a text node whose string value is "He was ", an element node named i having a child text node with string value "so", and a text node whose string value is " kind".
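Until something like fn:parse-xml-fragment is available, the "new root node" idea above can be approximated on the caller's side by wrapping the fragment in a synthetic root element before handing it to a strict parser. The sketch below is only an illustration using the Rust standard library: `wrap_fragment`, `strip_text_declaration` and the wrapper element name are hypothetical and are not part of xrust's API. Note also that it yields a wrapper element rather than the bare document node that the spec describes, so XPath expressions would need to account for the extra level.

```rust
/// Hypothetical helper: remove a leading XML/text declaration, if present.
/// A declaration is only legal at the very start of a document, so it must
/// not end up nested inside the synthetic root element.
fn strip_text_declaration(fragment: &str) -> &str {
    if let Some(rest) = fragment.strip_prefix("<?xml") {
        // Require whitespace after "<?xml" so that processing instructions
        // such as <?xml-stylesheet ...?> are left untouched.
        if rest.starts_with(|c: char| c.is_ascii_whitespace()) {
            if let Some(end) = rest.find("?>") {
                return &rest[end + 2..];
            }
        }
    }
    fragment
}

/// Hypothetical helper: wrap a fragment with several top-level nodes in a
/// single root element so that a well-formedness-checking parser accepts it.
/// `root_name` is chosen by the caller and is assumed to be a valid XML name
/// that does not clash with anything the caller cares about.
fn wrap_fragment(fragment: &str, root_name: &str) -> String {
    let body = strip_text_declaration(fragment);
    format!("<{0}>{1}</{0}>", root_name, body)
}
```

Calling `wrap_fragment("<alpha>abcd</alpha><beta>abcd</beta>", "fragment-root")` gives a single-rooted string that any conforming parser should accept; the alpha and beta elements then appear as children of the wrapper element.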
So why don't we just implement the parsing and serialisation functions from XPath 3.1, section 14.7? Doing so would be part of our plan anyway.
Another handy feature would be the ability to start the transformation by invoking a named template (or user-defined function?). This template could then use the fn:parse-xml-fragment function to read in the non-well-formed document.
If this approach is acceptable then I will add it to the project schedule (a.k.a. "wish list").
Implementing the parsing and serialisation functions would be the better option for me.
This has been added to the Version 2.0 project. https://github.com/users/ballsteve/projects/1/views/1?pane=issue&itemId=88931764
Thank you
Hi, thanks for your great work. I was searching for a Rust XPath library and found xrust.
After some testing, I found that it cannot parse XML with multiple root tags, although many other tools such as CyberChef can handle this.
I am wondering if you have plans to support this.