ballsteve / xrust

XPath, XQuery, and XSLT for Rust
Apache License 2.0

Parse and search in xml with multiple root tags #125

Open traceflight opened 3 days ago

traceflight commented 3 days ago

Hi, thanks for your great work. I was searching for a Rust XPath library and found xrust.

After some testing, I found that it cannot parse XML with multiple root tags, although many other tools, such as CyberChef, can handle this.

<a> aaa </a>
<b> bbb </b>

I am wondering if you have plans to support this.

Devasta commented 3 days ago

Hi @traceflight ,

We're still working on implementing conformance to the XML spec, so I don't think we have any plans in the immediate term to accept XML that is not well-formed.

That being said, according to the XSLT 4.0 spec, there will be a new parse-html function, so at some stage we will definitely need to implement a (likely separate) very forgiving parser that can handle those documents.

There is also the parse-xml-fragment() function, which will need to be implemented in future and may also suit your needs once done.

ballsteve commented 3 days ago

A document with multiple root tags (elements) is not well-formed, but is acceptable as an external general entity. This is specific to the XML parser; the transformation engine is able to use any tree, and other parsers (such as JSON, Markdown, etc.) may produce trees with multiple top-level elements.

Also, xrust needs to be able to produce a document that has multiple top-level elements so that it can be used as an external general entity.

Perhaps we need an alternate XML parser entry point that relaxes the well-formedness rule?
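Until such an entry point exists, one possible workaround on the caller's side is to wrap the input in a synthetic root element before handing it to a strict parser. The sketch below only builds the wrapped string; `wrap_in_synthetic_root` and the `root` name are hypothetical and are not part of xrust. It assumes the input carries no XML declaration or DOCTYPE, since neither may appear inside an element.

```rust
/// Wraps a multi-root fragment in a synthetic root element so that a strict
/// XML parser sees exactly one top-level element.
///
/// Hypothetical helper, not an xrust API. Assumes `fragment` has no XML
/// declaration or DOCTYPE and that `root_name` is a valid XML name.
pub fn wrap_in_synthetic_root(fragment: &str, root_name: &str) -> String {
    format!("<{root_name}>{fragment}</{root_name}>")
}
```

With this approach every XPath expression shifts down one level: an expression that would have been `/a` on the fragment becomes `/root/a` on the wrapped document.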

traceflight commented 2 days ago

As described in the examples of parse-xml-fragment(), maybe we can create a new root node when parsing some kinds of documents that are not well-formed?

Examples

The expression fn:parse-xml-fragment("<alpha>abcd</alpha><beta>abcd</beta>") returns a newly created document node, having two elements named alpha and beta as its children; each of these elements in turn is the parent of a text node.

The expression fn:parse-xml-fragment("He was <i>so</i> kind") returns a newly created document node having three children: a text node whose string value is "He was ", an element node named i having a child text node with string value "so", and a text node whose string value is " kind".
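The two examples above can be reproduced with a toy parser. The following is a minimal sketch, not xrust's implementation and not a conforming XML parser: the `Node` type and `parse_fragment` function are hypothetical, and the parser only understands attribute-free start and end tags plus plain text (no entities, comments, CDATA, or self-closing tags). The returned `Vec<Node>` stands in for the children of the newly created document node.

```rust
/// A node in a deliberately tiny tree model (hypothetical, not xrust's types).
#[derive(Debug, PartialEq)]
pub enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Parses a fragment and returns the children of a notional document node.
/// Any number of top-level elements and text nodes is accepted.
pub fn parse_fragment(input: &str) -> Result<Vec<Node>, String> {
    let mut pos = 0;
    let nodes = parse_content(input, &mut pos)?;
    if pos != input.len() {
        // parse_content stopped early, which only happens at a stray end tag.
        return Err(format!("unexpected end tag at byte {pos}"));
    }
    Ok(nodes)
}

/// Parses sibling nodes starting at `pos` until an end tag or end of input.
fn parse_content(input: &str, pos: &mut usize) -> Result<Vec<Node>, String> {
    let mut nodes = Vec::new();
    while *pos < input.len() {
        let rest = &input[*pos..];
        if rest.starts_with("</") {
            break; // the caller checks that this end tag matches its start tag
        } else if rest.starts_with('<') {
            let close = rest.find('>').ok_or("unterminated start tag")?;
            let name = rest[1..close].to_string();
            // Reject anything this sketch does not model (attributes, `<a/>`).
            if name.is_empty() || name.contains(|c: char| c.is_whitespace() || c == '/') {
                return Err(format!("unsupported tag <{name}>"));
            }
            *pos += close + 1;
            let children = parse_content(input, pos)?;
            let end_tag = format!("</{name}>");
            if !input[*pos..].starts_with(&end_tag) {
                return Err(format!("expected {end_tag} at byte {}", *pos));
            }
            *pos += end_tag.len();
            nodes.push(Node::Element { name, children });
        } else {
            // Plain text runs up to the next tag or the end of the input.
            let len = rest.find('<').unwrap_or(rest.len());
            nodes.push(Node::Text(rest[..len].to_string()));
            *pos += len;
        }
    }
    Ok(nodes)
}
```

Calling `parse_fragment("<alpha>abcd</alpha><beta>abcd</beta>")` yields two element nodes, and `parse_fragment("He was <i>so</i> kind")` yields a text node, an element node, and another text node, matching the descriptions above. For the input from the original report, the newline between `</a>` and `<b>` is kept as a whitespace-only text node between the two elements.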

ballsteve commented 2 days ago

So why don't we just implement the parsing and serialisation functions from XPath 3.1, section 14.7? Doing so would be part of our plan anyway.

Another handy feature would be the ability to start the transformation by invoking a named template (or user-defined function?). This template could then use the fn:parse-xml-fragment function to read in the non-well-formed document.

If this approach is acceptable then I will add it to the project schedule (a.k.a. "wish list").

traceflight commented 2 days ago

Implementing the parsing and serialisation functions would be the better approach for me.

ballsteve commented 2 days ago

This has been added to the Version 2.0 project. https://github.com/users/ballsteve/projects/1/views/1?pane=issue&itemId=88931764

traceflight commented 2 days ago

Thank you