Closed ChillyVanilly77 closed 1 year ago
My guess is these are text node children of the camera
element which correspond to the indentation in your example (camera
itself is not EmtpyElemTag
).
Oh, so if >= 1 whitespace character(s) is/are detected right after a tag, it is interpreted/considered as a TextNode? Hmm...interesting, that I did not consider. But this is desired and spec-adhereing behaviour then, is it?
It is indeed the indentation:
// Using roughly the same markup...
dbg!(camera_elem.children().filter(|n| n.is_text()).map(|n| n.text()).collect::<Vec<_>>());
// => [
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t\t",
// ),
// Some(
// "\n\t",
// ),
// ]
To some, this whitespace is annoying and insignificant, but AFAIK there is no such thing as "insignificant whitespace" as far as the XML spec is concerned. From the same link you provided:
An XML processor MUST always pass all characters in a document that are not markup through to the application.
And as you noted, roxmltree exhibits this conforming behavior by providing these characters in a text node.
As others mentioned, this is the correct behavior. Unlike the popular believe, there are no indention in XML. All of those leading spaces are text nodes.
Some parsers might replaces multiple whitespaces with one, accounting the xml:space
attribute, but a text node remains.
Same with comments between elements.
What you want is #91, which hopefully will be added soon.
Afaik, whitespaces trimming in XML is domain specific. An XML library can't handle it on its own. For example, an SVG like this:
<text>
Text
</text>
should be parsed as Text
. No spaces and new lines at all.
But something like this:
<text>
Some
<tspan>text</tspan>
</text>
should be parsed as Some text
. Now with a space. Why? Because XML.
Ah yes, I was just about to ask regarding Node::children_elements
for convenience,
since this may not be apparent to some.
Although I did read ~40% of the XML spec and used it for anchor-tag parsing with RegEx, I do not know it "by heart" or as details as you :))
Thank you all for the amazing responses, infos and clarifications! Cheers!
In the end, I think the right way to do this is to use:
for element in node.children().filter(Node::is_element) {
println!("{:?}", element.tag_name());
}
Sure, it's a bit verbose, but much more flexible. The main idea behind roxmltree
is that it doesn't follow web/js DOM api and instead relies heavily on Rust's Iterator
.
It's even mentioned in the readme, but even I somehow forgot about it...
We even have a dedicated example just for your case. I completely forgot about it. Well, it was written 4 year ago.
Yes yes I did notice that, though it seems at times as if it was supposed to be a flexible mix of raw node access & convenience where needed :) But yeah, I went with that option (as was commented out in the code snippet) ^^
Hi!
I just ran a unit-test while parsing some XML file and I noticed something weird which I am seeing as possibly incorrect behaviour, but that is where I wish for you to tell me whether that is the case or not please.
The XML snippet in question:
code snippet in question:
Failed test-assertion response of mine:
Since there is no
NodeType::Attribute
(which of course could be that this isn't even meant to exist according to the XML specification, but I don't know that at this point), I was a bit confused, what would those empty string-slices refer to exactly? And since thecamera
's children are allEmptyElemTag
(as specified here), there should not be Text-nodes, right?Cheers!