Open dyst5422 opened 4 years ago
Thanks for the PR to start the discussion. I'm interesting in learning more about some of the use-cases you have in mind. A few thoughts in response to your questions:
If I had to guess, pre-order traversals would be more commonly used than post-order. I say this mostly because pre-order is generally how my brain reads XML files in an editor (with my eyes seeing the parent element before the children).
I'm inclined to only provide traversal on Element
. I don't have a good sense of real-world usage, but I'm guessing that most users only work with Elements
.
Can you say some more about how you envision the stop predicate being used in practice? I certainly can see the value in having an ability in controlling the traversal. But I had always assumed that most xml files used with xmltree-rs
would be small enough to fully hold in memory, and small enough that traversing the entire tree would not be too expensive.
Agreed on pre-order being the only one necessary - I don't think XML is often treated as a graph tree in practice.
I imagine that traversals on XMLNode would allow you to do something like
let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
XMLNode::Comment(comment) => true,
_ => false,
}).map(|XMLNode::Comment(comment)| comment.to_uppercase())
I think it becomes more interesting when combining it with the writing back out - allowing intermediate transformations on the XML while preserving the parts that are not of concern (comments, etc).
let some_elements = xml_node.traverse(|node| {
if let XMLNode::Element(element) = node {
return element.name == 'SomeElement';
}
return false;
}).filter(|node| {
if let XMLNode::Element(element) = node {
return element.name == 'SomeElement';
}
return false;
});
Although now that I write it out (and since its probably the most common use case) this is incredibly bad to write. There should probably just be a traverse()
method returning an iterator and a get_children(predicate: fn(&XMLNode) -> bool) ->Vec<&XMLNode>
method that does the retrieval for you.
I have a related method I threw together for modifying a tree that might similarly useful
pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
xml_tree: I,
processor: fn(&XMLNode) -> XMLNode,
) -> Vec<XMLNode> {
let mut new_tree = Vec::new();
for node in xml_tree {
new_tree.push(match node {
XMLNode::CData(_) => processor(&node),
XMLNode::Comment(_) => processor(&node),
XMLNode::ProcessingInstruction(_, _) => processor(&node),
XMLNode::Text(_) => processor(&node),
XMLNode::Element(element) => {
let mut new_element = element.clone();
new_element.children = traverse_and_process(element.children.into_iter(), processor);
processor(&XMLNode::Element(new_element))
}
})
}
new_tree
}
Also not sure how you feel about returning Iterators vs Vectors. I like the completeness and simplicity of Vectors, but the usability and implied performance benefits of Iterators in terms of performing intermediate transformations/filters/maps/etc.
In this example you wrote:
let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
XMLNode::Comment(comment) => true,
_ => false,
}).map(|XMLNode::Comment(comment)| comment.to_uppercase())
I like your other suggestion better of using a traverse()
:
let uppercased_comments = root_xml_node.traverse()
.filter(|node| node.as_comment().is_some())
.map(|XMLNode::Comment(comment)| comment.to_uppercase())
Where .traverse()
is a pre-order depth-first traversal iterator. Or maybe we can just implement Iterator
on Element
(BTW, this example of getting all comments is compelling to me, so I now agree we should support traversal over XMLNode
s)
I guess I still don't fully understand the point of something like traverse_and_process
, since that looks a whole lot like root_xml_node.traverse().map(...).collect()
With something like traverse_and_process()
instead of traverse()
, you preserve the structure of the tree.
Did a second pass at it here which allows removing/changing nodes
pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
xml_tree: I,
processor: fn(&XMLNode) -> Option<XMLNode>,
) -> Vec<XMLNode> {
let mut new_tree = Vec::new();
for node in xml_tree {
let new_node_option = match node {
XMLNode::CData(_) => processor(&node),
XMLNode::Comment(_) => processor(&node),
XMLNode::ProcessingInstruction(_, _) => processor(&node),
XMLNode::Text(_) => processor(&node),
XMLNode::Element(element) => {
let mut new_element = element.clone();
new_element.children = traverse_and_process(element.children.into_iter(), processor);
processor(&XMLNode::Element(new_element))
}
}
if let Some(new_node) = new_node_option {
new_tree.push(new_node)
}
}
new_tree
}
Just to confirm: did you mean to close this PR?
No
This adds traits and implementations to do depth first traversals (both pre-order and post-order). It's still rough and needs some work to organize the api and code, but it should provide some good utility to finding nodes in the tree efficiently.
Definitely still needs discussion before it would be ready for consideration for merging.
Additionally moved the parse and parse_all functions into the module scope rather than scoped to Element (though they are still on Element and delegated to the module scope functions to keep backwards compatibility). <- This needs to be moved into another PR I think.
I think some additional questions to be answered: