eminence / xmltree-rs

Reads an XML file into a simple tree structure
MIT License
39 stars 31 forks source link

[FEAT] Depth First Traversal Methods #19

Open dyst5422 opened 4 years ago

dyst5422 commented 4 years ago

This adds traits and implementations to do depth first traversals (both pre-order and post-order). It's still rough and needs some work to organize the api and code, but it should provide some good utility to finding nodes in the tree efficiently.

Definitely still needs discussion before it would be ready for consideration for merging.

Additionally moved the parse and parse_all functions into the module scope rather than scoped to Element (though they are still on Element and delegated to the module scope functions to keep backwards compatibility). <- This needs to be moved into another PR I think.

I think some additional questions to be answered:

eminence commented 4 years ago

Thanks for the PR to start the discussion. I'm interesting in learning more about some of the use-cases you have in mind. A few thoughts in response to your questions:

dyst5422 commented 4 years ago
  1. Agreed on pre-order being the only one necessary - I don't think XML is often treated as a graph tree in practice.

  2. I imagine that traversals on XMLNode would allow you to do something like

    let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
    XMLNode::Comment(comment) => true,
    _ => false,
    }).map(|XMLNode::Comment(comment)| comment.to_uppercase())

I think it becomes more interesting when combining it with the writing back out - allowing intermediate transformations on the XML while preserving the parts that are not of concern (comments, etc).

  1. Stop predicate I see as being the way to pull out certain elements
    let some_elements = xml_node.traverse(|node| {
    if let XMLNode::Element(element) = node {
    return element.name == 'SomeElement';
    }
    return false;
    }).filter(|node| {
    if let XMLNode::Element(element) = node {
    return element.name == 'SomeElement';
    }
    return false;
    });

Although now that I write it out (and since its probably the most common use case) this is incredibly bad to write. There should probably just be a traverse() method returning an iterator and a get_children(predicate: fn(&XMLNode) -> bool) ->Vec<&XMLNode> method that does the retrieval for you.

I have a related method I threw together for modifying a tree that might similarly useful

pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
    xml_tree: I,
    processor: fn(&XMLNode) -> XMLNode,
) -> Vec<XMLNode> {
    let mut new_tree = Vec::new();
    for node in xml_tree {
        new_tree.push(match node {
            XMLNode::CData(_) => processor(&node),
            XMLNode::Comment(_) => processor(&node),
            XMLNode::ProcessingInstruction(_, _) => processor(&node),
            XMLNode::Text(_) => processor(&node),
            XMLNode::Element(element) => {
                let mut new_element = element.clone();
                new_element.children = traverse_and_process(element.children.into_iter(), processor);
                processor(&XMLNode::Element(new_element))
            }
        })
    }
    new_tree
}

Also not sure how you feel about returning Iterators vs Vectors. I like the completeness and simplicity of Vectors, but the usability and implied performance benefits of Iterators in terms of performing intermediate transformations/filters/maps/etc.

eminence commented 4 years ago

In this example you wrote:

let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
  XMLNode::Comment(comment) => true,
  _ => false,
}).map(|XMLNode::Comment(comment)| comment.to_uppercase())

I like your other suggestion better of using a traverse():

let uppercased_comments = root_xml_node.traverse()
    .filter(|node| node.as_comment().is_some())
    .map(|XMLNode::Comment(comment)| comment.to_uppercase())

Where .traverse() is a pre-order depth-first traversal iterator. Or maybe we can just implement Iterator on Element

(BTW, this example of getting all comments is compelling to me, so I now agree we should support traversal over XMLNodes)

I guess I still don't fully understand the point of something like traverse_and_process, since that looks a whole lot like root_xml_node.traverse().map(...).collect()

dyst5422 commented 4 years ago

With something like traverse_and_process() instead of traverse(), you preserve the structure of the tree.

Did a second pass at it here which allows removing/changing nodes

pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
    xml_tree: I,
    processor: fn(&XMLNode) -> Option<XMLNode>,
) -> Vec<XMLNode> {
    let mut new_tree = Vec::new();
    for node in xml_tree {
        let new_node_option = match node {
            XMLNode::CData(_) => processor(&node),
            XMLNode::Comment(_) => processor(&node),
            XMLNode::ProcessingInstruction(_, _) => processor(&node),
            XMLNode::Text(_) => processor(&node),
            XMLNode::Element(element) => {
                let mut new_element = element.clone();
                new_element.children = traverse_and_process(element.children.into_iter(), processor);
                processor(&XMLNode::Element(new_element))
            }
        }
        if let Some(new_node) = new_node_option {
          new_tree.push(new_node)
        }
    }
    new_tree
}
eminence commented 4 years ago

Just to confirm: did you mean to close this PR?

dyst5422 commented 4 years ago

No