RazrFalcon / roxmltree

Represent an XML document as a read-only tree.
Apache License 2.0
434 stars 37 forks source link

Problem with text_storage as_str function - it causes a borrow? #117

Closed kavika13 closed 7 months ago

kavika13 commented 7 months ago

I'm not really very experienced with Rust yet, so forgive me if this is an obvious and trivial use of the borrow checker. I also think it might be a bug in the lifetime definitions of as_str?

I'm wondering why this compiles and works:

match node.tag_name().name() {
    s if s.eq_ignore_ascii_case("impl") => {
        match node.text_storage().unwrap() {
            roxmltree::StringStorage::Borrowed(s) => {
                test_borrow_xml.value = s;
            }
            _ => {}
        }
    }
    _ => {},
}

...but this causes a compile error that the document is borrowed and dropped?

match node.tag_name().name() {
    s if s.eq_ignore_ascii_case("impl") => {
        test_borrow_xml.value = node.text_storage().unwrap().as_str();
    }
    _ => {},
}

Here's a full test case that repros the problem:

fn main() {
    println!("Hello, world!");
}

#[derive(Debug)]
pub struct TestBorrowXml<'a> {
    pub value: &'a str,
}

impl<'a> TestBorrowXml<'a> {
    pub fn parse_xml(
        input_xml: &str,
    ) -> Result<TestBorrowXml, roxmltree::Error> {
        let opt = roxmltree::ParsingOptions {
            ..roxmltree::ParsingOptions::default()
        };
        let doc = roxmltree::Document::parse_with_options(&input_xml, opt)?;
        let mut test_borrow_xml = TestBorrowXml {
            value: "",
        };

        for node in doc.root_element().children() {
            if !node.is_element() {
                continue;
            }
            match node.tag_name().name() {
                s if s.eq_ignore_ascii_case("impl") => {
                    test_borrow_xml.value = node.text_storage().unwrap().as_str();
                    // match node.text_storage().unwrap() {
                    //     roxmltree::StringStorage::Borrowed(s) => {
                    //         test_borrow_xml.value = s;
                    //     }
                    //     _ => {}
                    // }
                }
                _ => {},
            }
        }

        Ok(test_borrow_xml)
    }
}

#[cfg(test)]
mod tests {
    use crate::TestBorrowXml;

    #[test]
    fn test_good_data() {
        let input_xml = "<Test><Impl>123</Impl></Test>";
        let test_borrow_xml = TestBorrowXml::parse_xml(&input_xml).unwrap();
        assert_eq!(test_borrow_xml.value, "123");
    }
}

And here's the resulting error message:

C:\dev\test_borrow_xml>cargo build
   Compiling test_borrow_xml v0.1.0 (C:\dev\test_borrow_xml)
error[E0515]: cannot return value referencing local variable `doc`
  --> src\main.rs:41:9
   |
23 |         for node in doc.root_element().children() {
   |                     --- `doc` is borrowed here
...
41 |         Ok(test_borrow_xml)
   |         ^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

For more information about this error, try `rustc --explain E0515`.
error: could not compile `test_borrow_xml` (bin "test_borrow_xml") due to 1 previous error

If I swap that commented code in, then it compiles and runs successfully.

kavika13 commented 7 months ago

This seems to work around the problem for me, for my use case:

pub trait StringStorageAsBorrowed<'input> {
    fn as_borrowed_str(&self) -> &'input str;
}

impl<'input> StringStorageAsBorrowed<'input> for StringStorage<'input> {
    fn as_borrowed_str(&self) -> &'input str {
        match self {
            StringStorage::Borrowed(s) => s,
            _ => panic!("The specified StringStorage was an owned string, not a borrowed string"),
        }
    }
}

// ...

test_borrow_xml.value = node.text_storage().unwrap().as_borrowed_str();

Out of curiosity, is this due to the choice to return &str on the interface and not COWs? https://github.com/RazrFalcon/roxmltree/issues/88

I seemed to have better luck with lifetimes with attribute nodes than with node inner text.

RazrFalcon commented 7 months ago

You are expected to copy the string or process it in-place (like parsing a number and stuff) or use a callback/closure.

Your code would not work, because you're trying to return a reference to a local/function-scope string. Of course it's not possible and Rust warns you about it. And the panic! hack allows you to cut those cases altogether, making borrow checker happy. But you would not be able to parse generic XML files like this.

Out of curiosity, is this due to the choice to return &str on the interface and not COWs?

We cannot return a COW, because it's owned. That wasn't a choice really.