James-LG / Skyscraper

Rust library for scraping HTML using XPath expressions
MIT License

Two slashes html root element in xpath #5

Closed: dyens closed this issue 2 years ago

dyens commented 2 years ago

I have this HTML:

    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>

If I try to use the XPath //ul/li, I get an error:

    panicked at 'IndexSet: index out of bounds', src/xpath/mod.rs:376:36

The reason is that ul is the root element of the HTML. If I use this HTML instead, everything works fine:

    <div>
        <ul>
            <li>Foo</li>
            <li>Bar</li>
            <li>Baz</li>
        </ul>
    </div>
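As a possible stopgap, the fragment could be wrapped before it is parsed. This is only a minimal sketch: `wrap_in_div` is a hypothetical helper, not part of Skyscraper, and it does nothing more than build the wrapped string. It relies on the observation above that the div-wrapped HTML does not panic.

```rust
/// Hypothetical helper (not part of Skyscraper): wraps an HTML fragment in a
/// `<div>` so that the fragment's own top-level element is no longer the root.
fn wrap_in_div(fragment: &str) -> String {
    // Trim surrounding whitespace so the wrapper tags sit directly
    // around the fragment's markup.
    format!("<div>\n{}\n</div>", fragment.trim())
}
```

The returned string would then be handed to the parser in place of the original text, and the same XPath (//ul/li) applied to the resulting document.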

Code to reproduce:

    // Assumes the crate's public `html` and `xpath` modules.
    use skyscraper::{html, xpath};

    #[test]
    fn xpath_started_with_root_element() {
        let text = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

        let document = html::parse(&text).unwrap();
        let xpath = xpath::parse("//ul/li[1]").unwrap();
        let results = xpath.apply(&document).unwrap();
        let el = results[0].get_text(&document);
        assert_eq!(el.unwrap(), "Foo");
    }

James-LG commented 2 years ago

I will look into this tomorrow.

James-LG commented 2 years ago

I fixed this today but unfortunately Canada is having a major internet outage and I can't push the changes yet (typing this from my phone). 😖