causal-agent / scraper

HTML parsing and querying with CSS selectors
https://docs.rs/scraper
ISC License

Convert <br> to '\n' in `text`? #178

Closed: failable closed this issue 1 month ago

failable commented 3 months ago

The following code snippet outputs "Hello, world!good bye".

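    use scraper::{Html, Selector};

    fn main() {
        let fragment = Html::parse_fragment("<div>Hello, world!<br><p>good bye</p></div>");
        let selector = Selector::parse("div").unwrap();

        // Take the first (and only) <div> in the fragment.
        let div = fragment.select(&selector).next().unwrap();

        // `text()` yields only text nodes, so the <br> contributes nothing.
        println!("{:#?}", div.text().collect::<String>());
    }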

Would it be better to output "Hello, world!\ngood bye"?

adamreichold commented 3 months ago

If you want this particular transformation, you will need to replace each <br/> Element node with a \n Text node and then use .text().collect::<String>().

Alternatively, you can use .descendants() instead of .text() and filter for text nodes and <br/> tags directly.