Closed yamadapc closed 5 years ago
There isn't currently a way to do this. A lot of scalpel's internals assume that all the data you care about is contained within a single sub-tree of the entire HTML document.
I've opened #48 as a sort of meta-issue to solve the general problem of selecting multiple sub-trees. If you have any ideas for what a good API would look like, please post them there :)
This is now supported in version 0.6.0. This specific issue is added as a regression test:
    ,   scrapeTest
            "Issue #41 regression test"
            "<p class='something'>Here</p><p>Other stuff that matters</p>"
            (Just "Other stuff that matters")
            (inSerial $ do
                seekNext $ matches $ "p" @: [hasClass "something"]
                stepNext $ text "p")
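
For anyone who wants to try this outside the test suite, here is a minimal standalone sketch of the same scraper. It assumes scalpel 0.6.0 or later and the `OverloadedStrings` extension; the name `nextParagraph` is hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Text.HTML.Scalpel

-- Hypothetical helper (not part of scalpel): return the text of the <p>
-- that immediately follows <p class="something">, roughly what the CSS
-- selector ".something+p" would select.
nextParagraph :: Scraper String String
nextParagraph = inSerial $ do
    -- Advance through the sibling nodes until one matches p.something.
    seekNext $ matches $ "p" @: [hasClass "something"]
    -- Move to the node right after it and extract its text as a <p>.
    stepNext $ text "p"

main :: IO ()
main = print $ scrapeStringLike html nextParagraph
  where
    -- Same input as the regression test above.
    html :: String
    html = "<p class='something'>Here</p><p>Other stuff that matters</p>"
```

With the input from the regression test, this should print `Just "Other stuff that matters"`. Because `stepNext` only looks at the very next node, the scraper yields `Nothing` when that node is not a `<p>`.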
In libraries like jQuery/cheerio, given an HTML document like:

You can select "Other stuff that matters" with a selector like: .something+p.

This structure, while not my cup of tea, is used every now and then on websites such as http://hackage.haskell.org.
Is there a way to do this?