fimad / scalpel

A high level web scraping library for Haskell.
Apache License 2.0
323 stars 43 forks

would like to select based on the text of a node #25

Closed gregnwosu closed 8 years ago

gregnwosu commented 8 years ago

Try as I might, I haven't been able to select based on the text. Furthermore, the text primitive seems to concatenate all text nodes beneath the current node. Are there any ways around this?

rpglover64 commented 8 years ago

Check out the response to issue #23:

chroots "p" $ do
  t <- text Any
  guard (isInfixOf "would" t)
  html Any

This selects paragraphs whose text contains "would".
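
For anyone who wants to try this, here is a minimal self-contained sketch of the same chroots-plus-guard idea. It assumes a recent scalpel release, where (as far as I know) the root-of-context selector is spelled anySelector rather than Any as in the snippet above. The names sampleHtml, paragraphsContaining, and matches are hypothetical and only there for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (guard)
import Data.List (isInfixOf)
import Text.HTML.Scalpel (Scraper, anySelector, chroots, html, scrapeStringLike, text)

-- Hypothetical sample input, not taken from the issue.
sampleHtml :: String
sampleHtml =
  "<div><p>I would like tea</p><p>No thanks</p><p>She would not</p></div>"

-- Visit every <p>, read its concatenated text, and keep the node's HTML
-- only when that text contains the needle. A failed guard makes the inner
-- scraper fail for that node, and chroots drops failed nodes from the result.
paragraphsContaining :: String -> Scraper String [String]
paragraphsContaining needle = chroots "p" $ do
  t <- text anySelector          -- text of the current <p>
  guard (needle `isInfixOf` t)   -- skip paragraphs without the needle
  html anySelector               -- return the paragraph's markup

-- The paragraphs of sampleHtml that mention "would".
matches :: Maybe [String]
matches = scrapeStringLike sampleHtml (paragraphsContaining "would")
```

Loading this in GHCi and evaluating matches should yield only the first and third paragraphs. The filter is an ordinary Haskell predicate, so isInfixOf can be swapped for an exact comparison or a regex match.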

fimad commented 8 years ago

Does @rpglover64's comment provide what you need?

As for concatenating all text nodes, I'm not sure what a reasonable alternative would be. For example, given <a> foo <b> bar </b> baz </a>, the text inside the <a> tag is not one text node but several, split around the <b> tag. It seems like it would be very unintuitive to return only foo baz in this case.
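
To make the concatenation behaviour concrete, here is a small sketch run against that exact fragment. The names anchorText and anchorWords are hypothetical; text and scrapeStringLike are the library calls as I understand them from the scalpel documentation.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel (scrapeStringLike, text)

-- text returns every text node beneath the first <a>, concatenated in
-- document order, so the contents of the nested <b> are included.
anchorText :: Maybe String
anchorText =
  scrapeStringLike ("<a> foo <b> bar </b> baz </a>" :: String) (text "a")

-- Splitting on whitespace sidesteps the exact spacing of the raw result
-- and shows which words ended up in the concatenated text.
anchorWords :: Maybe [String]
anchorWords = fmap words anchorText
```

Evaluating anchorWords should give all three words, foo, bar, and baz, rather than just foo and baz.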

fimad commented 8 years ago

Closing this out. The chroots trick @rpglover64 provided is now documented in the README.