fimad / scalpel

A high level web scraping library for Haskell.
Apache License 2.0

Add generalized repetition #23

Closed rpglover64 closed 8 years ago

rpglover64 commented 8 years ago

I'd like to propose matchAll :: Scraper a b -> Scraper a [b], which generalizes the way htmls, attrs, texts, and chroots are derived from their singular forms.

This is useful in case I want to matchAll (html "a" <* (attr "title" "a" >>= \x -> guard (somePredicate x))).

(Yes, I actually ran into this.)

many doesn't solve the problem because it's alternation; replicateM and friends don't solve it either, because each Scraper starts matching from the same current position.
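The replicateM point can be checked with a small sketch. It assumes a recent scalpel release in which selectors are written as string literals via OverloadedStrings; the sample markup and the names repeated and plural are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (replicateM)
import Text.HTML.Scalpel (Scraper, scrapeStringLike, text, texts)

-- Hypothetical input with two anchors.
sample :: String
sample = "<a>one</a><a>two</a>"

-- Every repetition runs against the same tags, so each one
-- finds the first <a> again instead of advancing to the next.
repeated :: Scraper String [String]
repeated = replicateM 2 (text "a")

-- The built-in plural form visits every matching element.
plural :: Scraper String [String]
plural = texts "a"

main :: IO ()
main = do
    print (scrapeStringLike sample repeated)  -- Just ["one","one"]
    print (scrapeStringLike sample plural)    -- Just ["one","two"]
```

The repeated scraper never reaches the second link, which is why a dedicated "all matches" combinator is needed.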

fimad commented 8 years ago

How would matchAll work if you passed different selectors to html and attr?

It seems like it may not make sense with different selectors, and you would need something with a type like :: Selectable s => (s -> Scraper str a) -> (s -> Scraper str b) -> s -> Scraper str [(a, b)] so that you can ensure that you are selecting on the same elements. Of course, this doesn't generalize as nicely :/

Another, though less intuitive option, is to abuse chroots and Any:

chroots "a" $ do
    -- Inside chroots the first tag is the matched <a> itself, so Any selects it.
    x <- attr "title" Any
    guard (somePredicate x)
    html Any

rpglover64 commented 8 years ago

> How would matchAll work if you passed different selectors to html and attr?

Presumably, it would return the empty list, since no single element has both selectors.

I hadn't thought of the chroots trick; I think it should be added to the chroots documentation, and that may be enough.
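For reference, here is a self-contained sketch of the chroots trick. It assumes a recent scalpel release, where the Any selector is spelled anySelector and string-literal selectors need OverloadedStrings; somePredicate, the sample markup, and titledLinks are hypothetical stand-ins.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (guard)
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe)
import Text.HTML.Scalpel
    (Scraper, anySelector, attr, chroots, html, scrapeStringLike)

-- Hypothetical predicate: keep links whose title starts with "Docs".
somePredicate :: String -> Bool
somePredicate = ("Docs" `isPrefixOf`)

-- Hypothetical input: two links pass, one fails the predicate,
-- and one has no title attribute at all.
sample :: String
sample = concat
    [ "<a title=\"Docs: intro\" href=\"/intro\">Intro</a>"
    , "<a title=\"Blog\" href=\"/blog\">Blog</a>"
    , "<a href=\"/none\">No title</a>"
    , "<a title=\"Docs: API\" href=\"/api\">API</a>"
    ]

-- chroots runs the inner scraper once per <a>; runs where attr or
-- guard fails are dropped, so only the matching links remain.
titledLinks :: Scraper String [String]
titledLinks = chroots "a" $ do
    x <- attr "title" anySelector
    guard (somePredicate x)
    html anySelector

main :: IO ()
main = mapM_ putStrLn (fromMaybe [] (scrapeStringLike sample titledLinks))
```

Because both attr and html are applied to the same chrooted element, the title that is tested and the HTML that is returned always belong to the same link.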

fimad commented 8 years ago

Closing out for now. If you feel the current documentation isn't extensive or visible enough, please reopen.