aantron / lambdasoup

Functional HTML scraping and rewriting with CSS in OCaml
https://aantron.github.io/lambdasoup
MIT License

Use a dedicated exception for selector parse errors. #31

Closed. dmbaturin closed this 4 years ago.

dmbaturin commented 4 years ago

Right now, Soup.Selector.parse fails with a generic Failure if it cannot parse a selector. This forces tests to look inside the exception message, which is rather inelegant. It also makes it hard for clients to give the user a meaningful error message for a bad selector.

A separate exception (which I named Soup.Selector_parse_error) makes for a cleaner test suite and simpler error reporting.
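To make the testing difference concrete, here is a minimal OCaml sketch. It does not use lambdasoup at all: check, parse_with_failure and parse_with_exception are hypothetical stand-ins for Soup.Selector.parse that accept only a bare tag, .class or #id, and Selector_parse_error simply reuses the name proposed above.

```ocaml
(* Hypothetical dedicated exception, named as proposed in the PR. *)
exception Selector_parse_error of string

(* Toy validator standing in for the real selector parser (an assumption):
   accepts only "tag", ".class" or "#id" built from [A-Za-z0-9_-]. *)
let is_name_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' -> true
  | _ -> false

(* Returns [None] if [s] is accepted, [Some reason] otherwise. *)
let check s =
  let n = String.length s in
  let start = if n > 0 && (s.[0] = '.' || s.[0] = '#') then 1 else 0 in
  if start = n then Some (Printf.sprintf "selector: empty name in %S" s)
  else
    let rec scan i =
      if i = n then None
      else if is_name_char s.[i] then scan (i + 1)
      else Some (Printf.sprintf "selector: unexpected %C in %S" s.[i] s)
    in
    scan start

(* Current behaviour: a generic Failure. *)
let parse_with_failure s =
  match check s with None -> s | Some why -> failwith why

(* Proposed behaviour: a dedicated exception. *)
let parse_with_exception s =
  match check s with None -> s | Some why -> raise (Selector_parse_error why)

(* With Failure, the test has to inspect the message, because any other
   Failure raised along the way would be caught by the same handler. *)
let rejected_old s =
  match parse_with_failure s with
  | _ -> false
  | exception Failure msg ->
    String.length msg >= 9 && String.sub msg 0 9 = "selector:"

(* With the dedicated exception, matching on the constructor is enough. *)
let rejected_new s =
  match parse_with_exception s with
  | _ -> false
  | exception Selector_parse_error _ -> true
```

For example, rejected_new "div > p" is true (the toy parser rejects the space), while rejected_new "#main" is false.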

It's also a compatibility concern. I glanced through the packages that depend on lambdasoup (mechaml, socialpeek...), and it looks like none of them accept selector strings from the user or rely on the current behaviour. Soupault seems to be the only lambdasoup client that allows user-defined selectors, which is why I'm making this PR, but I believe web scraping libraries and similar tools could benefit from it as well if they decide to add flexibility and support custom selectors.
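For a client that reads selectors from user configuration, as Soupault does, a dedicated exception lets it turn a bad selector into a specific message. The sketch below is hypothetical throughout: Selector_parse_error, parse_selector (a toy that only rejects blank selectors and an unterminated "[") and validate_selectors are illustrative names, not the API of lambdasoup or Soupault.

```ocaml
(* Hypothetical exception, named as proposed in the PR. *)
exception Selector_parse_error of string

(* Toy stand-in for the library's selector parser (an assumption):
   it rejects blank selectors and a "[" with no matching "]". *)
let parse_selector s =
  if String.trim s = "" then raise (Selector_parse_error "empty selector")
  else if String.contains s '[' && not (String.contains s ']') then
    raise (Selector_parse_error "unterminated attribute selector")
  else s

(* Validate every user-supplied (config key, selector) pair, reporting the
   first bad selector together with the key it came from. *)
let validate_selectors (entries : (string * string) list) =
  let rec go acc = function
    | [] -> Ok (List.rev acc)
    | (key, sel) :: rest ->
      (match parse_selector sel with
       | parsed -> go (parsed :: acc) rest
       | exception Selector_parse_error why ->
         Error (Printf.sprintf "option %s: invalid selector %S: %s" key sel why))
  in
  go [] entries
```

Because only Selector_parse_error is caught, an unrelated Failure from elsewhere still propagates instead of being misreported as a bad selector.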

aantron commented 4 years ago

From the CI, it looks like the new exception is not exposed in the .mli file.
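The CI failure comes from OCaml's module system: an exception defined in the .ml file is hidden from clients unless the .mli declares it too. The sketch below models the .ml/.mli pair with an explicit module signature; Soup_like, parse_selector and describe are hypothetical names. Removing the exception line from the sig would make Soup_like.Selector_parse_error an unbound constructor in describe, so callers could only catch the error with a wildcard handler.

```ocaml
(* The [sig ... end] plays the role of the .mli file, and the
   [struct ... end] plays the role of the .ml file. *)
module Soup_like : sig
  exception Selector_parse_error of string  (* must be listed to be visible *)
  val parse_selector : string -> string
end = struct
  exception Selector_parse_error of string

  (* Toy parser (an assumption): only the empty selector is rejected. *)
  let parse_selector s =
    if s = "" then raise (Selector_parse_error "empty selector") else s
end

(* Client code outside the module can name the constructor only because
   the signature exposes it. *)
let describe s =
  match Soup_like.parse_selector s with
  | parsed -> "ok: " ^ parsed
  | exception Soup_like.Selector_parse_error msg -> "bad selector: " ^ msg
```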

aantron commented 4 years ago

Thanks!

dmbaturin commented 4 years ago

What's your plan for the next release and its schedule?

aantron commented 4 years ago

I'm going to tweak the code a bit and release immediately.

dmbaturin commented 4 years ago

I see the opam-repository PR. I think I'll make a soupault release at the end of February, so I can use 0.7.0 as a dependency then. After that I'm going to see if I can get #15 to work.