elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.
https://hexdocs.pm/crawly
Apache License 2.0

Optional XPath support #49

Closed by Ziinc 4 years ago

Ziinc commented 4 years ago

Optional XPath support.

It should be able to handle dirty XML input.

Ziinc commented 4 years ago

@oltarasenko is this issue worth pursuing?

To add XPath querying, the only viable Elixir library would be Meeseeks, but it requires the Rust compiler to be installed. Users who actually need XPath support would be advanced enough to add the library as a dependency themselves.

SweetXml doesn't handle dirty XML input well either, so it's out of the question.

I think leaving Floki as the built-in parser is good enough, and XPath support is not necessary.

oltarasenko commented 4 years ago

I think that, at the end of the day, we should not force Floki on users. It is already no longer a requirement (I am updating the tutorial to reflect this).

I think we should have one example that uses Meeseeks. I will try to address this in the next few releases (I'm currently a bit stuck on a Splash example).

oltarasenko commented 4 years ago

@Ziinc I think we should not worry about XPath. I have just played with Meeseeks, and it seems to give us what we need. I'll close this issue.