egonSchiele / HandsomeSoup

Easy HTML parsing for Haskell
http://egonschiele.github.com/HandsomeSoup
BSD 3-Clause "New" or "Revised" License
124 stars, 20 forks

Support HTTPS #31

Open singpolyma opened 8 years ago

singpolyma commented 8 years ago

Most sites require it these days.

egonSchiele commented 8 years ago

I agree, this really needs to be supported. I don't have time to do it myself, but I'm happy to merge any PRs you or anyone else wants to throw my way.

metalbot commented 5 years ago

Do you have any strong opinions on how you'd want this done? It looks like we'd be switching dependencies in order to get HTTPS support, and there are a couple of different directions we could go. I'd be most inclined to use http-client-tls since that looks like the path of least resistance.

singpolyma commented 5 years ago

"Do you have any strong opinions on how you'd want this done? ... I'd be most inclined to use http-client-tls since that looks like path of least resistance."

I'm partial to the io-streams ecosystem, but not picky.

metalbot commented 5 years ago

Supporting https for the openUrl helper is pretty straightforward (albeit ugly since we're now dealing with ByteString and String). I don't see a clean non-breaking way to make readDocument work since it's coming from HXT.

https://github.com/metalbot/HandsomeSoup/commit/3974e4e4b8045b1ea79adc3770cfb9960d4a99dd
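For readers following along, here is a minimal sketch of what an HTTPS-capable fetch helper could look like on top of http-client-tls. It is not taken from the linked commit: the names `fetchHtml` and `bodyToString` are hypothetical, and it assumes the response body is UTF-8 (invalid bytes are replaced rather than raising an error). It also shows where the ByteString-to-String conversion mentioned above comes in.

```haskell
module Main where

import qualified Data.ByteString.Lazy as L
import qualified Data.Text.Lazy as TL
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Lazy.Encoding (decodeUtf8With)
import Network.HTTP.Client (httpLbs, parseRequest, responseBody)
import Network.HTTP.Client.TLS (newTlsManager)

-- | Convert a lazy ByteString body to a String.
-- Assumption: the body is UTF-8; undecodable bytes become U+FFFD.
bodyToString :: L.ByteString -> String
bodyToString = TL.unpack . decodeUtf8With lenientDecode

-- | Hypothetical helper: fetch a page over http or https and
-- return its body as a String. Throws on malformed URLs and
-- on connection failures.
fetchHtml :: String -> IO String
fetchHtml url = do
  manager  <- newTlsManager          -- TLS-capable connection manager
  request  <- parseRequest url       -- accepts both http:// and https://
  response <- httpLbs request manager
  return (bodyToString (responseBody response))

main :: IO ()
main = do
  html <- fetchHtml "https://example.com"
  putStrLn (take 200 html)
```

The resulting String could then be handed to a string-based parser, which avoids touching readDocument at all.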

metalbot commented 5 years ago

So, it turns out that we can make readDocument work if we're willing to switch from withHttp to withCurl and take on a new dependency on hxt-curl. I'm not sure if this change creates problems for people on Windows, but I can try to test later this week.

https://github.com/metalbot/HandsomeSoup/commit/62b5c3bef91d45a6bee4a28307f9d26b8f2cd9c6
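As a rough illustration of the curl-based route, the sketch below passes `withCurl` from hxt-curl to HXT's `readDocument`. It is not copied from the linked commit: `curlConfig` and `fromUrlCurl` are hypothetical names, and the option list is an assumption about what a reasonable HTML-parsing configuration looks like.

```haskell
module Main where

import Text.XML.HXT.Core
import Text.XML.HXT.Curl (withCurl)

-- | Assumed option list: lenient HTML parsing, no warnings,
-- and libcurl as the input handler for http/https URLs.
curlConfig :: SysConfigList
curlConfig =
  [ withParseHTML yes
  , withWarnings no
  , withCurl []
  ]

-- | Hypothetical curl-backed document reader.
fromUrlCurl :: String -> IOSArrow b XmlTree
fromUrlCurl = readDocument curlConfig

main :: IO ()
main = do
  -- Print the text of every <title> element on the page.
  titles <- runX (fromUrlCurl "https://example.com"
                    >>> deep (hasName "title")
                    >>> getChildren
                    >>> getText)
  mapM_ putStrLn titles
```

Building this requires the hxt-curl package, which in turn needs libcurl installed on the system.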

metalbot commented 5 years ago

Ugh. Definitely a problem for Windows without a non-trivial install of libcurl. I don't have a better way forward since I have low confidence we could get hxt to accept a pull request. Do you want me to just do a pull request for the update on the openUrl, or does that put us in too much of a "half-done" state? Given that most of the web has moved to TLS, I'm not sure that "half-done" isn't a better place than "not-done".

221V commented 5 years ago

fatal error: http error when requesting URI "https://...": https not supported (perhaps server does not understand HTTP/1.1)

I installed the latest version (HandsomeSoup == 0.4.2) from Hackage.

Can you fix that, please? Thanks.

221V commented 5 years ago

Okay, https://github.com/metalbot/HandsomeSoup/tree/https works with HTTPS,

but there is another bug.

metalbot commented 5 years ago

Are the other bugs related to my fork, or just other bugs in general?

I'm really hesitant to do a pull request here since the dependency on libcurl is non-trivial to resolve for Windows.

221V commented 5 years ago

"Are the other bugs related to my fork, or just other bugs in general?"

I tried your https tree, and I think the bug is a general one: https://github.com/egonSchiele/HandsomeSoup/issues/32 (the library loses pieces of the HTML).

221V commented 5 years ago

I'm sorry, the bug was somewhere else.