egonSchiele / HandsomeSoup

Easy HTML parsing for Haskell
http://egonschiele.github.com/HandsomeSoup
BSD 3-Clause "New" or "Revised" License

HTTP Redirects #21

Open abhillman opened 10 years ago

abhillman commented 10 years ago

Hi! Thanks for the great package. Sometimes, of course, HTTP servers return a 3xx code (e.g. 302) indicating that a resource exists at another location. For example, Wikipedia's random page link (http://en.wikipedia.org/wiki/Special:Random) can return something like:

HTTP/1.1 302 Found
Server: Apache
X-Content-Type-Options: nosniff
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Location: http://en.wikipedia.org/wiki/Spilarctia_whiteheadi
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Varnish: 875388210, 2663556169, 3260756627
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Content-Length: 20
Accept-Ranges: bytes
Date: Wed, 18 Jun 2014 05:52:32 GMT
Age: 0
Connection: keep-alive
X-Cache: cp1054 miss (0), cp4016 miss (0), cp4018 frontend miss (0)
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate

I couldn't find a way to make the fromUrl function follow redirects. Is there an alternative? Thanks!

egonSchiele commented 10 years ago

Hi there. The long-term solution is to fix the fromUrl function so that it follows redirects. Unfortunately I don't have time to do this right now, but in the meantime you can download the page into a string yourself and then use parseHtml to parse it into a tree.
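A minimal sketch of that workaround follows. It assumes the http-conduit and text packages are available; `simpleHttp` comes from http-conduit and uses http-client's default settings, which follow 3xx redirects. The helper names `bodyToString` and `pageTitles` are hypothetical, and selecting `title` is only an illustration. Only `parseHtml` and `css` come from HandsomeSoup.

```haskell
-- Assumed dependencies: HandsomeSoup, hxt, http-conduit, text
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Lazy.Encoding (decodeUtf8With)
import Network.HTTP.Conduit (simpleHttp)
import Text.HandsomeSoup (css, parseHtml)
import Text.XML.HXT.Core

-- Hypothetical helper: decode a response body as UTF-8,
-- replacing invalid byte sequences instead of throwing.
bodyToString :: BL.ByteString -> String
bodyToString = TL.unpack . decodeUtf8With lenientDecode

-- Hypothetical helper: parse an HTML string with HandsomeSoup's parseHtml
-- and return the text inside every <title> element.
pageTitles :: String -> IO [String]
pageTitles html = runX $ parseHtml html >>> css "title" //> getText

main :: IO ()
main = do
  -- simpleHttp follows redirects (e.g. the 302 from Special:Random)
  -- and returns the body of the final response as a lazy ByteString.
  body <- simpleHttp "http://en.wikipedia.org/wiki/Special:Random"
  titles <- pageTitles (bodyToString body)
  mapM_ putStrLn titles
```

Because the redirect handling lives entirely in the download step, the parsing code is the same as it would be with fromUrl; only the source of the HTML string changes.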