Webdevdata / webdevdata.org

Website for reports, etc.

HTML downloading script #7

Closed yoavweiss closed 11 years ago

yoavweiss commented 11 years ago

I've got a script (two, actually) that downloads the Alexa CSV, cleans it up, and starts downloading the HTML pages using a multi-threaded pool (so timeouts are less painful). It saves each HTML file with a "txt" extension and also saves the response headers.
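For illustration only, here is a rough Python sketch of the flow described above; it is not the actual script from this issue. It assumes the Alexa CSV has `rank,domain` rows. The function names, the `.html.txt` / `.hdr.txt` file naming, and the worker count are all hypothetical.

```python
import csv
import io
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def parse_alexa_csv(text):
    """Return lowercased domains from 'rank,domain' rows, skipping malformed rows."""
    domains = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2 and row[1].strip():
            domains.append(row[1].strip().lower())
    return domains


def fetch(domain, timeout=10):
    """Fetch http://<domain>/ and return (header_text, body_bytes)."""
    with urllib.request.urlopen("http://" + domain + "/", timeout=timeout) as resp:
        return str(resp.headers), resp.read()


def save_one(domain, out_dir, fetcher=fetch):
    """Download one site and write its body and headers to separate text files."""
    try:
        headers, body = fetcher(domain)
    except Exception:
        # Timeouts, DNS failures, and HTTP errors all just mark the site as failed.
        return domain, False
    # Hypothetical naming scheme: body and headers both end in ".txt".
    with open(os.path.join(out_dir, domain + ".html.txt"), "wb") as f:
        f.write(body)
    with open(os.path.join(out_dir, domain + ".hdr.txt"), "w", encoding="utf-8") as f:
        f.write(headers)
    return domain, True


def download_all(domains, out_dir, workers=20, fetcher=fetch):
    """Download all domains with a thread pool; return {domain: succeeded}."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # A slow or timed-out site only blocks one worker, not the whole run.
        return dict(pool.map(lambda d: save_one(d, out_dir, fetcher), domains))
```

The `fetcher` parameter is there only so the pool logic can be exercised without network access; a real run would call something like `download_all(parse_alexa_csv(csv_text), "html")`.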