Closed yoavweiss closed 11 years ago
I've got a script (two, actually) that downloads the Alexa CSV, cleans it up, and starts downloading the HTML pages using a multi-threaded pool (so timeouts are less painful). It saves each HTML page with a "txt" extension and also saves the headers.
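For anyone who wants to reproduce the approach, here is a minimal Python sketch of the flow described above. It is not the actual script: the function names (`clean_domains`, `http_fetch`, `download_all`), the `rank,domain` CSV layout, the `.headers` file extension, and the worker count are all assumptions made for illustration.

```python
import csv
import io
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def clean_domains(csv_text):
    """Parse 'rank,domain' rows (assumed layout); return unique lower-cased domains."""
    seen, domains = set(), []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        domain = row[1].strip().lower()
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


def http_fetch(domain, timeout=10):
    """Fetch http://<domain>/ and return (headers_text, body_bytes)."""
    with urllib.request.urlopen(f"http://{domain}/", timeout=timeout) as resp:
        return str(resp.headers), resp.read()


def download_all(domains, out_dir, fetch=http_fetch, workers=32):
    """Download every domain with a thread pool; return {domain: error or None}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, d): d for d in domains}
        # Handle results as they finish, so one slow site does not block the rest.
        for fut in as_completed(futures):
            domain = futures[fut]
            try:
                headers, body = fut.result()
            except Exception as exc:  # timeouts, DNS failures, HTTP errors
                results[domain] = repr(exc)
                continue
            name = domain.replace("/", "_")  # keep file names flat
            (out / f"{name}.txt").write_bytes(body)  # HTML saved as .txt
            (out / f"{name}.headers").write_text(headers)  # hypothetical extension
            results[domain] = None
    return results
```

Typical use would be `download_all(clean_domains(csv_text), "pages")`, where `csv_text` is the contents of the downloaded Alexa CSV. The `fetch` parameter is injectable so the pool logic can be exercised without touching the network.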