ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Multiple URLs: handle missing URLs well #32

Open · blahah opened this issue 9 years ago

blahah commented 9 years ago

Currently, the whole system starts behaving unpredictably if a URL doesn't resolve.

lanzer commented 9 years ago

It's not just multiple URLs: I noticed that quickscrape halts on any page that returns a 404. This starts to become a problem in conjunction with scripts that follow links.

For example: ./quickscrape.js -u https://github.com/ContentMine/quickscrape/iss -s github.json
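To illustrate the behaviour this issue asks for, here is a minimal sketch in JavaScript. It is not quickscrape's actual code. `scrapeAll`, `nodeFetchPage`, and the `{ status, body }` result shape are hypothetical names chosen for illustration, and `nodeFetchPage` assumes Node.js 18+ for the global `fetch`. Each URL is tried independently, and a 404 or an unresolvable host is recorded as a failure instead of stopping the run.

```javascript
// Hypothetical sketch, NOT quickscrape's real code or API.
// Scrapes URLs one at a time; failures (non-2xx responses such as 404,
// or thrown errors such as DNS lookup failures) are recorded and skipped.
async function scrapeAll(urls, fetchPage) {
  const results = [];
  const failures = [];
  for (const url of urls) {
    try {
      const res = await fetchPage(url);
      if (res.status < 200 || res.status >= 300) {
        // e.g. a 404: note it and move on to the next URL
        failures.push({ url, reason: `HTTP ${res.status}` });
        continue;
      }
      results.push({ url, body: res.body });
    } catch (err) {
      // e.g. a URL whose host doesn't resolve: fetch rejects instead of returning
      failures.push({ url, reason: err.message });
    }
  }
  return { results, failures };
}

// Hypothetical default fetcher; relies on the global fetch in Node.js 18+.
async function nodeFetchPage(url) {
  const res = await fetch(url);
  return { status: res.status, body: await res.text() };
}
```

Calling `scrapeAll(urls, nodeFetchPage)` resolves with the pages that loaded plus a list of URLs that failed and why. Passing the fetcher in as a parameter keeps the loop testable without network access and makes it easy to add retries or logging in one place.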