ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Skip already-downloaded articles #70

Open robintw opened 8 years ago

robintw commented 8 years ago

It'd be great if quickscrape had an option to skip downloading articles if they have already been downloaded.

A very simple version of this: if the folder it was going to download into already exists, skip that article and move on to the next one. I might have a go at that for a PR.
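The folder check described above could be sketched roughly as follows. This is an illustration in Python, not quickscrape's code: `url_to_dirname` and `urls_to_scrape` are hypothetical names, and the URL-to-folder mapping is an assumption rather than the scheme quickscrape actually uses.

```python
import re
from pathlib import Path


def url_to_dirname(url):
    """Hypothetical URL-to-folder mapping (an assumption, not quickscrape's real scheme)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")


def urls_to_scrape(urls, output_root):
    """Return only the URLs whose output folder does not exist yet."""
    root = Path(output_root)
    pending = []
    for url in urls:
        if (root / url_to_dirname(url)).is_dir():
            # Folder already present: treat the article as downloaded and skip it.
            continue
        pending.append(url)
    return pending
```

One limitation of this simple version: a folder left behind by an interrupted download would also be skipped, so a more careful check might look for specific files inside the folder.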

tarrow commented 8 years ago

That sounds like a good idea to me. We're super excited to have this kind of offer :). Perhaps you could make it opt-in, enabled with a flag, rather than the default behaviour?

In reply to Robin Wilson's comment of Tue, Mar 22, 2016, 9:03 PM.
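tarrow's suggestion to keep the current behaviour as the default and enable skipping with a flag might look something like the sketch below. It is written in Python purely for illustration; the flag name `--skip-existing` and the helpers `build_parser` and `should_scrape` are hypothetical and are not existing quickscrape options.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with a hypothetical opt-in skip flag."""
    parser = argparse.ArgumentParser(description="Sketch of an opt-in skip flag")
    parser.add_argument(
        "--skip-existing",  # hypothetical flag name, not a real quickscrape option
        action="store_true",  # off unless the user passes the flag
        help="skip articles whose output folder already exists",
    )
    return parser


def should_scrape(target_dir, skip_existing):
    """Scrape unless skipping is enabled and the target folder already exists."""
    return not (skip_existing and Path(target_dir).is_dir())
```

Because the flag uses `store_true`, running without it leaves existing behaviour unchanged, which is the point of making the feature opt-in.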