Avoiding restart of commoncrawl scraping process

Mandatory

[x] I read the documentation (readme and wiki).
[x] I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.

Describe your question

I'm trying to download CC-NEWS articles that meet certain filtering requirements (I added my own filters that, for example, process each article and predict the likelihood that it is in English). Because there are so many articles, my job is unlikely to finish before it is interrupted. When I restart the process, I'm not sure whether the articles I downloaded previously are being downloaded again or whether it resumes where it left off before it was terminated. If it's the former, is there a workaround to make sure articles don't get re-downloaded every time the process restarts?
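If the commoncrawl mode does not already skip finished work on restart, one possible workaround is a checkpoint file. The sketch below is a generic pattern, not news-please's API: `WarcCheckpoint`, `process_warcs`, `article_filename` and the `done.txt` log are all hypothetical names. The assumption is that work is split into WARC files and that each file can be recorded as "done" once it has been fully processed. A restarted job then reloads that record and skips those files.

```python
import hashlib
import os
from typing import Callable, Iterable, List, Set


class WarcCheckpoint:
    """Hypothetical helper (not part of news-please): remembers fully processed WARC files.

    Each completed WARC path is appended to a plain-text log, one per line,
    so a restarted job can reload the set and skip those files.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.done: Set[str] = set()
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                self.done = {line.strip() for line in f if line.strip()}

    def is_done(self, warc_path: str) -> bool:
        return warc_path in self.done

    def mark_done(self, warc_path: str) -> None:
        # Append, flush and fsync right away so the record survives an interruption.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(warc_path + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.done.add(warc_path)


def article_filename(out_dir: str, url: str) -> str:
    # Stable file name derived from the article URL, so a rerun maps the same
    # article to the same file and os.path.exists() can skip re-saving it.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(out_dir, digest + ".json")


def process_warcs(
    warc_paths: Iterable[str],
    checkpoint: WarcCheckpoint,
    handle: Callable[[str], None],
) -> List[str]:
    """Call handle(path) only for WARC files not yet marked done; return the paths processed now."""
    processed = []
    for path in warc_paths:
        if checkpoint.is_done(path):
            continue  # finished in an earlier run, so skip it
        handle(path)  # download, extraction and custom filters (e.g. English detection) go here
        checkpoint.mark_done(path)  # mark only after the whole file succeeded
        processed.append(path)
    return processed
```

A file is marked done only after `handle` returns. If the job is killed partway through a WARC file, that file is therefore processed again on the next run, and `article_filename` can be used to avoid writing duplicate articles.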

fhamborg / news-please, issue #228