TeamHG-Memex / scrapy-crawl-once

Scrapy middleware that allows crawling only new content
MIT License

add CRAWL_ONCE_RESET setting #5

Open SteveSmirnoff opened 3 years ago

SteveSmirnoff commented 3 years ago

Aims to fix #4. Adds a setting similar to DELTAFETCH_RESET.

Expected usage:

in settings.py: CRAWL_ONCE_RESET = True

or in the terminal: scrapy crawl spider_name -a crawl_once_reset=True

If the value is True, SqliteDict.clear() is called on the db, which removes all stored entries so that previously seen requests are crawled again.
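
A rough sketch of the reset logic described above, not the actual diff in this PR. The helper names (to_bool, should_reset, maybe_reset) are hypothetical and chosen only for illustration. One point it illustrates: Scrapy passes -a arguments to the spider as strings, so crawl_once_reset=True arrives as the string "True" and "False" would be truthy if tested directly. Any object with a clear() method works as db here, including a SqliteDict instance.

```python
# Hypothetical helpers; names are illustrative and not taken from the PR.
_TRUE_STRINGS = {"1", "true", "yes", "on"}


def to_bool(value):
    """Interpret a setting or spider-argument value as a boolean.

    Spider arguments given with `-a` are strings, so "False" must not
    be treated as truthy.
    """
    if isinstance(value, str):
        return value.strip().lower() in _TRUE_STRINGS
    return bool(value)


def should_reset(setting_value=False, spider_arg=None):
    """Reset if either the setting or the spider argument asks for it."""
    return to_bool(setting_value) or to_bool(spider_arg)


def maybe_reset(db, reset):
    """Clear the dict-like store of seen requests when reset is requested."""
    if reset:
        db.clear()
    return db
```

In a middleware this could be wired up as maybe_reset(db, should_reset(settings.getbool("CRAWL_ONCE_RESET"), getattr(spider, "crawl_once_reset", None))). If the SqliteDict is opened without autocommit, a db.commit() call after clear() would presumably be needed to persist the change.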