disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

make update_content_spider more flexible #49

Closed: andreawwenyi closed this issue 4 years ago

andreawwenyi commented 4 years ago

Currently, update_content_spider searches for all articles that need another snapshot. Some enhancements would be nice:
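As a rough illustration, the selection step the spider performs (pick every article whose next snapshot is due) can be sketched with optional filters that narrow it to one site or one article. This is a minimal sketch, not ZeroScraper's code: the `Article` record and its fields (`article_id`, `site_id`, `next_snapshot_at`) and the `articles_due` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Article:
    # Hypothetical fields; the real ZeroScraper schema may differ.
    article_id: int
    site_id: int
    next_snapshot_at: Optional[datetime]  # None means no further snapshots are planned


def articles_due(
    articles: Iterable[Article],
    now: datetime,
    site_id: Optional[int] = None,
    article_id: Optional[int] = None,
) -> List[Article]:
    """Return articles whose next snapshot is due, optionally narrowed
    to a single site or a single article."""
    due = []
    for a in articles:
        if a.next_snapshot_at is None or a.next_snapshot_at > now:
            continue  # not scheduled, or not due yet
        if site_id is not None and a.site_id != site_id:
            continue  # filtered out by site
        if article_id is not None and a.article_id != article_id:
            continue  # filtered out by article
        due.append(a)
    return due
```

Called with no filters, the function reproduces the "all articles that need another snapshot" behavior; passing `site_id` or `article_id` restricts the same query without a separate code path.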

pm5 commented 4 years ago

This is good. I'd go further: we should change our update routine from running the update for all articles in one spider to having multiple spiders, each responsible for updating one site, as we already do in the discover routine. The advantages are:
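The per-site layout proposed above can be sketched as two steps: group the due articles by site, then run one independent update job per site. This is a hypothetical sketch; `plan_per_site_jobs`, `run_per_site`, and the `update_site(site_id, article_ids)` callback are illustrative names, not part of ZeroScraper, and a real implementation would launch one spider per site rather than call a Python function.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def plan_per_site_jobs(due: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (article_id, site_id) pairs by site, one job per site."""
    jobs: Dict[int, List[int]] = defaultdict(list)
    for article_id, site_id in due:
        jobs[site_id].append(article_id)
    return dict(jobs)


def run_per_site(
    jobs: Dict[int, List[int]],
    update_site: Callable[[int, List[int]], None],
) -> List[int]:
    """Run update_site once per site; record failing sites instead of
    aborting the whole run. Returns the IDs of sites that failed."""
    failed = []
    for site_id, article_ids in jobs.items():
        try:
            update_site(site_id, article_ids)
        except Exception:
            failed.append(site_id)  # one site's failure does not stop the others
    return failed
```

In this shape each site's job can also be scheduled, retried, or rate-limited on its own, mirroring how the discover routine already runs one spider per site.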

andreawwenyi commented 4 years ago

Update (#57):
- Updating a single article is implemented in article.py and can be run as $ python article.py update {article_id}
- Updating all articles from a single site is implemented in site.py and can be run as $ python site.py update {site_id}
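A command of the form `python article.py update {article_id}` maps naturally onto an argparse subcommand. The sketch below shows one way such an entry point could be wired; it is not the actual article.py. It assumes integer article IDs, and `update_article` is a hypothetical placeholder for the code that takes the new snapshot. site.py would follow the same shape with a `site_id` argument.

```python
import argparse
from typing import List, Optional


def update_article(article_id: int) -> str:
    # Hypothetical placeholder: the real script would fetch and store a snapshot here.
    return f"updated article {article_id}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="article.py")
    sub = parser.add_subparsers(dest="command", required=True)
    update = sub.add_parser("update", help="take a new snapshot of one article")
    update.add_argument("article_id", type=int)  # assumes integer IDs
    # Attach the handler to the subcommand so main() can dispatch generically.
    update.set_defaults(func=lambda args: update_article(args.article_id))
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Used as a script, the module would call `main()` under the usual `if __name__ == "__main__":` guard; passing an explicit list, as in `main(["update", "42"])`, makes the entry point easy to exercise from tests.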