disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

Handle 404 for article snapshot update #78

Closed by pm5 4 years ago

pm5 commented 4 years ago

Stop taking snapshots when an article returns a 404 by setting next_snapshot_at to 0, as is already done in other cases. Also separated the spiders from the DB pipeline more cleanly.
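
A minimal sketch of the 404 rule described above, not the actual ZeroScraper code. The function names, the dict-shaped article record, and the one-day default interval are all assumptions for illustration; only the `next_snapshot_at` field and the "set it to 0 to stop" convention come from this PR.

```python
import time
from typing import Optional

# Assumption: 0 in next_snapshot_at means "no further snapshots scheduled".
NO_MORE_SNAPSHOTS = 0


def next_snapshot_time(status: int, now: int, interval: int = 86400) -> int:
    """Return the value to store in next_snapshot_at (hypothetical helper)."""
    if status == 404:
        # The article is gone, so stop scheduling snapshots for it.
        return NO_MORE_SNAPSHOTS
    # Otherwise schedule the next snapshot one interval from now.
    return now + interval


def update_article(article: dict, status: int, now: Optional[int] = None) -> dict:
    """Return a copy of the article record with next_snapshot_at updated."""
    now = int(time.time()) if now is None else now
    updated = dict(article)  # leave the caller's record untouched
    updated["next_snapshot_at"] = next_snapshot_time(status, now)
    return updated
```

Keeping the scheduling decision in a plain function like this is one way the spider could report only the HTTP status while the DB pipeline decides what to write, which is in the spirit of the separation mentioned above.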

pm5 commented 4 years ago

A few things added besides fixing #69: