2020-02-18 02:30:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.ptt.cc/bbs/HatePolitics/M.1579175212.A.E0F.html>: HTTP status code is not handled or not allowed
2020-02-18 02:30:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://www.lookerpets.com/post12214491091063>: HTTP status code is not handled or not allowed
2020-02-18 02:30:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://www.readthis.one/post11207421092250>: HTTP status code is not handled or not allowed
2020-02-18 02:30:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://www.taiwan.cn/plzhx/wyrt/201912/t20191226_12228090.htm>: HTTP status code is not handled or not allowed
2020-02-18 02:30:17 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.dcard.tw/_api/posts/232973648/>: HTTP status code is not handled or not allowed
2020-02-18 02:30:17 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.ptt.cc/bbs/Gossiping/M.1579322008.A.66B.html>: HTTP status code is not handled or not allowed
2020-02-18 02:30:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.ptt.cc/bbs/HatePolitics/M.1579177593.A.EF7.html>: HTTP status code is not handled or not allowed
2020-02-18 02:30:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://www.readthis.one/post11088161096342>: HTTP status code is not handled or not allowed
2020-02-18 02:30:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.cna.com.tw/news/firstnews/201912100147.aspx>: HTTP status code is not handled or not allowed
2020-02-18 02:30:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://www.taiwan.cn/plzhx/wyrt/201912/t20191226_12228091.htm>: HTTP status code is not handled or not allowed
Some URLs keep returning HTTP errors, mostly 404 (the Dcard API URL returns 403). If I understand correctly, the current update logic keeps re-checking these URLs for new snapshots. We should stop scheduling updates for a URL once it has accumulated a certain number of 404 errors, say 3.
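A minimal sketch of the proposed rule, assuming the updater can ask per URL whether it should be re-fetched. `NotFoundTracker`, `record`, `should_update` and `MAX_NOT_FOUND` are hypothetical names, not existing code in this project. The counts live in memory here; a real implementation would presumably persist them with the URL's snapshot records. Note that in Scrapy, 404 responses are dropped by `HttpErrorMiddleware` (the "Ignoring response" lines above) unless the spider opts in, for example via the `handle_httpstatus_list` spider attribute, so the callback would need that to see the 404s at all.

```python
from collections import defaultdict

# Hypothetical threshold; the issue suggests 3.
MAX_NOT_FOUND = 3


class NotFoundTracker:
    """Counts HTTP 404 responses per URL and tells the updater when to give up.

    In-memory sketch only: counts are lost on restart.
    """

    def __init__(self, max_not_found=MAX_NOT_FOUND):
        self.max_not_found = max_not_found
        self._counts = defaultdict(int)

    def record(self, url, status):
        """Record a response status for `url`; only 404s are counted.

        Returns the accumulated 404 count for the URL.
        """
        if status == 404:
            self._counts[url] += 1
        return self._counts.get(url, 0)

    def should_update(self, url):
        """True while the URL has fewer than `max_not_found` accumulated 404s."""
        return self._counts.get(url, 0) < self.max_not_found


if __name__ == "__main__":
    tracker = NotFoundTracker()
    url = "https://www.ptt.cc/bbs/Gossiping/M.1579322008.A.66B.html"
    for _ in range(3):
        tracker.record(url, 404)
    print(tracker.should_update(url))  # False: 3 accumulated 404s
```

This counts 404s cumulatively, as proposed. Whether a later successful fetch should reset the count, and whether other statuses such as the 403 from Dcard should also count, are left open.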