2021-08-02 09:45:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://nltimes.nl/2020/02/26/coronavirus-authorities-fear-german-tourist-brought-covid-19-netherlands>: HTTP status code is not handled or not allowed
2021-08-02 09:45:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://eu.indystar.com/story/news/health/2020/05/01/indiana-reopening-timeline-coronavirus-pandemic/3059275001/>: HTTP status code is not handled or not allowed
2021-08-02 09:45:38 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.thedailybeast.com/twitter-deleted-sheriff-clarkes-wildly-reckless-coronavirus-tweets-so-he-says-hes-going-to-parler?source=articles&via=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thedailybeast%2Farticles+%28The+Daily+Beast+-+Latest+Articles%29>: HTTP status code is not handled or not allowed
2021-08-02 09:45:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.businessinsider.com/patient-thanks-medical-workers-hospital-window-note-critical-care-2020-3?utm_source=feedburner&%3Butm_medium=referral&utm_medium=feed&utm_campaign=Feed%3A+businessinsider+%28Business+Insider%29>: HTTP status code is not handled or not allowed
2021-08-02 09:45:51 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://nationalpost.com/pmn/health-pmn/toyota-extends-shutdown-of-north-american-plants-through-april-17>: HTTP status code is not handled or not allowed
Some links are excluded from the IA, e.g., the ones returning 404 in the log above.
I think this is because someone requested that these links be taken down from the IA.
We should try to download these pages from the original sites using newspaper3k and output them in the same format as the rest.
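A minimal sketch of that fallback is below. It pulls the 404 URLs out of a saved Scrapy log, re-fetches each one from its original site with newspaper3k (`Article(url)`, `.download()`, `.parse()`), and writes one JSON object per line. The helper names, the log/output file paths, and the record fields (`url`, `title`, `authors`, `publish_date`, `text`) are all assumptions. "The same format as the rest" is not specified here, so `article_to_record` is the piece to adapt.

```python
import json
import re
import sys
from typing import Iterable, Iterator

# Captures the URL in Scrapy's "Ignoring response <404 URL>: ..." message.
IGNORED_404 = re.compile(r"Ignoring response <404 ([^\s>]+)>")


def failed_urls(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield each URL that Scrapy's HttpErrorMiddleware dropped with a 404."""
    for line in log_lines:
        match = IGNORED_404.search(line)
        if match:
            yield match.group(1)


def article_to_record(url: str, article) -> dict:
    """Map a parsed article to an output record (hypothetical field names)."""
    date = article.publish_date
    return {
        "url": url,
        "title": article.title,
        "authors": list(article.authors),
        # publish_date may be a datetime, or empty/None if no date was found.
        "publish_date": date.isoformat() if hasattr(date, "isoformat") else None,
        "text": article.text,
    }


def fetch_record(url: str) -> dict:
    """Download and parse one page from its original site with newspaper3k."""
    from newspaper import Article  # imported lazily; needs `pip install newspaper3k`

    article = Article(url)
    article.download()
    article.parse()  # raises if the download failed
    return article_to_record(url, article)


def main(log_path: str, out_path: str) -> None:
    """Retry every 404 URL in log_path and write JSON Lines to out_path."""
    with open(log_path, encoding="utf-8") as log:
        # dict.fromkeys drops duplicate URLs while keeping log order.
        urls = list(dict.fromkeys(failed_urls(log)))
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            try:
                record = fetch_record(url)
            except Exception as exc:  # the original site may be gone too
                print(f"skipping {url}: {exc}", file=sys.stderr)
                continue
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Usage would be something like `main("crawl.log", "retried.jsonl")` (placeholder paths). Pages that also fail on the original site are reported to stderr and skipped rather than aborting the run. The newspaper3k import sits inside `fetch_record`, so the log parsing can be tested without the library installed.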