disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

Some sites take too long to load during the discover process #117

Open andreawwenyi opened 4 years ago

andreawwenyi commented 4 years ago

The discover process on middle2 produced the following timeout failures. Each request was retried after taking longer than 180 seconds:

2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET http://www.tailian.org.cn/> (failed 1 times): User timeout caused connection failure: Getting http://www.tailian.org.cn/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.ettoday.net/> (failed 2 times): User timeout caused connection failure: Getting https://www.ettoday.net/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.thenewslens.com/> (failed 2 times): User timeout caused connection failure: Getting https://www.thenewslens.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.ctwant.com/> (failed 2 times): User timeout caused connection failure: Getting https://www.ctwant.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://yxfashao.com/> (failed 2 times): User timeout caused connection failure: Getting https://yxfashao.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://ilovestory.net/> (failed 2 times): User timeout caused connection failure: Getting https://ilovestory.net/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.bomb01.com/> (failed 2 times): User timeout caused connection failure: Getting https://www.bomb01.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.ptt01.cc/> (failed 2 times): User timeout caused connection failure: Getting https://www.ptt01.cc/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.twreporter.org/> (failed 2 times): User timeout caused connection failure: Getting https://www.twreporter.org/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.mirrormedia.mg/> (failed 2 times): User timeout caused connection failure: Getting https://www.mirrormedia.mg/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://fongnews.com/> (failed 2 times): User timeout caused connection failure: Getting https://fongnews.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://news.ebc.net.tw/> (failed 2 times): User timeout caused connection failure: Getting https://news.ebc.net.tw/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.readr.tw/> (failed 2 times): User timeout caused connection failure: Getting https://www.readr.tw/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://taronews.tw/> (failed 2 times): User timeout caused connection failure: Getting https://taronews.tw/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.upmedia.mg/> (failed 2 times): User timeout caused connection failure: Getting https://www.upmedia.mg/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.toutiao.com/api/pc/realtime_news/> (failed 2 times): User timeout caused connection failure: Getting https://www.toutiao.com/api/pc/realtime_news/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.xuehua.tw/> (failed 2 times): User timeout caused connection failure: Getting https://www.xuehua.tw/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://www.teepr.com/> (failed 2 times): User timeout caused connection failure: Getting https://www.teepr.com/ took longer than 180.0 seconds..
2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying <GET https://fongnews.net/> (failed 2 times): User timeout caused connection failure: Getting https://fongnews.net/ took longer than 180.0 seconds..
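
To see at a glance which sites are affected, the retry lines above can be summarized per URL. The following is a minimal sketch, not part of ZeroScraper: the function name `summarize_timeouts` and the regular expression are assumptions based only on the log format shown in this issue.

```python
import re

# Matches retry-middleware lines of the shape shown in the log above.
# The pattern is an assumption derived from this issue's log, not from Scrapy's source.
RETRY_LINE = re.compile(
    r"Retrying <GET (?P<url>[^>\s]+)> \(failed (?P<count>\d+) times\): "
    r"User timeout caused connection failure"
)


def summarize_timeouts(log_text):
    """Return {url: highest 'failed N times' count seen} for timeout retries."""
    worst = {}
    for line in log_text.splitlines():
        match = RETRY_LINE.search(line)
        if match is None:
            continue  # not a timeout retry line
        url = match.group("url")
        count = int(match.group("count"))
        worst[url] = max(worst.get(url, 0), count)
    return worst


# Two lines copied from the log in this issue.
SAMPLE_LOG = (
    "2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying "
    "<GET http://www.tailian.org.cn/> (failed 1 times): User timeout caused "
    "connection failure: Getting http://www.tailian.org.cn/ took longer than "
    "180.0 seconds..\n"
    "2020-05-09 04:37:59 scrapy.downloadermiddlewares.retry DEBUG: Retrying "
    "<GET https://www.ettoday.net/> (failed 2 times): User timeout caused "
    "connection failure: Getting https://www.ettoday.net/ took longer than "
    "180.0 seconds..\n"
)

if __name__ == "__main__":
    for url, failures in sorted(summarize_timeouts(SAMPLE_LOG).items()):
        print(f"{failures} timeout(s): {url}")
```

The 180.0 seconds in the log matches the default value of Scrapy's `DOWNLOAD_TIMEOUT` setting, so the summary shows which sites hit the default limit rather than a project-specific one.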