dynamohuang / amazon-scrapy

Scrape the details and lowest prices of Amazon best-seller products with a Python Scrapy spider
286 stars 123 forks

User timeout caused connection failure. #6

Open Jorigorn opened 5 years ago

Jorigorn commented 5 years ago

I don't know if this is because Amazon has detected our bot and blocked the IP?

But https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3 indeed doesn't exist; there is no page 3 there.

https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 is correct; I can open it in the Chrome browser.

How can I set up the proxy? Also, because of this error, will the spider lose all the data, even the data already scraped from previous pages?

twisted.internet.error.TimeoutError: User timeout caused connection failure.

2018-11-19 23:40:32 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy 46.38.52.36:8081 [{'status': 400, 'reason': b'Bad Request'}]

2018-11-19 23:40:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 320, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 took longer than 30.0 seconds..

2018-11-19 23:41:51 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-software/zgbs/software/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
twisted.internet.error.TimeoutError: User timeout caused connection failure.

(1030, 'Got error 168 from storage engine')
total spent: 0:52:23.652052
done
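Editor's note: a timeout like the one above only aborts the individual request; items already scraped are not discarded. With Scrapy's built-in retry middleware enabled, failed pages are re-queued instead of lost. A minimal `settings.py` fragment, using standard Scrapy setting names with illustrative values (not this repo's actual configuration):

```python
# Illustrative Scrapy settings; values are examples, not the repo's defaults.
DOWNLOAD_TIMEOUT = 30   # per-request timeout that produced the errors above
RETRY_ENABLED = True    # re-queue failed requests instead of dropping them
RETRY_TIMES = 3         # extra attempts per request after the first failure
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]
```

Connection-level failures such as `TimeoutError` are retried by Scrapy's `RetryMiddleware` when `RETRY_ENABLED` is true, so a handful of dead proxies should not take the whole crawl down.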

dynamohuang commented 5 years ago

Add a proxy.json file in the amazon/amazon directory, like this: ["198.52.39.104:3128", "31.207.5.155:3128"]
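Editor's note: a downloader middleware that consumes such a proxy.json could look roughly like the sketch below. The class name and file path are assumptions for illustration, not the repo's actual code; the one Scrapy-specific fact used is that `HttpProxyMiddleware` honors `request.meta['proxy']`.

```python
import json
import random

class RandomProxyMiddleware:
    """Hypothetical downloader middleware: picks a random proxy per request."""

    def __init__(self, proxies):
        # proxies is a list like ["198.52.39.104:3128", "31.207.5.155:3128"]
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # Path assumed; point this at wherever proxy.json actually lives.
        with open("amazon/amazon/proxy.json") as f:
            return cls(json.load(f))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta['proxy'].
        request.meta["proxy"] = "http://" + random.choice(self.proxies)
```

Enable it via `DOWNLOADER_MIDDLEWARES` in settings.py; if a proxy in the list is dead, the retry middleware gets another chance with a (likely) different proxy.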