cpatrickalves / scraping-ebay

Scraping Ebay's products using Scrapy Web Crawling Framework
MIT License

products.json output is always empty #1

Closed chatzich closed 5 years ago

chatzich commented 5 years ago

Hello, nice work! I tried it with several search inputs, but the output file is always empty. For example, I ran:

scrapy crawl ebay -o products.json -a search="Samsung galaxy s7"

I get the following log:

2019-04-18 08:37:48 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping_ebay)
2019-04-18 08:37:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.4.8 (default, Feb 5 2018, 11:23:17) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 19.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.3, Platform Linux-3.10.0-957.10.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-04-18 08:37:48 [scrapy.crawler] INFO: Overridden settings: {'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'scraping_ebay', 'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_FORMAT': 'json', 'FEED_URI': 'products.json'}
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet Password: 4fde495aabaaad3c
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-04-18 08:37:48 [scrapy.core.engine] INFO: Spider opened
2019-04-18 08:37:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-18 08:37:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2019-04-18 08:37:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com> (referer: None)
2019-04-18 08:37:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> from <GET http://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200>
2019-04-18 08:37:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> (referer: None)
2019-04-18 08:37:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=2> (referer: https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200)
2019-04-18 08:37:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=3> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=2)
2019-04-18 08:37:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=4> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=3)
2019-04-18 08:38:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=5> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=4)
2019-04-18 08:38:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=6> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=5)
2019-04-18 08:38:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=7> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=6)
2019-04-18 08:38:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=8> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=7)
2019-04-18 08:38:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=9> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=8)
2019-04-18 08:38:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=10> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=9)
2019-04-18 08:38:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=11> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=10)
2019-04-18 08:38:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=12> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=11)
2019-04-18 08:38:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=13> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=12)
2019-04-18 08:38:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=14> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=13)
2019-04-18 08:38:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=15> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=14)
2019-04-18 08:38:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=16> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=15)
2019-04-18 08:38:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=17> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=16)
2019-04-18 08:38:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=18> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=17)
2019-04-18 08:38:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=19> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=18)
2019-04-18 08:38:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=20> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=19)
2019-04-18 08:38:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=21> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=20)
2019-04-18 08:38:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=22> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=21)
2019-04-18 08:38:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=23> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=22)
2019-04-18 08:38:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=24> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=23)
2019-04-18 08:38:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=25> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=24)
2019-04-18 08:38:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=26> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=25)
2019-04-18 08:38:48 [scrapy.extensions.logstats] INFO: Crawled 28 pages (at 28 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:38:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=27> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=26)
2019-04-18 08:38:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=28> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=27)
2019-04-18 08:38:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=29> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=28)
2019-04-18 08:38:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=30> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=29)
2019-04-18 08:38:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=31> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=30)
2019-04-18 08:39:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=32> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=31)
2019-04-18 08:39:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=33> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=32)
2019-04-18 08:39:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=34> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=33)
2019-04-18 08:39:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=35> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=34)
2019-04-18 08:39:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=36> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=35)
2019-04-18 08:39:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=37> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=36)
2019-04-18 08:39:13 [ebay] DEBUG: eBay products collected successfully !!!
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-18 08:39:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 29140, 'downloader/request_count': 40, 'downloader/request_method_count/GET': 40, 'downloader/response_bytes': 3222681, 'downloader/response_count': 40, 'downloader/response_status_count/200': 39, 'downloader/response_status_count/301': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 4, 18, 12, 39, 13, 928618), 'log_count/DEBUG': 41, 'log_count/INFO': 10, 'memusage/max': 164339712, 'memusage/startup': 47673344, 'request_depth_max': 37, 'response_received_count': 39, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 39, 'scheduler/dequeued/memory': 39, 'scheduler/enqueued': 39, 'scheduler/enqueued/memory': 39, 'start_time': datetime.datetime(2019, 4, 18, 12, 37, 48, 939034)}
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Spider closed (finished)

products.json is empty!!!

cpatrickalves commented 5 years ago

Hi @ironexmaiden !!

I have tested and you're right.

I'm pretty busy right now, but I'll take some time to fix the code.

Give me 10 days.

chatzich commented 5 years ago

No, don't do it! I already have it fixed. You have to change the results selector from

results = response.xpath('//li[@class="s-item "]')

to

results = response.xpath('//li[@class="s-item "]')

We must also fix the check of next_page_url, because it fails too :) I have the fixes; if you let me open a pull request, I can push them.

Regards,
Christos
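For readers following along: an XPath test such as @class="s-item " compares the whole attribute string, whitespace included, so any change in spacing or any extra class makes it match nothing, which would be consistent with the "scraped 0 items" lines in the log. Below is a minimal sketch of that behaviour, not the repository's actual fix. It uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors so that it runs without dependencies, and the markup and the has_class helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (not copied from eBay): the class attribute carries
# varying trailing whitespace and, in one case, an extra class token.
HTML = """
<ul>
  <li class="s-item ">one trailing space</li>
  <li class="s-item   ">three trailing spaces</li>
  <li class="s-item s-item--large">extra class token</li>
  <li class="other">not a product</li>
</ul>
"""

root = ET.fromstring(HTML)

# Exact string comparison, the same kind of test as //li[@class="s-item "]:
# only the element whose attribute is exactly "s-item " matches.
exact = root.findall(".//li[@class='s-item ']")


def has_class(element, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


# Token-based comparison: insensitive to spacing and to additional classes.
by_token = [li for li in root.iter("li") if has_class(li, "s-item")]

print(len(exact))     # 1
print(len(by_token))  # 3
```

Inside a Scrapy spider, the token-based idea would usually be written as response.css('li.s-item') or with the XPath idiom contains(concat(" ", normalize-space(@class), " "), " s-item "); whether either matches eBay's current markup is an assumption that has to be checked against the live page.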


cpatrickalves commented 5 years ago

That's great! Thanks! You can make a pull request.

jfulk commented 5 years ago

I also tried running this with a CSV output and that was also empty.

cpatrickalves commented 5 years ago

@jfulk I've just pushed a new version with a fix for these issues.