cpatrickalves / scraping-ebay

Scraping Ebay's products using Scrapy Web Crawling Framework
MIT License

Empty results for ebay.de #4

Closed (ghost closed this issue 3 years ago)

ghost commented 4 years ago

Hey, great work. I've created a new spider for ebay Germany (ebay.de) and I don't get any results.

Here are the changes I've made for the new spider compared to the original one for ebay.com:

    name = "ebay_de"
    allowed_domains = ["ebay.de"]
    start_urls = ["https://www.ebay.de"]
    ...
    yield scrapy.Request("http://www.ebay.de/sch/i.html?_from=R40&_trksid=" + trksid + "&_nkw=" + self.search_string.replace(' ','+') + "&_ipg=200", callback=self.parse_link)

Input:

    scrapy crawl ebay_de -o products_de.csv -a search="MacBook Pro 13 2016"

Output:

    2019-11-23 22:50:17 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: scraping_ebay)
    2019-11-23 22:50:17 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 2.7.17rc1 (default, Oct 10 2019, 10:26:01) - [GCC 9.2.1 20191008], pyOpenSSL 19.1.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.6.1, Platform Linux-5.3.0-23-generic-x86_64-with-Ubuntu-19.10-eoan
    2019-11-23 22:50:17 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'FEED_FORMAT': 'csv', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_URI': 'products_de.csv', 'BOT_NAME': 'scraping_ebay'}
    2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet Password: 383b88df45692b23
    2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats']
    2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled item pipelines: []
    2019-11-23 22:50:17 [scrapy.core.engine] INFO: Spider opened
    2019-11-23 22:50:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2019-11-23 22:50:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de> (referer: None)
    2019-11-23 22:50:18 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> from <GET http://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200>
    2019-11-23 22:50:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> (referer: None)
    2019-11-23 22:50:20 [ebay_de] DEBUG: eBay products collected successfully !!!
    2019-11-23 22:50:20 [scrapy.core.engine] INFO: Closing spider (finished)
    2019-11-23 22:50:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 1327,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 3,
     'downloader/response_bytes': 108803,
     'downloader/response_count': 3,
     'downloader/response_status_count/200': 2,
     'downloader/response_status_count/301': 1,
     'elapsed_time_seconds': 2.400946,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2019, 11, 23, 21, 50, 20, 273904),
     'log_count/DEBUG': 4,
     'log_count/INFO': 10,
     'memusage/max': 54169600,
     'memusage/startup': 54169600,
     'request_depth_max': 1,
     'response_received_count': 2,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2019, 11, 23, 21, 50, 17, 872958)}
    2019-11-23 22:50:20 [scrapy.core.engine] INFO: Spider closed (finished)

products_de.csv is empty.
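Editor's note: the log shows the `http://` search URL being redirected (301) to its `https://` equivalent before the page is crawled. The redirect is followed successfully, so it is probably not the cause of the empty file, but the extra round trip can be avoided. Below is a minimal sketch, assuming Python 3, of building the same search URL with `urllib.parse.urlencode`. The helper name `build_search_url` and its defaults are hypothetical; the `trksid` default is simply the value that appears in the log, whereas the spider computes it at run time.

```python
from urllib.parse import urlencode


def build_search_url(search_string, domain="www.ebay.de",
                     trksid="m570.l1313", per_page=200):
    """Build an eBay search URL over https (hypothetical helper).

    The query parameters mirror the ones in the request above; the
    default trksid is the value seen in the log, not a fixed constant.
    """
    params = {
        "_from": "R40",
        "_trksid": trksid,
        "_nkw": search_string,  # urlencode turns spaces into '+'
        "_ipg": per_page,
    }
    return "https://" + domain + "/sch/i.html?" + urlencode(params)


print(build_search_url("MacBook Pro 13 2016"))
```

The returned string could be passed to `scrapy.Request` in place of the hand-concatenated URL.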

Thanks!

cpatrickalves commented 4 years ago

Hi OLLIWOODX, sorry for the delay in answering.

I haven't tested your changes against the current version of the spider, but I think it does not work because the scraping depends on the page structure: HTML tags, CSS styling, etc.

So any change in the page source (a different HTML tag ID, class, name, etc.) will make the spider fail to scrape anything.

You need to dig into the ebay.de HTML pages and update the XPaths in the spider accordingly.
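One quick way to see whether the spider's selectors still match a page is to count how many elements carry the class the XPath expects. The sketch below uses only the Python 3 standard library on an inline HTML sample. The class names `s-item` and `sresult` and the sample markup are hypothetical placeholders, not taken from the spider or from ebay.de. The same check can be done interactively with `scrapy shell <url>` and `response.xpath(...)`.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup standing in for a saved search results page.
sample_html = """
<ul>
  <li class="s-item"><h3 class="s-item__title">MacBook Pro 13</h3></li>
  <li class="s-item"><h3 class="s-item__title">MacBook Pro 15</h3></li>
</ul>
"""

print(count_class(sample_html, "s-item"))   # class present in the sample
print(count_class(sample_html, "sresult"))  # class absent from the sample
```

A count of zero for the class used in the spider's XPaths would suggest the page layout differs from what the spider expects, which matches a run that finishes cleanly but yields no items.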

kexonator commented 3 years ago

Hi @cpatrickalves, I think the issue has been fixed. Maybe eBay was rolling out changes on the US market that had not yet been implemented on the German market back then. Using the spider definition provided by @OLLIWOODX works perfectly fine now, so I think the issue can be closed.

Btw - great work!