Closed: sajjadakram2018 closed this issue 3 years ago
I want to mention that I also tried it with quotes ("") around the URL, but the result is still the same:
scrapy crawl immoscout -o apartments.csv -a url="https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00" -L INFO
BTW your tool is super cool
You are receiving an HTTP 405 error. The reason is that Immoscout now uses captchas to protect its website from scraping. At the moment there is no workaround for this issue; see also https://github.com/asmaier/ImmoSpider/issues/9
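The blocking is also visible in the stats dictionary that Scrapy dumps at the end of the crawl (see the log below): the `httperror/response_ignored_status_count/405` entry records the dropped response. A minimal sketch of pulling those ignored statuses out of the stats, assuming a plain dict as Scrapy prints it (the helper name is illustrative, not part of ImmoSpider):

```python
# Hypothetical helper (not part of ImmoSpider): inspect the stats dict that
# Scrapy dumps at the end of a crawl and report which HTTP statuses were
# ignored, so an empty output file can be traced back to blocking.

def ignored_statuses(stats: dict) -> dict:
    """Map each ignored HTTP status code to how often it occurred."""
    prefix = "httperror/response_ignored_status_count/"
    return {
        int(key[len(prefix):]): count
        for key, count in stats.items()
        if key.startswith(prefix)
    }

# Excerpt from the stats in the log below:
stats = {
    "downloader/response_status_count/200": 1,
    "downloader/response_status_count/405": 1,
    "httperror/response_ignored_count": 1,
    "httperror/response_ignored_status_count/405": 1,
}
print(ignored_statuses(stats))  # {405: 1}
```

A non-empty result here means the spider never got a parseable page, which is why apartments.csv stays empty.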
I followed the instructions mentioned in the Readme. The "Simple scraping" step says that the output of the following command should contain the list of apartments in Berlin in apartments.csv. However, the output file is empty. Did I miss something? I am copying the log of the command below for better understanding.
Thanks Sajjad
$ scrapy crawl immoscout -o apartments.csv -a url=https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00 -L INFO
2021-01-19 13:41:20 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: immospider)
2021-01-19 13:41:20 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.6 (default, Sep 25 2020, 09:36:53) - [GCC 10.2.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 3.0, Platform Linux-5.8.0-36-generic-x86_64-with-glibc2.32
2021-01-19 13:41:20 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'immospider', 'LOG_LEVEL': 'INFO', 'LOG_STDOUT': True, 'NEWSPIDER_MODULE': 'immospider.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['immospider.spiders']}
2021-01-19 13:41:20 [scrapy.extensions.telnet] INFO: Telnet Password: 4a1f6f3d22013ab8
2021-01-19 13:41:20 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats', 'immospider.extensions.SendMail']
2021-01-19 13:41:20 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-01-19 13:41:20 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-01-19 13:41:20 [scrapy.middleware] INFO: Enabled item pipelines: ['immospider.pipelines.GooglemapsPipeline', 'immospider.pipelines.DuplicatesPipeline']
2021-01-19 13:41:20 [scrapy.core.engine] INFO: Spider opened
2021-01-19 13:41:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-01-19 13:41:20 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-01-19 13:41:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00>: HTTP status code is not handled or not allowed
2021-01-19 13:41:20 [scrapy.core.engine] INFO: Closing spider (finished)
2021-01-19 13:41:20 [immospider.extensions] INFO: No new items found. No email sent.
2021-01-19 13:41:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 910, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 18012, 'downloader/response_count': 2, 'downloader/response_status_count/200': 1, 'downloader/response_status_count/405': 1, 'elapsed_time_seconds': 0.256602, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2021, 1, 19, 12, 41, 20, 624024), 'httperror/response_ignored_count': 1, 'httperror/response_ignored_status_count/405': 1, 'log_count/INFO': 12, 'memusage/max': 57483264, 'memusage/startup': 57483264, 'response_received_count': 2, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'start_time': datetime.datetime(2021, 1, 19, 12, 41, 20, 367422)}
2021-01-19 13:41:20 [scrapy.core.engine] INFO: Spider closed (finished)