digital-engineering / airbnb-scraper

Airbnb Scraper: Advanced Airbnb Search using Scrapy
GNU General Public License v3.0

Cannot get it to run #11

Closed wgicio closed 3 years ago

wgicio commented 3 years ago

(env) macca@macca:~/Documents/airbnb-scraper-master$ scrapy crawl airbnb -a query="Colorado Springs, CO" -o colorado_springs.csv
2021-03-29 15:41:56 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: deepbnb)
2021-03-29 15:41:56 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.5 (default, Jan 27 2021, 15:41:15) - [GCC 9.3.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i 8 Dec 2020), cryptography 3.3.1, Platform Linux-5.8.0-45-generic-x86_64-with-glibc2.29
2021-03-29 15:41:56 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-29 15:41:56 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'deepbnb', 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 10, 'FEED_EXPORT_FIELDS': ['name', 'url', 'price_rate', 'price_rate_type', 'total_price', 'room_and_property_type', 'min_nights', 'max_nights', 'latitude', 'longitude', 'monthly_price_factor', 'weekly_price_factor', 'room_type', 'person_capacity', 'amenities', 'review_count', 'review_score', 'rating_accuracy', 'rating_checkin', 'rating_cleanliness', 'rating_communication', 'rating_location', 'rating_value', 'star_rating', 'satisfaction_guest', 'description', 'neighborhood_overview', 'notes', 'additional_house_rules', 'interaction', 'access', 'transit', 'response_rate', 'response_time', 'photos'], 'NEWSPIDER_MODULE': 'deepbnb.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['deepbnb.spiders'], 'TELNETCONSOLE_ENABLED': False, 'USER_AGENT': 'deepbnb (+https://digitalengineering.io)'}
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled item pipelines:
['deepbnb.pipelines.DuplicatesPipeline', 'deepbnb.pipelines.BnbPipeline']
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Spider opened
2021-03-29 15:41:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-29 15:41:56 [airbnb] INFO: starting survey for: Colorado Springs, CO
2021-03-29 15:41:56 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "/home/macca/Documents/airbnb-scraper-master/env/lib/python3.8/site-packages/scrapy/core/engine.py", line 129, in _next_request
    request = next(slot.start_requests)
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/spiders/airbnb.py", line 131, in start_requests
    yield self.__explore_search.api_request(self.__query, params, self.__explore_search.parse_landing_page)
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/api/ExploreSearch.py", line 67, in api_request
    headers = self._get_search_headers()
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/api/ApiBase.py", line 51, in _get_search_headers
    return required_headers | {
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Closing spider (finished)
2021-03-29 15:41:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.003824,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 3, 29, 7, 41, 56, 608685),
 'log_count/ERROR': 1,
 'log_count/INFO': 9,
 'memusage/max': 68800512,
 'memusage/startup': 68800512,
 'start_time': datetime.datetime(2021, 3, 29, 7, 41, 56, 604861)}
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Spider closed (finished)

digitalengineering commented 3 years ago

What version of Python are you using? It needs version 3.9 or newer.
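For context, the `|` operator for merging two dicts was added in Python 3.9 (PEP 584). The log in the report shows Python 3.8.5, which is why `required_headers | {` in `ApiBase.py` raises the `TypeError`. Upgrading the interpreter is the fix the maintainer points to. The snippet below is only a minimal sketch of what the operator does and of an equivalent that also runs on older versions. The `merge_headers` helper and the header values are hypothetical and not taken from the project.

```python
import sys


def merge_headers(base, extra):
    """Return a new dict with the keys of both; `extra` wins on conflicts.

    Hypothetical helper for illustration, not part of airbnb-scraper.
    """
    if sys.version_info >= (3, 9):
        # Dict union operator, available from Python 3.9 (PEP 584).
        return base | extra
    # Equivalent unpacking form that works on Python 3.5 and newer.
    return {**base, **extra}


# Placeholder header values, assumed for the example only.
required_headers = {"Accept": "application/json"}
search_headers = merge_headers(required_headers, {"X-Example": "1"})

print("Python", ".".join(map(str, sys.version_info[:2])))
print(search_headers)
```

Running `python --version` inside the virtualenv confirms which interpreter Scrapy is actually using.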