David-Carrasco / Scrapy-Idealista

Scraping data from Real Estate site www.idealista.com
GNU General Public License v2.0
158 stars, 62 forks

Is it still updated? #13

Open Telsho opened 1 year ago

Telsho commented 1 year ago

Hi guys,

I wanted to use your project to scrape some data. I tried it with "https://www.idealista.it/affitto-case/torino-torino/", and all my proxies died without producing any results:

2023-02-25 15:53:03 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://www.idealista.it/affitto-case/torino-torino/> with another proxy (failed 34 times, max retries: 99999999999)
2023-02-25 15:53:04 [rotating_proxies.expire] DEBUG: Proxy <http://174.70.1.210:8080> is GOOD
2023-02-25 15:53:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.idealista.it/affitto-case/torino-torino/> (referer: None)
2023-02-25 15:53:04 [scrapy.core.engine] INFO: Closing spider (finished)
2023-02-25 15:53:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'bans/error/scrapy.core.downloader.handlers.http11.TunnelError': 2,
 'bans/error/twisted.internet.error.ConnectionRefusedError': 2,
 'bans/error/twisted.internet.error.TimeoutError': 11,
 'bans/status/403': 19,
 'downloader/exception_count': 15,
 'downloader/exception_type_count/scrapy.core.downloader.handlers.http11.TunnelError': 2,
 'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 2,
 'downloader/exception_type_count/twisted.internet.error.TimeoutError': 11,
 'downloader/request_bytes': 18120,
 'downloader/request_count': 35,
 'downloader/request_method_count/GET': 35,
 'downloader/response_bytes': 34766,
 'downloader/response_count': 20,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/403': 19,
 'elapsed_time_seconds': 147.721921,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 2, 25, 14, 53, 4, 461769),
 'log_count/DEBUG': 77,
 'log_count/INFO': 17,
 'log_count/WARNING': 1,
 'memusage/max': 87228416,
 'memusage/startup': 79806464,
 'proxies/dead': 29,
 'proxies/good': 1,
 'proxies/mean_backoff': 208.81744850785537,
 'proxies/reanimated': 1,
 'proxies/unchecked': 1,
 'response_received_count': 1,
 'scheduler/dequeued': 35,
 'scheduler/dequeued/memory': 35,
 'scheduler/enqueued': 35,
 'scheduler/enqueued/memory': 35,
 'start_time': datetime.datetime(2023, 2, 25, 14, 50, 36, 739848)}
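
To make the stats dump above easier to read, here is a rough sketch that breaks the 35 requests down by outcome. The counter values are copied from the dump; the `summarize` helper and the outcome labels are hypothetical names used only for illustration and are not part of Scrapy or of this project.

```python
# Counter values copied from the Scrapy stats dump above.
stats = {
    "downloader/request_count": 35,
    "downloader/exception_count": 15,
    "downloader/response_status_count/200": 1,
    "downloader/response_status_count/403": 19,
}


def summarize(stats):
    """Hypothetical helper: share of requests per outcome."""
    total = stats["downloader/request_count"]
    return {
        # Requests that came back with HTTP 200.
        "ok": stats["downloader/response_status_count/200"] / total,
        # Requests answered with HTTP 403 (counted as bans in the dump).
        "forbidden": stats["downloader/response_status_count/403"] / total,
        # Requests that raised a downloader exception (timeouts, refused
        # connections, tunnel errors).
        "network_error": stats["downloader/exception_count"] / total,
    }


for outcome, share in summarize(stats).items():
    print(f"{outcome}: {share:.0%}")
```

Read this way, roughly half of the requests (19 of 35) were answered with a 403 and most of the rest (15 of 35) failed at the network level, so only a single request returned a 200.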

Let me know if I can help in some way, or if I'm just missing something. Thanks!