-
`process_item` is called in the order of the pipeline classes listed in the [`ITEM_PIPELINES`](https://doc.scrapy.org/en/latest/topics/settings.html#std:setting-ITEM_PIPELINES) setting. But `clo…
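For reference, a minimal sketch of how that ordering works (the project and pipeline class paths below are hypothetical): pipelines run in ascending order of the integer value assigned in the setting.

```python
# Hypothetical pipeline paths, for illustration only.
# Lower values run first: CleanPipeline.process_item is called on each
# item before StorePipeline.process_item.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanPipeline": 100,
    "myproject.pipelines.StorePipeline": 300,
}

# The engine effectively sorts the classes by value before chaining them:
call_order = [path for path, _ in sorted(ITEM_PIPELINES.items(), key=lambda kv: kv[1])]
print(call_order)
```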
-
Hi,
If an exception is raised while parsing a response in Scrapy, the request remains marked as `QUEUED` and no error is logged on the frontier. …
-
I'm experiencing difficulties in accessing a ScrapyRT service running on specific ports within a Kubernetes pod. My setup includes a Kubernetes cluster with a pod running a Scrapy application, which u…
-
I have several failover IPs that are correctly configured (they work with wget and curl), and I would like to bind to them when using Scrapy. I use the `bindaddress` meta key to achieve this, but the public IP is …
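For context, this is roughly how the `bindaddress` meta key is attached to a request; the helper name and the IP below are placeholders, not from the original setup. Scrapy passes the value through to Twisted, which takes a `(host, port)` tuple for the local address (port 0 means any free port).

```python
# "203.0.113.10" is a placeholder (TEST-NET range); substitute a failover IP.
def make_request_kwargs(url, source_ip):
    """Hypothetical helper: build kwargs for scrapy.Request(**kwargs) so the
    downloader binds the outgoing socket to source_ip."""
    return {"url": url, "meta": {"bindaddress": (source_ip, 0)}}

kwargs = make_request_kwargs("https://example.com", "203.0.113.10")
```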
-
I hit an interesting failure while writing a unit test for the `process_spider_exception` method of a spider middleware:
In my project, this method returns an iterable (list) of request objects, wh…
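A minimal sketch of that pattern (the middleware name and retry behavior are illustrative, not the project's actual code): `process_spider_exception` may return an iterable of `Request` objects, which Scrapy then processes as if the callback had yielded them.

```python
class RetryOnParseErrorMiddleware:
    """Illustrative spider middleware, not the issue author's code."""

    def process_spider_exception(self, response, exception, spider):
        # Returning an iterable of Request objects (rather than None)
        # tells Scrapy to continue the chain with these requests.
        return [response.request.replace(dont_filter=True)]
```

In a unit test, the method can be called directly with stub response/request objects and the returned list asserted on, without running a full crawl.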
-
This is a great dataset by the way, and we wanted to use it for a group project we were doing. Unfortunately, we are running into some errors while following the steps. When I started ./run.sh, th…
-
```
[root@localhost woaidu_crawler]# scrapy crawl woaidu
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/scrapy/commands/crawl.py…
```
-
Good day. Let's say we have a million requests in a slot, and the consumer sets either `HCF_CONSUMER_MAX_REQUESTS = 15000` or `HCF_CONSUMER_MAX_BATCHES = 150`, or it just closes itself after N h…
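For readers unfamiliar with these hcf-backend settings, a sketch of the two stop conditions being compared (the values are the ones from the question; only one condition would normally be set):

```python
# hcf-backend consumer stop conditions (sketch, values from the question):
HCF_CONSUMER_MAX_REQUESTS = 15000  # stop after consuming this many requests
HCF_CONSUMER_MAX_BATCHES = 150     # or stop after consuming this many batches
```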
-
### Description
I have been trying to use Scrapy's CrawlSpider to crawl listings from a website. The problem is that the data comes from `XMLHttpRequest` calls. So I have been using `[Puppeteer As A Service…
-
Hi there,
I am working on Frontera these days, and Frontera is a great tool for cluster crawling!
But I still find some things that are not easy to understand or figure out, because of the lack…