-
This is a usability issue, though I'm not sure it's a valid one. It is based on a real case we discovered with @whalebot-helmsman.
Consider a spider which crawls a large list of URLs, and in it…
-
Currently the output of a spider log looks like this:
```python
>>> spider.logger.warning("test")
2018-03-10 13:42:56 [spider_name_goes_here] WARNING: test
```
The problem with this is that th…
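For context, the logger name shown in brackets comes from the spider's name (Scrapy's `Spider.logger` wraps `logging.getLogger(self.name)` in a `LoggerAdapter`). A minimal stdlib sketch of the same mechanism — the spider name and format string here are illustrative, not Scrapy's exact defaults:

```python
import logging

# Scrapy derives the spider's logger from the spider name, roughly like this:
spider_name = "spider_name_goes_here"  # hypothetical spider name
logger = logging.getLogger(spider_name)

# The record's %(name)s field is what appears between the timestamp and the
# level, e.g. "2018-03-10 13:42:56 [spider_name_goes_here] WARNING: test"
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
)
logger.addHandler(handler)
logger.warning("test")
```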
-
**scrapy.core.downloader.handlers.DownloadHandlers** gets the handler via the scheme parsed from `request.url`:
```python
def download_request(self, request, spider):
    scheme = urlparse_cached(request)…
```
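A minimal sketch of the scheme-based dispatch described above, using stdlib `urllib.parse` in place of Scrapy's cached `urlparse_cached`; the handler names in the mapping are placeholders, not the real handler objects:

```python
from urllib.parse import urlparse

# Placeholder handlers keyed by URL scheme, mirroring how DownloadHandlers
# selects a handler from the scheme of request.url.
handlers = {
    "http": "HTTPDownloadHandler",
    "https": "HTTPDownloadHandler",
    "ftp": "FTPDownloadHandler",
    "file": "FileDownloadHandler",
}

def handler_for(url):
    # Parse the scheme out of the URL and look up the matching handler.
    scheme = urlparse(url).scheme
    try:
        return handlers[scheme]
    except KeyError:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")

print(handler_for("https://example.com/page"))  # HTTPDownloadHandler
```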
-
I get a "too many connections" error when running `python3 execute_spider.py -d -site_id xxx`.
The error goes away if I close the MySQL connection and restart.
The error is suspected to come from too many unclosed …
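If the cause is indeed connections opened per request and never closed, the usual fix is to scope each connection with a context manager or try/finally. A sketch of that pattern, using stdlib `sqlite3` as a stand-in for the MySQL client (the real code would have the same shape around e.g. a `pymysql.connect` call; the table and function names are hypothetical):

```python
import sqlite3
from contextlib import closing

def save_item(db_path, site_id, url):
    # closing() guarantees conn.close() runs even if the insert raises,
    # so connections cannot pile up across many crawled pages.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pages (site_id TEXT, url TEXT)"
            )
            conn.execute("INSERT INTO pages VALUES (?, ?)", (site_id, url))

save_item(":memory:", "xxx", "https://example.com")
```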
-
Using Scrapy 1.5.0
I took a look at the FAQ section and found nothing relevant.
Same for issues with the keyword `KeyError` on GitHub, Reddit, or Google Groups.
As you can see below, it seems t…
-
I have given the following in my Scrapy settings.py file:
```python
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'amqp://username:password@rabbitmqserver', 'port': 5672}
```
But I am getting the following error:
…
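Note that the `'host'` value above is a full AMQP URL while `'port'` is given separately; that mismatch is a plausible source of the error. A stdlib sketch of splitting such a URL into the discrete fields a connection-parameter dict typically expects (client-library specifics vary, so this covers only the parsing step):

```python
from urllib.parse import urlparse

def split_amqp_url(url):
    # Break an amqp:// URL into the separate pieces most RabbitMQ client
    # libraries expect as individual connection parameters.
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5672,  # default AMQP port when none is given
        "username": parts.username,
        "password": parts.password,
    }

print(split_amqp_url("amqp://username:password@rabbitmqserver"))
```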
-
Is it possible to crawl results from the current day back to a specified day, via scheduled execution? I want to crawl the latest content.
-
My team is working on a set of scrapy spiders which we want to deploy to a scrapyd server. Our scrapyd server is configured to use an oauth2 proxy to authenticate traffic.
On all of our requests to o…
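A sketch of what attaching the proxy's token to a scrapyd API call might look like, using stdlib `urllib.request`; the server URL, project/spider names, and bearer-token header scheme are assumptions about the setup described above:

```python
from urllib.parse import urlencode
from urllib.request import Request

def schedule_request(base_url, project, spider, token):
    # Build (but do not send) a scrapyd schedule.json request that carries
    # the OAuth2 bearer token the proxy expects on every call.
    data = urlencode({"project": project, "spider": spider}).encode()
    return Request(
        f"{base_url}/schedule.json",
        data=data,
        headers={"Authorization": f"Bearer {token}"},
    )

req = schedule_request(
    "http://scrapyd.example.internal:6800", "myproject", "myspider", "TOKEN"
)
print(req.full_url, req.get_header("Authorization"))
```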
-
After setting HTTPCACHE_ENABLED = True I find that the cached files are stored separately per spider name, so the same web page is still re-crawled by another spider. That makes this feature a dup of duplicate url fi…
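For context on why pages are re-crawled per spider: Scrapy's filesystem cache storage uses the spider name as a path component in the cache layout. A rough sketch of that layout — the fingerprint here is a simplified SHA-1 of the URL, an assumption, not Scrapy's exact request fingerprint:

```python
import hashlib
import os

def cache_path(cachedir, spider_name, url):
    # The spider name is a directory component, so two spiders fetching the
    # same URL get two distinct cache entries.
    fp = hashlib.sha1(url.encode()).hexdigest()
    return os.path.join(cachedir, spider_name, fp[:2], fp)

a = cache_path(".scrapy/httpcache", "spider_a", "https://example.com/")
b = cache_path(".scrapy/httpcache", "spider_b", "https://example.com/")
print(a == b)  # False: same URL, different spiders, different cache entries
```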
-
**Describe the bug**
When the asynchronous TWISTED_REACTOR is enabled, deployment fails with an error.
**Traceback**
```
Traceback (most recent call last):
  File "D:\anaconda\envs\scrapy\lib\site-packages\twisted\web\http.py", line 2369, in …
```
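For reference, the setting presumably being enabled is the asyncio reactor in settings.py; a config sketch (the deployment failure itself is in the truncated traceback above):

```python
# settings.py: opt in to the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```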