-
### Description
If a middleware raises an exception, running `scrapy crawl` or `scrapy check` raises the exception to the shell but returns with exit code 0, instead of the expected 1.
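
The expectation behind this report can be illustrated without Scrapy. The following is a minimal sketch that only assumes the ordinary Python convention that an uncaught exception ends the process with exit code 1; `exit_code_of` is a hypothetical helper introduced for illustration and is not part of Scrapy.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet], capture_output=True)
    return result.returncode


# An uncaught exception makes the interpreter exit with a non-zero status (1).
failing = exit_code_of("raise RuntimeError('boom')")
# A run that finishes without errors exits with status 0.
passing = exit_code_of("pass")

print(failing, passing)
```

A shell script or CI job relies on the same distinction, which is why a crawl that prints a traceback yet returns 0 can go unnoticed.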
### Steps…
-
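I use GitHub to crawl blog links and MongoDB to store the data; the problem occurs during the crawl stage.
`https://blog.akimio.top/links/` uses a modified `butterfly` theme, [solitude](https://github.com/everfu/hexo-theme-solitude), and it could be crawled normally before. **At first I suspected the theme was the problem, so I found a friend-links page that uses the original butterfly theme, and the problem still occurs…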
-
# Description
If I insert the start URL into Redis before running Scrapy, the crawl succeeds.
But if I run Scrapy first and then insert the URL, the listener logs a failure:
```
2023-08-13 17:11:59 [scrapy.utils.…
-
Extension of https://github.com/scrapy/scrapy/issues/1015 - spider exceptions don't trigger `process_spider_exception` if they're called from an `errback` method.
```
import logging
from scra…
-
I think the project creates a Scrapy job for every product review request. This is a compute-heavy job and will not handle more than 10 requests at a time on …
-
The scrapy integration with the newscatcher fetcher maxes out around 50 URLs/minute. This is insufficient for our needs, but changing the throttling variables and such doesn't seem to increase it. I t…
-
I decided to get to grips with Scrapy, but for some reason I couldn't even run your project.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Progr…
-
### Description
The `OffsiteMiddleware` logs a single message for each domain filtered. Great!
But then the `core.engine` logs a message for every single url filtered by the OffsiteMiddleware.
(L…
-
```
Traceback (most recent call last):
File "xsscrapy.py", line 45, in <module>
main()
File "xsscrapy.py", line 41, in main
'-s', 'DOWNLOAD_DELAY=%s' % rate])
File "/Library/Frameworks/Pyt…
-
I ran the example from the README but got no data... Maybe the site's markup has changed?
```
scrapy crawl wb -o artifacts/wb.json -a category_url="https://www.wildberries.ru/catalog/zhenshchinam/odez…