scrapy-crawler Search Results

scrapy/scrapy #4292

Exceptions in middleware don't return exit code 1 in `scrapy…

Description: If a middleware raises an exception, running `scrapy crawl` or `scrapy check` raises the exception to the shell but returns exit code 0 instead of the expected 1. Steps…

dpfeif updated 1 week ago

Rock-Candy-Tea/hexo-circle-of-friends #153

Friend-link crawling fails; run.py throws an error

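I use GitHub to crawl blog links and MongoDB to store the data, and a problem occurs during the crawling stage. `https://blog.akimio.top/links/` uses a modified `butterfly` theme (solitude)[https://github.com/everfu/hexo-theme-solitude], and it could previously be crawled without problems. **At first I suspected the theme was the cause, so I found a friend-links page using the original butterfly theme, but the problem still…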

Akimio521 updated 11 hours ago

rmax/scrapy-redis #285

[Question] Fetching request URL from Redis fails

Description: If I insert the start URL into Redis before running Scrapy, it succeeds. But if I run Scrapy first and then insert the URL, the listener reports a failure: ``` 2023-08-13 17:11:59 [scrapy.utils.…

KokoTa updated 3 weeks ago

scrapy/scrapy #6437

process_spider_exception not executed for exceptions in errb…

Extension of https://github.com/scrapy/scrapy/issues/1015: spider exceptions don't trigger `process_spider_exception` if they're raised from an `errback` method. ``` import logging from scra…

mohmad-null updated 1 week ago

SaptakS/opinator #8

Remove scrapy as your crawler.

I think what you are doing in the project is creating a Scrapy job for every product-review request. This is a compute-heavy job and will not handle more than 10 requests at a time on …

fluffybeing updated 9 years ago

dataculturegroup/feminicide-story-processor #30

newscatcher fetching with scrapy is too slow

The Scrapy integration with the newscatcher fetcher maxes out at around 50 URLs/minute. This is insufficient for our needs, but changing the throttling variables and similar settings doesn't seem to increase it. I t…

rahulbot updated 2 weeks ago

ivan-ver/parsing_sro #1

Scrapy error

I decided to learn Scrapy, but for some reason I couldn't even run your project. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Progr…

leprick0n updated 3 years ago

scrapy/scrapy #6433

core.engine/Signal handler polluting log

Description: The `OffsiteMiddleware` logs a single message for each domain filtered. Great! But then `core.engine` logs a message for every single URL filtered by the OffsiteMiddleware. (L…

djuntsu updated 2 weeks ago

DanMcInerney/xsscrapy #45

How to fix this issue

``` Traceback (most recent call last): File "xsscrapy.py", line 45, in main() File "xsscrapy.py", line 41, in main '-s', 'DOWNLOAD_DELAY=%s' % rate]) File "/Library/Frameworks/Pyt…

Arshland35 updated 5 years ago

wondersell/wildsearch-crawler #3

Error in example

I ran the example from the README but got no data... Maybe the site's markup has changed? ``` scrapy crawl wb -o artifacts/wb.json -a category_url="https://www.wildberries.ru/catalog/zhenshchinam/odez…

berlinhemi updated 2 years ago

1000+ results for scrapy-crawler