-
Hi,
This is a question, not a bug report.
[url-frontier](https://github.com/crawler-commons/url-frontier) is an API to define a [crawl frontier](https://en.wikipedia.org/wiki/Crawl_frontier). …
-
Are you maybe planning to add support for https://github.com/apify/crawlee? It has Puppeteer, got-scraping, and much more, and is written in JS.
-
**Describe the bug**
When many tasks are started at the same time, all spiders fail to start with the error 62fddbdd4f6290e5182ab109/oreo/spiders/henan/__init__.py": dial tcp 127.0.0.1:8000: socket: too many open files.
**To Reproduce**
Steps to reproduce the behavior:
1. Start many spiders at the same time, e.g. start 30 sc…
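The "too many open files" error suggests the process spawning the tasks is hitting the OS file-descriptor limit, one socket per spider. A minimal sketch for checking and raising that limit from Python, assuming the soft limit is the bottleneck rather than a Crawlab bug:

```python
import resource

# Inspect the current per-process file-descriptor limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit up to the hard limit so that many concurrent
# sockets (one per spider task) can stay open at once. This must run
# in the process that spawns the tasks.
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

When Crawlab runs inside Docker, the limit can instead be raised at the container level with `docker run --ulimit nofile=65535:65535`.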
-
**Describe the bug**
I have several worker nodes and a schedule that runs tasks on these worker nodes.
For some reason, some worker nodes were removed and new worker nodes were created.
But when the schedule runs, th…
-
I'd like to ask the author: what tech stack and background knowledge are needed to develop a crawler management platform like crawlab? From a student.
-
**Describe the bug**
When a task log exceeds 100 pages (or more), the view automatically jumps back to the first page, so the latest log output cannot be followed in real time.
**To Reproduce**
Steps to reproduce the behavior:
1. Run a spider that prints enough log output to exceed 100 pages.
2. The view briefly jumps to the latest page (but shows the content of the first page), then is forced back to the first page.
3. This repeats over and over.
**Expected behavior**
The log should display normally.
-
**Describe the bug**
The Crawlab master and worker keep restarting after the upgrade.
**To Reproduce**
Steps to reproduce the behavior:
1. Starting from a 0.5.1 Crawlab setup, take the Docker containers down:
`…
-
Crawlab can't download media files and large files (e.g. jpg, mp3, gif, and zip files).
A viable approach is to add a new type of node, say, MediaWorker. The links of th…
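To illustrate the kind of work such a MediaWorker might do, here is a minimal sketch that streams a large file to disk in chunks instead of buffering it in memory; `MEDIA_DIR`, the function name, and all parameters are hypothetical, not part of Crawlab:

```python
import os
import requests

MEDIA_DIR = "/data/media"  # hypothetical destination directory

def download_media(url: str, filename: str, chunk_size: int = 1 << 20) -> str:
    """Stream a large file to disk in 1 MiB chunks to keep memory flat."""
    path = os.path.join(MEDIA_DIR, filename)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path
```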
-
**Describe the bug**
For example, when a spider task finishes, its database connections are not recycled and stay in the Sleep state. Writing results through save_item from the Python crawlab SDK, a single spider task can hold over a thousand connections, and even batch writes through save_items hold several hundred. With multiple spider tasks running, Crawlab quickly exhausts the database connections, which disrupts normal business ❌❌❌.
This seriously impacts production; please fix it as soon as possible. Both mongo and mysql are…
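Until this is fixed upstream, one common workaround pattern is to share a single pooled client per task and close it explicitly when the task exits, rather than letting each save open its own connection. A minimal sketch using pymongo directly; the database and collection names and the pool size are assumptions, and this bypasses the crawlab SDK entirely:

```python
import atexit
from pymongo import MongoClient

# One client per task process; pymongo pools connections internally,
# so capping the pool bounds how many server-side connections the task holds.
client = MongoClient("mongodb://localhost:27017", maxPoolSize=10)
collection = client["crawlab_results"]["items"]  # hypothetical names

def save_items_batched(items, batch_size=500):
    """Insert results in batches over the shared pooled client."""
    for i in range(0, len(items), batch_size):
        collection.insert_many(items[i:i + batch_size])

# Release the pooled connections when the task process exits, so they
# do not linger server-side in the Sleep state.
atexit.register(client.close)
```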