crawlab-team / crawlab-sdk

SDK for Crawlab, including SDK for different programming languages such as Python, Node.js and Java, and a CLI Tool written in Python.
https://crawlab.cn
BSD 3-Clause "New" or "Revised" License

About crawlab.json #26

Open ma-pony opened 2 years ago

ma-pony commented 2 years ago

While working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wondered whether it could be designed to support a layout like the following:

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── .....
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── .....
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spiders",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}

I can help implement this if you think it is possible @tikazyq
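
To make the proposal concrete, here is a minimal sketch of how a tool might read the crawlab.json layout above and list every schedule with the spider it belongs to. It assumes the proposed "schedules" field, which is not an existing crawlab.json option, and the helper name iter_schedules is hypothetical rather than part of the SDK.

```python
import json

# Hypothetical config following the layout proposed in this issue.
# The "schedules" field is an assumption, not an existing crawlab.json option.
CONFIG_TEXT = """
{
  "spiders": [
    {
      "path": "packages/py_spiders",
      "name": "py spiders",
      "cmd": "python",
      "schedules": [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py", "enabled": true}
      ]
    }
  ]
}
"""


def iter_schedules(config):
    """Yield (spider_name, schedule) pairs from the proposed config."""
    for spider in config.get("spiders", []):
        # A spider without a "schedules" key simply contributes nothing.
        for schedule in spider.get("schedules", []):
            yield spider["name"], schedule


config = json.loads(CONFIG_TEXT)
for spider_name, schedule in iter_schedules(config):
    print(spider_name, schedule["name"], schedule["cron"])
```

Keeping the schedules nested under their spider means one file describes both what to upload and when to run it.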

tikazyq commented 2 years ago

Multi-spider support is on the way. Please follow this issue https://github.com/crawlab-team/crawlab/issues/1190

ma-pony commented 2 years ago

> Multi-spider support is on the way. Please follow this issue crawlab-team/crawlab#1190

Will deploying schedules also be included?

tikazyq commented 2 years ago

Would you elaborate a bit?

ma-pony commented 2 years ago

> Would you elaborate a bit?

In practice, every new spider comes with dozens of new cron jobs that I have to create. Spiders can already be uploaded from the command line, so could cron jobs be created the same way? Then I could put these commands into a CI/CD pipeline.

So I would like to add a new schedules parameter to crawlab.json to publish and manage cron jobs, like this:

    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        },
       ...
      ]
    }

What do you think of this idea? Or do you have a better suggestion?
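
As a rough illustration of what "publish and manage" could mean in a CI/CD step, the sketch below compares the schedules declared in crawlab.json with the ones already registered and works out which to create, update, or delete. Everything here is an assumption for discussion: plan_schedule_changes is a hypothetical helper, schedules are matched by name, and no Crawlab API is called.

```python
def plan_schedule_changes(desired, existing):
    """Compare schedules by name and return the actions a deploy step would need.

    desired:  schedule dicts declared in crawlab.json (hypothetical "schedules" field)
    existing: schedule dicts already registered on the server
    """
    desired_by_name = {s["name"]: s for s in desired}
    existing_by_name = {s["name"]: s for s in existing}

    # Declared but not yet registered.
    to_create = [s for name, s in desired_by_name.items()
                 if name not in existing_by_name]
    # Registered, but one or more fields differ from the declaration.
    to_update = [s for name, s in desired_by_name.items()
                 if name in existing_by_name and existing_by_name[name] != s]
    # Registered but no longer declared.
    to_delete = [s for name, s in existing_by_name.items()
                 if name not in desired_by_name]
    return {"create": to_create, "update": to_update, "delete": to_delete}


desired = [
    {"name": "py spider 1 cron", "cron": "* 1 * * *",
     "command": "python py_spider_1/main.py", "enabled": True},
    {"name": "py spider 2 cron", "cron": "* 2 * * *",
     "command": "python py_spider_2/main.py", "enabled": True},
]
existing = [
    {"name": "py spider 1 cron", "cron": "* 3 * * *",
     "command": "python py_spider_1/main.py", "enabled": True},
    {"name": "old cron", "cron": "* 4 * * *",
     "command": "python old/main.py", "enabled": False},
]

plan = plan_schedule_changes(desired, existing)
for action, items in plan.items():
    print(action, [s["name"] for s in items])
```

Computing a plan first, rather than recreating everything on each deploy, would let a pipeline show the pending changes before applying them.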

ma-pony commented 2 years ago

> Would you elaborate a bit?

@tikazyq What do you think about the above?

tikazyq commented 2 years ago

I think that's a good idea, but it might take some time to implement. Let's create a new enhancement issue in the main repo: https://github.com/crawlab-team/crawlab