distribworks / dkron

Dkron - Distributed, fault tolerant job scheduling system https://dkron.io
GNU Lesser General Public License v3.0
4.31k stars 380 forks source link

Executor not working #380

Closed jjsaunier closed 6 years ago

jjsaunier commented 6 years ago

Hey,

I migrate from command (since declared as deprecated) to shell executor and I got the following issue from the agent :

INFO[2018-05-23T19:08:20Z] agent: Starting job                           job=stats_update node=04fe58fe3cf0
ERRO[2018-05-23T19:08:20Z] invoke: Specified executor is not present     executor="<nil>" node=04fe58fe3cf0

with following config :

{
    "name": "stats_update",
    "schedule": "@every 15s",
    "executor": "shell",
    "executor_config": {
        "command": "./stats_update",
        "shell": false
    },
    "concurrency": "forbid",
    "tags": {
        "role": "manager:1"
    }
}

I'm on 0.10.1 dkron version.

My config is bad or it's bug ?

vcastellm commented 6 years ago

Is the plug-in present in the filesystem? What package did you use to install?

jjsaunier commented 6 years ago

I'm using the latest docker image: dkron/dkron:v0.10.1

vcastellm commented 6 years ago

Could you check the startup log? you should have something like plugin.dkron-executor-shell: plugin address: timestamp=2018-05-23T20:06:18.349Z address=/tmp/plugin491672744 network=unix

jjsaunier commented 6 years ago

From agent :

INFO[2018-05-23T20:29:32Z] agent: Dkron agent starting                   node=ab89da8c0439
INFO[2018-05-23T20:29:32Z] agent: joining: [dkron:8946] replay: true     node=ab89da8c0439
INFO[2018-05-23T20:29:32Z] agent: joined: 1 nodes                        node=ab89da8c0439
INFO[2018-05-23T20:29:32Z] agent: Listen for events                      node=ab89da8c0439
INFO[2018-05-23T20:29:32Z] agent: Starting job                           job=stats_update node=ab89da8c0439
INFO[2018-05-23T20:29:32Z] agent: Starting job                           job=vacuum_worker_timedout node=ab89da8c0439
ERRO[2018-05-23T20:29:32Z] invoke: Specified executor is not present     executor="<nil>" node=ab89da8c0439

From server :

2018-05-23T19:06:25.397Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-files args=[/opt/local/dkron/dkron-processor-files]
2018-05-23T19:06:25.397Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-files
2018-05-23T19:06:25.436Z [DEBUG] plugin.dkron-processor-files: plugin address: timestamp=2018-05-23T19:06:25.436Z address=/tmp/plugin191457768 network=unix
2018-05-23T19:06:25.438Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-log args=[/opt/local/dkron/dkron-processor-log]
2018-05-23T19:06:25.439Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-log
2018-05-23T19:06:25.482Z [DEBUG] plugin.dkron-processor-log: plugin address: timestamp=2018-05-23T19:06:25.480Z address=/tmp/plugin997834048 network=unix
2018-05-23T19:06:25.485Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-syslog args=[/opt/local/dkron/dkron-processor-syslog]
2018-05-23T19:06:25.485Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-syslog
2018-05-23T19:06:25.526Z [DEBUG] plugin.dkron-processor-syslog: plugin address: timestamp=2018-05-23T19:06:25.526Z address=/tmp/plugin059557485 network=unix
2018-05-23T19:06:25.529Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-executor-shell args=[/opt/local/dkron/dkron-executor-shell]
2018-05-23T19:06:25.529Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-executor-shell
2018-05-23T19:06:25.567Z [DEBUG] plugin.dkron-executor-shell: plugin address: timestamp=2018-05-23T19:06:25.566Z address=/tmp/plugin716986573 network=unix
time="2018-05-23T19:06:25Z" level=info msg="agent: Dkron agent starting" node=2f5bff6590ac
time="2018-05-23T19:06:25Z" level=info msg="agent: joining: [] replay: true" node=2f5bff6590ac
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

So yes from server side I got it, moreover, got no error from server side

jjsaunier commented 6 years ago

Here is my config

From agent :

{
    "backend": "redis",
    "backend_machine": "redisdb:6379",
    "server": false,
    "debug": true,
    "log-level": "debug",
    "tags": {
        "role": "task"
    },
    "advertise": "dkron",
    "keyspace": "dkron",
    "encrypt": "XXX",
    "join": [
        "dkron:8946"
    ]
}

From server :

{
    "backend": "redis",
    "backend_machine": "redisdb:6379",
    "server": true,
    "debug": true,
    "log-level": "debug",
    "tags": {
        "role": "manager"
    },
    "keyspace": "dkron",
    "encrypt": "XXX"
}
jjsaunier commented 6 years ago

Update : Still the same under 0.10.2, but logs are better from the agent :

INFO[2018-05-23T21:40:30Z] agent: Dkron agent starting                   node=e4081c888f25
INFO[2018-05-23T21:40:30Z] agent: joining: [dkron] replay: true          node=e4081c888f25
INFO[2018-05-23T21:40:30Z] agent: joined: 1 nodes                        node=e4081c888f25
INFO[2018-05-23T21:40:30Z] agent: Listen for events                      node=e4081c888f25
INFO[2018-05-23T21:41:22Z] agent: Starting job                           job=stats_update node=e4081c888f25
ERRO[2018-05-23T21:41:22Z] invoke: Specified executor is not present     executor=shell node=e4081c888f25
INFO[2018-05-23T21:41:22Z] agent: Starting job                           job=stats_update node=e4081c888f25

Instead of nil, got executor name

And form server side, still correctly loaded :

2018-05-23T21:40:02.434Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-files args=[/opt/local/dkron/dkron-processor-files]
2018-05-23T21:40:02.437Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-files
2018-05-23T21:40:02.454Z [DEBUG] plugin.dkron-processor-files: plugin address: timestamp=2018-05-23T21:40:02.446Z address=/tmp/plugin738430045 network=unix
2018-05-23T21:40:02.471Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-log args=[/opt/local/dkron/dkron-processor-log]
2018-05-23T21:40:02.477Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-log
2018-05-23T21:40:02.495Z [DEBUG] plugin.dkron-processor-log: plugin address: timestamp=2018-05-23T21:40:02.495Z address=/tmp/plugin535515532 network=unix
2018-05-23T21:40:02.498Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-processor-syslog args=[/opt/local/dkron/dkron-processor-syslog]
2018-05-23T21:40:02.499Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-processor-syslog
2018-05-23T21:40:02.515Z [DEBUG] plugin.dkron-processor-syslog: plugin address: timestamp=2018-05-23T21:40:02.514Z address=/tmp/plugin202582924 network=unix
2018-05-23T21:40:02.517Z [DEBUG] plugin: starting plugin: path=/opt/local/dkron/dkron-executor-shell args=[/opt/local/dkron/dkron-executor-shell]
2018-05-23T21:40:02.518Z [DEBUG] plugin: waiting for RPC address: path=/opt/local/dkron/dkron-executor-shell
2018-05-23T21:40:02.702Z [DEBUG] plugin.dkron-executor-shell: plugin address: timestamp=2018-05-23T21:40:02.700Z address=/tmp/plugin158256682 network=unix
time="2018-05-23T21:40:02Z" level=info msg="agent: Dkron agent starting" node=9254e55224d8
time="2018-05-23T21:40:02Z" level=info msg="agent: joining: [] replay: true" node=9254e55224d8
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)
vcastellm commented 6 years ago

I'm unable to reproduce the error, it's very weird. I'm missing something, the plugin is correctly loaded.

Could you try with a simpler job?

{
    "name": "job_name_1",
    "schedule": "@every 5s",
    "concurrency": "forbid",
    "executor": "shell",
    "executor_config": {
      "command": "echo 'hi'"
    }
}
jjsaunier commented 6 years ago

Yep, will dig tonight

jjsaunier commented 6 years ago

So after digging:

This is working, played from the server

{
    "name": "job_name_1",
    "schedule": "@every 5s",
    "concurrency": "forbid",
    "executor": "shell",
    "executor_config": {
        "command": "echo 'hi'"
    },
    "tags": {
        "role": "server:1"
    }
}

But using a dkron remote agent, it's failed :

{
    "name": "job_name_1",
    "schedule": "@every 5s",
    "concurrency": "forbid",
    "executor": "shell",
    "executor_config": {
        "command": "echo 'hi'"
    },
    "tags": {
        "role": "ares:1"
    }
}
INFO[2018-05-26T17:33:22Z] agent: Dkron agent starting                   node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: joining: [dkron] replay: true          node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: joined: 1 nodes                        node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: Listen for events                      node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:22Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
ERRO[2018-05-26T17:33:22Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:22Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:22Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:22Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:26Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:26Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:32Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:32Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:38Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:38Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:42Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:42Z] invoke: Specified executor is not present     executor=shell node=c94294a271b4
INFO[2018-05-26T17:33:45Z] agent: Starting job                           job=job_name_1 node=c94294a271b4
ERRO[2018-05-26T17:33:45Z] invoke: Specified executor is not present     executor=shell node=c94294a271b

So it's looks like executor are not loaded in from the agent side. I think you can reproduce the issue, with 1 server and 1 remote agent, and specify to play the job only from the remote agent.

vcastellm commented 6 years ago

@ProPheT777 I just fixed an issue with plugins loading https://github.com/victorcoder/dkron/commit/163dcd9177cef86841adf4bf837fa006ecb1a1d2 that has the same error message but seems not exactly the same.

I'll dig into the client's side.

vcastellm commented 6 years ago

@ProPheT777 I did the same test with dkron 0.10.1 docker version, 1 agent, 1 server, and it is working for me 😕

jjsaunier commented 6 years ago

Could be related to redis backend ? (was on etcd before). Moreover from the agent side, I have no log that said executor are loaded like on dkron server does

jjsaunier commented 6 years ago

any news on this ? I did 1000 test, like server - server etc and still the same result. Can't figure out the problem

vcastellm commented 6 years ago

@ProPheT777 you mean that even with server-server config doesn't work?

It's really a weird problem I'm unable to reproduce. Let's try to locate the issue, could you try with a deb version instead of docker? does it works?

jjsaunier commented 6 years ago

you mean that even with server-server config doesn't work?

Yes, I configure another server joining my master, and still not work, plus from logs on slave server, no mention about executor loading :/

I will create an example project to reproduce the error, hope that will help you

jjsaunier commented 6 years ago

I have sucessfully reproduce the error, here you can find a boilerplate to reproduce the bug (readme describe how quickly run it)

Some specs :

I reproduce it, with the exact same achitecture of docker that I use.

https://github.com/ProPheT777/dkron-boilerplate

Thank you very much !

vcastellm commented 6 years ago

@ProPheT777 I see, why are you building a new dkron image with the dkron binary for running the agent?

The dkron agent acts as the server and client. In your tests, you're copying the binary from the original image, but you're not copying plugins, they are separate binary files in the original image. This is why dkron can't find the shell plugin.

I make a simple proposal to your original app compose that make it work.

I recently updated docs to make this more clear, have you read through the docs or it's still confusing?

https://github.com/ProPheT777/dkron-boilerplate/pull/1

jjsaunier commented 6 years ago

Ha! Yes I embed dkron binary within my final apps. I'm using dkron since like 6 months and was working great with that way, didnt see that mention or understand plugins are not packaged with the new plugin approach for shell executor

jjsaunier commented 6 years ago

Your PR resolve the issue, but I can't use it in that way, the interest of agent approach is to "put" the agent inside application container, not create application container from dkron. I'm using it tout trigger some stuff on apps like vacuum etc and my backend is a cli.

Should I simply copy plugins directory beside binary into my image ?

vcastellm commented 6 years ago

Should I simply copy plugins directory beside binary into my image ?

Sure, you can do that.

jjsaunier commented 6 years ago

Ok perfect then! Thank you for your time!