distribworks / dkron

Dkron - Distributed, fault tolerant job scheduling system https://dkron.io
GNU Lesser General Public License v3.0

How to add new node as agent? (Make a cluster) #1299

Open krismandev opened 1 year ago

krismandev commented 1 year ago

Describe the bug

I am new to Dkron and do not yet know the correct steps to create a cluster in which one node acts as the master/server and another node acts as an agent.

To Reproduce

These are the steps I have taken so far. I have two machines, Sv1 and Sv2. The expected result is Sv1 as the master/server and Sv2 as the agent.

    • Local IP of Sv1: 10.15.xx.xx
    • Local IP of Sv2: 10.15.xxx.xxx

  1. Install dkron in Sv1
    • APT Repository deb [trusted=yes] https://repo.distrib.works/apt/ /
    • Install the package sudo apt-get install dkron
    • Change the config in /etc/dkron/dkron.yaml
      
      # Dkron example configuration file

      # This node is running in server mode
      server: true

      # Provides the number of expected servers in the datacenter.
      # Either this value should not be provided or the value must agree with other servers in the cluster.
      # When provided, Dkron waits until the specified number of servers are available and then bootstraps the cluster.
      # This allows an initial leader to be elected automatically. This flag requires server mode.
      bootstrap-expect: 2

      bind-addr: "{{ GetPrivateIP }}:8946"
      log-level: debug
      tags:
        dc: east
      encrypt: a-valid-key-generated-with-dkron-keygen
      retry-join:

  2. Install dkron in Sv2
    • APT Repository deb [trusted=yes] https://repo.distrib.works/apt/ /
    • Install the package sudo apt-get install dkron
    • Change the config in /etc/dkron/dkron.yaml
      
      # Dkron example configuration file

      # This node is running in server mode
      server: false

      # Provides the number of expected servers in the datacenter.
      # Either this value should not be provided or the value must agree with other servers in the cluster.
      # When provided, Dkron waits until the specified number of servers are available and then bootstraps the cluster.
      # This allows an initial leader to be elected automatically. This flag requires server mode.
      bootstrap-expect: 2

      bind-addr: "{{ GetPrivateIP }}:8946"
      log-level: debug
      tags:
        dc: east
      encrypt: a-valid-key-generated-with-dkron-keygen
      retry-join:

Problem: when I open the dashboard at http://127.0.0.1:8080/ui/#/, it shows only one node (the master/server node itself).

Here are the logs from systemctl status dkron on each machine.

Sv1

Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="api: Running HTTP server" address=":8080" node=sv1
Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="dkron: monitoring leadership" node=sv1
Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="agent: registering usage stats for cluster ID 'abcdefg
Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="agent: Listen for events" node=sv1
Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="agent: Received event" event=member-join node=sv1
Mar 19 09:46:23 sv1 dkron[347515]: time="2023-03-19T09:46:23+07:00" level=info msg="adding server" node=sv1 server=sv1
Mar 19 09:46:24 sv1 dkron[347515]: time="2023-03-19T09:46:24+07:00" level=info msg="agent: Received event" event=member-update node=sv1
Mar 19 09:46:25 sv1 dkron[347515]: time="2023-03-19T09:46:25+07:00" level=info msg="dkron: cluster leadership acquired" node=sv1
Mar 19 09:46:25 sv1 dkron[347515]: time="2023-03-19T09:46:25+07:00" level=info msg="dkron: monitoring leadership" node=sv1
Mar 19 09:46:25 sv1 dkron[347515]: time="2023-03-19T09:46:25+07:00" level=info msg="agent: Starting scheduler" node=sv1

Sv2

Mar 19 09:46:27 sv2 dkron[7386]: time="2023-03-19T09:46:27+07:00" level=warning msg="plugin configured with a nil SecureConfig" node=sv2
Mar 19 09:46:28 sv2 dkron[7386]: time="2023-03-19T09:46:28+07:00" level=warning msg="plugin configured with a nil SecureConfig" node=sv2
Mar 19 09:46:29 sv2 dkron[7386]: time="2023-03-19T09:46:29+07:00" level=info msg="agent: Dkron agent starting" node=sv2
Mar 19 09:46:29 sv2 dkron[7386]: time="2023-03-19T09:46:29+07:00" level=info msg="agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s linode mdn>
Mar 19 09:46:29 sv2 dkron[7386]: time="2023-03-19T09:46:29+07:00" level=info msg="agent: Joining cluster..." cluster=LAN node=sv2
Mar 19 09:46:39 sv2 dkron[7386]: time="2023-03-19T09:46:39+07:00" level=info msg="agent: Join LAN completed. Synced with 1 initial agents" node=sv2
Mar 19 09:46:39 sv2 dkron[7386]: time="2023-03-19T09:46:39+07:00" level=info msg="agent: Listen for events" node=sv2
Mar 19 09:46:39 sv2 dkron[7386]: time="2023-03-19T09:46:39+07:00" level=info msg="agent: Received event" event=member-join node=sv2
Mar 19 09:46:39 sv2 dkron[7386]: time="2023-03-19T09:46:39+07:00" level=warning msg="non-server in gossip pool" member=sv2 node=sv2
Mar 19 09:46:40 sv2 dkron[7386]: time="2023-03-19T09:46:40+07:00" level=info msg="agent: Received event" event=member-update node=sv2

Please help me understand the correct way to create a Dkron cluster. Any help is appreciated.

vcastellm commented 1 year ago

Be sure that the retry-join IPs are correct and you should be good to go. https://dkron.io/docs/usage/clustering
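
As a minimal sketch of what that could look like for the setup described in the question, assuming Sv1 is the only server and Sv2 is an agent that should join it: the retry-join list on Sv2 would name Sv1's address instead of being left empty. The address below is the poster's redacted placeholder, not a real value, and the remaining keys are copied from the configs quoted in the question.

```yaml
# /etc/dkron/dkron.yaml on Sv2 (agent). A sketch under the assumptions above,
# not a verified configuration.
server: false
bind-addr: "{{ GetPrivateIP }}:8946"
log-level: debug
tags:
  dc: east
# The encryption key has to be identical on every node in the cluster.
encrypt: a-valid-key-generated-with-dkron-keygen
retry-join:
  - 10.15.xx.xx   # placeholder for Sv1's private IP; replace with the real address
```

bootstrap-expect is left out here because the quoted config comment says it requires server mode. By the same comment, the value on Sv1 would presumably need to match the number of servers, which is 1 in this setup rather than 2.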