Open andreygolev opened 4 years ago
This seems related to an issue Consul had some time ago. This PR fixed the issue for Consul: https://github.com/hashicorp/consul/pull/2319/files#diff-316ef07c64533d6ab7ec6f83189a29e3
But it looks like it's not that straightforward to port into Dkron :)
I've worked around this with a bash script that checks whether a cluster already exists. If it does, the script runs the agent without the --bootstrap-expect option, and it launches just fine.
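The actual script isn't included in the issue; a minimal sketch of the idea, assuming `curl` is available in the container and that an existing peer can be probed over Dkron's HTTP API (the `/v1/leader` endpoint, the peer address, and the port are assumptions, not taken from this issue), could look like:

```shell
#!/bin/sh
# Hypothetical workaround sketch (not the author's actual script).
# Probe a known peer for an elected leader; pass --bootstrap-expect
# only when no cluster appears to exist yet.
PEER="${DKRON_PEER:-dkron-server:8080}"

build_args() {
  # Print the agent flags; add --bootstrap-expect only when no leader answers.
  if curl -fs --max-time 2 "http://$1/v1/leader" >/dev/null 2>&1; then
    printf '%s\n' "agent --server --retry-join=$1"
  else
    printf '%s\n' "agent --server --retry-join=$1 --bootstrap-expect=3"
  fi
}

ARGS=$(build_args "$PEER")
echo "dkron $ARGS"   # in the real container this would be: exec dkron $ARGS
```

The key point is that only the very first boot of a fresh cluster needs `--bootstrap-expect`; replacement containers should just join.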
Describe the bug
Hi, I'm trying to make Dkron work in Kubernetes environments, but so far without much luck. Currently I'm running the Dkron server part as a Kubernetes "Deployment", which means the hostname and IP address of each container will be random. Because of this, I'm running three Dkron instances with auto-join, using the following config:
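The original config didn't survive in this issue; a hypothetical equivalent matching the description (three servers, auto-join), assuming Dkron's standard `--server`, `--bootstrap-expect`, and `--retry-join` flags and a Kubernetes headless-service DNS name (the service name and port are illustrative assumptions), might look like:

```shell
# Hypothetical invocation, not the author's actual config.
# Each of the 3 Deployment replicas runs the same command and finds
# the others via the (assumed) headless service DNS name.
dkron agent --server \
  --bootstrap-expect=3 \
  --retry-join=dkron-server.default.svc.cluster.local
```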
Also, no data persistence is enabled in the containers, so each new container joins with an empty data dir and a new hostname and IP. The first, initial start of the three instances bootstraps just fine.
Bad things start to happen when I delete or add Dkron servers to the cluster.
This is the log of a new container joining an existing cluster: dkron-server-794cd48678-4drnv
dkron-server-794cd48678-lbbmg
dkron-server-794cd48678-mgww8 - this is the leader
As we can see, there are errors on the leader, and a message that cluster leadership is lost and then immediately acquired back. I also noticed that when a new node joins the cluster, it's not visible in `raft list-peers`; it only becomes visible after this failover. Simple "worker" agents join the cluster without any issues.
Expected behavior
New server containers join the Dkron cluster without triggering a failover and leader re-election. In Kubernetes, when a container fails for any reason, it's restarted automatically by Kubernetes.
Specifications: