Closed. FedeBev closed this 1 year ago.
Sorry for the mega-delay here @FedeBev, thank you very much for the work! I had not prioritized the K8s work until now, going to test this and merge if it works.
What kind of problem are you facing @victorcoder? I just tried a rolling update from version v3.1.8 to v3.1.10 and everything went well except for a slight leader election delay.
Do you mind sharing your values and some details about your cluster?
I've just fixed the missing newlines and cleaned up the sample values file
I just do a helm install, then a helm upgrade, and the rolling update is too fast for the leader election to happen correctly. When the rollout is done there's no leader in the cluster; all nodes end up as followers.
Are you tweaking some other options?
No additional tweaks in my configuration.
This also happens to me, but a few seconds after the end of the rollout a new leader is elected and everything works fine.
We can try to tune the rollout strategy, but taking into account the issue about the kubernetes discovery, I don't think we can achieve something better with helm. An operator would solve the issue, but it's a huge effort.
@FedeBev what about a liveness, readiness check?
@vcastellm in the issue about the kubernetes discovery I mentioned before, you can find why I wasn't able to use those.
I don't see any other possible way right now. I guess we should either change something about how dkron starts or submit a PR to the kubernetes discovery library. In my opinion, the point is that dkron should be able to discover itself when the pod is running (/health returns 200), NOT when it is ready. This way the k8s discovery works and we can implement a useful /health and ready probe.
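To make the running-versus-ready distinction concrete, here is a minimal sketch of the two probes as separate handlers. This is an illustration only, not dkron's actual implementation: the `probes` type, the `hasLeader` flag, and the separate ready handler are all assumptions. The liveness handler returns 200 as soon as the process serves HTTP, while the readiness handler returns 503 until a leader is known.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes holds the state the two handlers report on (hypothetical type).
type probes struct {
	hasLeader atomic.Bool // set once this node knows of a cluster leader
}

// health is the liveness check: 200 whenever the process can answer HTTP.
func (p *probes) health(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// ready is the readiness check: 503 until a leader is known, then 200.
func (p *probes) ready(w http.ResponseWriter, _ *http.Request) {
	if !p.hasLeader.Load() {
		http.Error(w, "no leader", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// status invokes a handler in-process and returns its HTTP status code.
func status(h http.HandlerFunc) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	p := &probes{}
	fmt.Println(status(p.health), status(p.ready)) // prints "200 503"

	p.hasLeader.Store(true) // simulate a completed election
	fmt.Println(status(p.health), status(p.ready)) // prints "200 200"
}
```

With this split, discovery could key off the liveness result while traffic routing keys off readiness, which is the behaviour argued for above. Whether the discovery library can be made to do that is the open question.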
Understood, thanks, I will take a look at how we can improve the /health endpoint.
Keep me posted, I'd be glad to help. Dkron could become a core component for a project I'm working on. However, our platform relies entirely on K8s, and this is a big issue.
I'm also very interested in this fix. @vcastellm @FedeBev anything that I can do to make this happen? Also, is this not better than the currently available chart?
@Espina2 the chart is working on my cluster in a dev environment, but it's still very young and there's still a long way to go before being production ready. There's an issue with the k8s discovery that makes the dkron cluster unavailable due to master election during the rolling upgrade.
Hello! I'm also highly interested in this. Has anyone come up with a way to work around the discovery issue? Any thoughts on how to proceed?
Merged this PR and added some changes, I'll take over from here.
Since there's still an open issue in the main repo and I've the feeling this chart has been dropped, I've decided to rework it completely.
The main differences are:
Known issues:
Let me know your thoughts. I'm open to feedback and available to make any changes.