argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Support Leader Election for argocd-application-controller #3073

Open d-kuro opened 4 years ago

d-kuro commented 4 years ago

Summary

Add leader election support to argocd-application-controller.

Motivation

Running multiple replicas of a Kubernetes controller without coordination causes conflicts. Supporting leader election allows multiple controllers to work at the same time, which makes the controller highly available. It also makes it possible to use a rolling update as the controller's deployment strategy.

In fact, kube-controller-manager, kube-scheduler, cluster-autoscaler, and others already support leader election.

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca

Proposal

Add the following options to argocd-application-controller.

| Name | Description | Default |
| --- | --- | --- |
| leader-elect | Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability. | true |
| leader-elect-lease-duration | The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. | 15 seconds |
| leader-elect-renew-deadline | The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. | 10 seconds |
| leader-elect-retry-period | The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. | 2 seconds |
| leader-elect-resource-lock | The type of resource object that is used for locking during leader election. Supported options are leases (default), endpoints and configmaps. | "leases" |

All durations shown are the default values. https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/leaderelection/leaderelection.go#L111

Also, in the install manifest for the HA configuration, change the number of replicas for argocd-application-controller to 2 and change the Deployment strategy to a rolling update.

See below for a code sample of leader election. https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/client-go/examples/leader-election

I can create a Pull Request for this enhancement issue.

jessesuen commented 4 years ago

Some considerations about this request:

  1. The coordination.k8s.io/LeaseLock resource kind has been around only since Kubernetes v1.13, which means K8s v1.13 will become a minimum requirement for Argo CD.

  2. Running multiple controllers will cause the application controller's Prometheus metrics endpoint to become inconsistent, depending on which instance was hit during scraping. If we allow multiple controllers to run, then the metrics will likely need to be sent to and cached in Redis instead of being held in memory as they are currently.

Supporting leader election allows multiple controllers to work at the same time

  1. Leader election implies an active-passive relationship between the leader and the followers, where passive controller instances are not actively in use. So we would not receive any scaling benefits from leader election alone. To truly benefit, we would need to additionally implement sharding of the application controller, so that we have active-active controllers, each operating on a different subset of applications.
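
To illustrate the active-active idea, here is a minimal Go sketch, assuming applications are assigned to controller replicas by hashing their names. `shardFor` and `ownedBy` are hypothetical helpers invented for this example. They are not Argo CD APIs, and this is only one possible partitioning scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps an application name to a shard index in [0, replicas).
// It returns 0 when replicas is not positive, so a single instance owns everything.
func shardFor(appName string, replicas int) int {
	if replicas <= 0 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(appName))
	return int(h.Sum32() % uint32(replicas))
}

// ownedBy returns the applications that the given shard is responsible for.
func ownedBy(shard, replicas int, apps []string) []string {
	var owned []string
	for _, app := range apps {
		if shardFor(app, replicas) == shard {
			owned = append(owned, app)
		}
	}
	return owned
}

func main() {
	apps := []string{"guestbook", "payments", "ingress", "monitoring"}
	const replicas = 2
	for shard := 0; shard < replicas; shard++ {
		fmt.Printf("controller shard %d reconciles %v\n", shard, ownedBy(shard, replicas, apps))
	}
}
```

Every application lands in exactly one shard, so each replica does useful work, but a scheme like this reshuffles assignments when the replica count changes.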

raelga commented 4 years ago

+1

somaliz commented 4 years ago

+1

jannfis commented 2 years ago

which means K8s v1.13 will become a minimum requirement for Argo CD

I think we should be fine with that by now :)

jullianow commented 2 years ago

+1

Interaze commented 1 year ago

Hi. I believe this would be nice. There's currently a significant issue in maintaining an HA ApplicationSet controller.

Interaze commented 1 year ago

I see that 3.0.0 advertises a LEADER_ELECTION_IDENTITY environment variable. I'll look into this and see if it is a solution. If so, perhaps we can close this.

rumstead commented 1 year ago

https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Getting-Started/#enabling-high-availability-mode

rumstead commented 1 day ago

@d-kuro the applicationset controller does support leader election. Are we able to close this issue?