aaronshurley opened this issue 2 years ago
I think a logical next step here would be to try to reproduce this with kapp-controller itself on GKE.
We should also test with TAP to verify any fix works as expected.
Could this be improved by adjusting kapp-controller's concurrency setting (the default is 10; what if we reduced it to 5)?
A note here that this is currently configurable via a kapp-controller flag: https://github.com/vmware-tanzu/carvel-kapp-controller/blob/1ad808d8909d49f0dff35ee49d285fb5f0e4693f/cmd/main.go#L25
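For anyone wanting to experiment with lower concurrency, here is a minimal sketch of a strategic-merge patch, assuming the stock Deployment name and namespace (kapp-controller/kapp-controller) and the -concurrency flag as defined at the linked line of cmd/main.go; verify both against your installed version:

```yaml
# patch-concurrency.yml -- strategic merge patch for the kapp-controller
# Deployment. Apply with:
#   kubectl -n kapp-controller patch deployment kapp-controller \
#     --patch-file patch-concurrency.yml
# NOTE: containers merge by name, but the "args" list is replaced
# wholesale, so carry over any existing args from your install.
spec:
  template:
    spec:
      containers:
        - name: kapp-controller
          args:
            # Max concurrent reconciles (default 10); lowering it should
            # reduce the burst of API server traffic during installs.
            - -concurrency=5
```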
@cppforlife filed a support ticket with GCP and this was the response:
GKE autoscales the control plane -- it's not configurable. So the question becomes: can we find the threshold (and potentially avoid hitting it) after which GKE decides to scale.
One potential solution to this would be documenting kapp-controller installation on GKE and advising users to set concurrency to a lower value (e.g. 5). That way we could still keep the current default of 10.
^ do we know that 5, for example, "fixes" the behaviour?
I believe this is just the accepted behaviour of GKE. We cannot change it, as the control plane nodes are scaled and managed by Google. Anything further? @cppforlife
What steps did you take:
Reported from other users: installed kapp-controller (as part of a larger product, TAP) on a new GKE cluster.
What happened:
During the installation, the Kubernetes control plane became unavailable for several minutes. This caused package installs to enter a ReconcileFailed state (an illustrative status sketch follows the template below). Eventually, when the API server became available again, the packages reconciled to completion.
What did you expect:
The installation to complete without any control plane unavailability.
Anything else you would like to add:
Environment:
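For illustration, here is a hypothetical sketch (not captured from a real cluster) of what a PackageInstall might report while the API server is unreachable; the resource names are invented and the exact condition message will vary by environment:

```yaml
# Hypothetical PackageInstall as it might look during the outage;
# metadata and message values below are illustrative only.
apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  name: example-package   # hypothetical name
  namespace: tap-install  # hypothetical namespace
status:
  friendlyDescription: 'Reconcile failed: ...'
  conditions:
    - type: ReconcileFailed
      status: "True"
      # The message typically surfaces the underlying API server error,
      # e.g. a connection timeout while GKE resizes the control plane.
      message: "..."
```

Once the API server is reachable again, the condition flips back to ReconcileSucceeded, matching the eventual recovery described above.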
Vote on this request
This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.
π "I would like to see this addressed as soon as possible" π "There are other more important things to focus on right now"
We are also happy to receive and review Pull Requests if you want to help work on this issue.