Open tosi3k opened 4 months ago
CC @crenshaw-dev
@tosi3k just reading around about the watch cache, it looks like maybe the watch cache doesn't support pagination? https://github.com/kubernetes/kubernetes/issues/102672
If that's still the case, are we going to end up falling back to etcd anyway since we set the page size? And do we run the risk of causing massive kube-apiserver memory load by using the watch cache instead of etcd?
The watch cache not having support for pagination (i.e. disregarding the `limit` parameter from the List API call's URI query string) is a known issue - the issue you linked provides a nice discussion about the tradeoffs we face here.
> If that's still the case, are we going to end up falling back to etcd anyway since we set the page size?
Not really - the `limit` parameter is what is disregarded when set in conjunction with `resourceVersion=0` in a List API call, not the `resourceVersion` param.
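For concreteness, here's a minimal client-go sketch (not the actual gitops-engine code; the clientset, namespace, and page size are just placeholders) of the request shape being discussed:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig loading; any rest.Config would do here.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// resourceVersion=0 asks the apiserver to answer from its watch cache.
	// On that path the Limit below is only a hint and may be silently ignored,
	// so the whole collection can come back in a single, unpaginated response.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		ResourceVersion: "0",
		Limit:           500,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d pods, continue token: %q\n", len(pods.Items), pods.Continue)
}
```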
> And do we run the risk of causing massive kube-apiserver memory load by using the watch cache instead of etcd?
I'd rather call it trading off some amount of memory for a large amount of CPU, with the latter being the scarcer resource at the scale of large Kubernetes clusters in general :). In conjunction with https://github.com/argoproj/gitops-engine/pull/616, that would be a short memory usage spike in kube-apiserver just on cache initialization rather than every 10 minutes.
In general, this will eventually be addressed by https://github.com/kubernetes/enhancements/pull/3142 - that enhancement's enablement was unfortunately reverted at the last minute in K8s 1.31 but will be available in 1.32.
> Not really - the `limit` parameter is what is disregarded when set in conjunction with `resourceVersion=0` in a List API call, not the `resourceVersion` param.
Should we explicitly remove the `limit` parameter to avoid confusion, or does it end up being used in the case of a fallback to etcd?
> In general, this will eventually be addressed by https://github.com/kubernetes/enhancements/pull/3142 - that enhancement's enablement was unfortunately reverted at the last minute in K8s 1.31 but will be available in 1.32.
Makes sense, I'll just keep an eye on CPU use while testing, and we'll want to pay attention during Argo CD RCs for any complaints.
> Should we explicitly remove the `limit` parameter to avoid confusion, or does it end up being used in the case of a fallback to etcd?
Good question - we don't want to drop it since there's a possibility someone disabled the watch cache explicitly (I think there are some K8s distros that do so) by setting the apiserver's `--watch-cache` flag to false: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/.
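To illustrate with a hedged sketch (not this PR's code): keeping the `limit` means that on a cluster running with `--watch-cache=false` the same request is served from etcd, the limit is honored, and the client pages through the results with continue tokens.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	total := 0
	opts := metav1.ListOptions{ResourceVersion: "0", Limit: 500}
	for {
		page, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), opts)
		if err != nil {
			panic(err)
		}
		total += len(page.Items)
		// Watch cache enabled: Continue stays empty and everything arrives in one response.
		// Watch cache disabled: etcd honors Limit and returns a continue token,
		// so the loop keeps fetching 500-item pages.
		if page.Continue == "" {
			break
		}
		opts.Continue = page.Continue
		// The continue token already pins the list snapshot, so drop the
		// resourceVersion for the follow-up pages.
		opts.ResourceVersion = ""
	}
	fmt.Println("pods listed:", total)
}
```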
Cool! Okay, the next step is for me to do some testing and gather data. I can't make any promises on how soon that'll be, but I want to get this change into 2.13.
Problem statement in https://github.com/argoproj/argo-cd/issues/18838.
This PR addresses the problem partially - starting with delegating the K8s List calls to the watch cache, offloading etcd.
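Roughly speaking (an illustrative sketch using the dynamic client, not the actual diff), delegating the List calls to the watch cache boils down to adding `resourceVersion=0` to the list options:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn := dynamic.NewForConfigOrDie(cfg)
	gvr := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}

	// Without ResourceVersion the list is a quorum read that etcd has to serve;
	// with ResourceVersion "0" the apiserver can answer from its in-memory watch cache.
	list, err := dyn.Resource(gvr).List(context.TODO(), metav1.ListOptions{
		ResourceVersion: "0",
		Limit:           500,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("deployments across all namespaces:", len(list.Items))
}
```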