appuio / component-appuio-cloud

APPUiO Cloud
https://hub.syn.tools/appuio-cloud/index.html

Too many Kyverno generate requests #76

Closed · bastjan closed this issue 1 year ago

bastjan commented 2 years ago

We seem to produce a lot of generate requests even on the small and mostly idle beta cluster:

~2500/h, and possibly more if the client-side throttling were removed.

Screenshot 2022-02-11 at 10 16 35

Can this hit us if we have bigger clusters with more quotas/namespaces?

Steps to Reproduce the Problem

See Kibana: https://logging.apps.lpg1.ocp4-poc.appuio-beta.ch/app/kibana/...&_a=(columns:!(message,kubernetes.namespace_name),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:ca625880-68a8-11ec-9452-47fb3714b362,key:kubernetes.namespace_name,negate:!f,params:(query:syn-kyverno,type:phrase),type:phrase,value:syn-kyverno),query:(match:(kubernetes.namespace_name:(query:syn-kyverno,type:phrase))))),index:ca625880-68a8-11ec-9452-47fb3714b362,interval:h,query:(language:lucene,query:'updated%20generate'),sort:!('@timestamp',desc)))
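
If Kibana is not at hand, roughly the same count can be pulled straight from the Kyverno pod logs. A minimal sketch, assuming the pods in syn-kyverno carry the common app.kubernetes.io/name=kyverno label (adjust the selector to whatever the deployment actually uses):

# Count "updated generate" log lines from the last hour across the Kyverno pods.
kubectl -n syn-kyverno logs -l app.kubernetes.io/name=kyverno --since=1h \
  | grep -c 'updated generate'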

Expected Behavior

Generate requests are only created if something changed.

simu commented 2 years ago

May be related to https://github.com/kyverno/kyverno/issues/2498

bastjan commented 1 year ago

Note: the log is gone by now, but the problem is still visible in the client-side rate limits:

Screenshot 2022-12-01 at 14 53 33

The big drop in log volume corresponds to when we raised the client-side throttling limits.

https://logging.apps.cloudscale-lpg-2.appuio.cloud/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:'2022-11-28T13:41:14.436Z',mode:absolute,to:'2022-12-01T14:00:00.000Z'))&_a=(columns:!(kubernetes.namespace_name,message),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'77166340-d05b-11ec-ae43-c5a0233ff698',key:kubernetes.namespace_name,negate:!f,params:(query:syn-kyverno,type:phrase),type:phrase,value:syn-kyverno),query:(match:(kubernetes.namespace_name:(query:syn-kyverno,type:phrase))))),index:'77166340-d05b-11ec-ae43-c5a0233ff698',interval:auto,query:(language:lucene,query:'PUT%20updaterequests'),sort:!('@timestamp',desc))
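
For reference, the limits in question are passed as container arguments to Kyverno; a hedged way to see what the running deployment uses (the deployment name and the --clientRateLimitQPS/--clientRateLimitBurst flag names are assumptions that depend on the Kyverno version and packaging):

# Print the container args of the Kyverno deployment and look for the
# client-side rate limit flags.
kubectl -n syn-kyverno get deploy kyverno \
  -o jsonpath='{.spec.template.spec.containers[0].args}'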

bastjan commented 1 year ago

We do ~25k PUT /updaterequests/ requests per hour (Prometheus, 8d range):

Screenshot 2022-12-01 at 15 39 02

Almost all of these manifests, and the requests against them, result from the quota-and-limit-range-in-ns policy.

❯ ka -nsyn-kyverno get updaterequests  | grep ur- | wc -l
     324
❯ ka -nsyn-kyverno get updaterequests  | grep ur- | grep quota-and-limit-range-in-ns | wc -l
     322
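
The same breakdown, grouped per policy, as a hedged sketch with plain kubectl and jq (it assumes the UpdateRequest objects expose the owning policy under .spec.policy, as in the upstream kyverno.io/v1beta1 API; verify with kubectl explain updaterequests.spec):

# Count update requests per policy instead of grepping for a single name.
kubectl -n syn-kyverno get updaterequests -o json \
  | jq -r '.items[].spec.policy' | sort | uniq -c | sort -rn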

Why do these manifests get so many updates?

Kyverno reconciles these manifests every few minutes and on every update to the triggering object, and each reconciliation shows up as new update requests.
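
The periodic reconciliation is expected for generate rules that have synchronize enabled. A hedged one-liner to confirm that for this policy (it assumes the policy is a ClusterPolicy and uses the standard generate.synchronize field):

# Print the synchronize setting of the policy's generate rule(s).
kubectl get clusterpolicy quota-and-limit-range-in-ns \
  -o jsonpath='{.spec.rules[*].generate.synchronize}'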

bastjan commented 1 year ago

RMA0 with the newer Kyverno v1.8.2 does not seem to reconcile quota-and-limit-range-in-ns; we need to check whether that is due to the Kyverno version or the cluster environment.

bastjan commented 1 year ago

Screenshot 2022-12-02 at 13 51 50

Kyverno might get triggered by the high number of PUT requests on namespaces. Almost all of these PUTs seem to come from system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount.
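
A hedged way to get this per-user attribution without Kibana is the OpenShift APIRequestCount API; the status field layout below (currentHour.byNode[].byUser[] with username/byVerb) matches recent OCP versions but should be verified against the cluster:

# List which users issued update (PUT) requests on namespaces in the current hour.
kubectl get apirequestcounts namespaces.v1 -o json \
  | jq -r '.status.currentHour.byNode[].byUser[]
           | select([.byVerb[]?.verb] | index("update"))
           | .username' \
  | sort | uniq -c | sort -rn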

bastjan commented 1 year ago

These PUTs do not show up as watch events, so a kubebuilder-based controller should be fine.
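
A minimal sketch of that check, assuming the point is that these PUTs are no-op updates which never bump resourceVersion and therefore produce no watch events:

# Watch a namespace object and print the watch events; if the OLM PUTs were
# real changes, MODIFIED events would show up here at a matching rate.
# Pick any namespace that OLM touches instead of "default" if needed.
kubectl get namespace default --watch --output-watch-events=true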

bastjan commented 1 year ago

The issue seems to be fixed with OCP 4.11.

We updated Kyverno to v1.8.x and OpenShift to v4.11.x. The drop in PUT updaterequests correlates with the OCP upgrade, not with the Kyverno upgrade. See the attached graph.

Screenshot 2022-12-13 at 16-42-11 Prometheus Time Series Collection and Processing Server

bastjan commented 1 year ago

The OLM operator no longer shows up in apirequestcounts. This seems to have been fixed on Red Hat's side.

Closing this as done.