bukovjanmic closed this issue 2 years ago.
Is the operator running with `WATCH_NAMESPACE=""`? (The OLM `AllNamespaces` installMode also sets this.)
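For context, operators built with operator-sdk conventionally read `WATCH_NAMESPACE` from the environment. An empty value means "watch every namespace", and a comma-separated value lists several namespaces. The sketch below assumes that convention. `parseWatchNamespace` is a hypothetical helper for illustration, not code from compliance-operator.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWatchNamespace interprets a WATCH_NAMESPACE value (hypothetical helper).
// An empty (nil) result means the operator would watch all namespaces, which is
// what the OLM AllNamespaces install mode produces by setting WATCH_NAMESPACE="".
func parseWatchNamespace(value string) []string {
	var namespaces []string
	for _, ns := range strings.Split(value, ",") {
		ns = strings.TrimSpace(ns)
		if ns != "" {
			namespaces = append(namespaces, ns)
		}
	}
	return namespaces
}

func main() {
	for _, v := range []string{"", "openshift-compliance", "ns-a,ns-b"} {
		namespaces := parseWatchNamespace(v)
		if len(namespaces) == 0 {
			fmt.Printf("%q -> all namespaces\n", v)
		} else {
			fmt.Printf("%q -> %v\n", v, namespaces)
		}
	}
}
```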
There's a comment in this area that mentions a possible performance issue when watching all namespaces:
https://github.com/ComplianceAsCode/compliance-operator/blob/38b928ec28a2a865d55a389c6974ee9d03545436/cmd/manager/operator.go#L177
Note that there's really no need to watch all namespaces, since the operator API is cluster-wide and not intended for multi-tenant use (although there might be some case I'm not aware of). `OwnNamespace` ensures `WATCH_NAMESPACE="openshift-compliance"`, which should not have this problem.
We should probably force `WATCH_NAMESPACE` to the operator's namespace when it's set to all namespaces, similar to what we did in file-integrity-operator (https://github.com/openshift/file-integrity-operator/pull/234).
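A minimal sketch of the proposed fallback, assuming the operator already knows its own namespace. Here it is simply passed in as a parameter; how it is obtained in practice (for example from a hypothetical `OPERATOR_NAMESPACE` environment variable) is an assumption. `effectiveWatchNamespace` is a hypothetical name, and this is not the code from the file-integrity-operator PR.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveWatchNamespace returns the namespace the operator should watch
// (hypothetical helper). If WATCH_NAMESPACE is empty, meaning all namespaces,
// it falls back to the operator's own namespace: the compliance API is
// cluster-wide, and watching every namespace can enlarge the watch cache,
// which this thread links to the OOMKills.
func effectiveWatchNamespace(watchNamespace, operatorNamespace string) string {
	if strings.TrimSpace(watchNamespace) == "" {
		return operatorNamespace
	}
	return watchNamespace
}

func main() {
	// AllNamespaces install mode: WATCH_NAMESPACE="" is overridden.
	fmt.Println(effectiveWatchNamespace("", "openshift-compliance"))
	// OwnNamespace install mode: the explicit value is kept as-is.
	fmt.Println(effectiveWatchNamespace("openshift-compliance", "openshift-compliance"))
}
```

In the operator's startup code, the two arguments would typically come from the environment before the manager and its cache are configured.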
I can confirm that limiting the operator to a single namespace seems to fix the OOMKill problem.
We have Compliance Operator 0.1.49 installed on OpenShift 4.9.29 clusters.
On some clusters, the operator is OOMKilled at varying frequencies because it exceeds its 200Mi memory limit.
This may be related to the number of worker nodes or namespaces. On larger clusters (20 worker nodes, hundreds of namespaces) the OOMKills are more frequent (currently 225 restarts in 24 hours); on smaller clusters (5 worker nodes) they are less frequent (5 restarts).
Either there is a memory leak under some circumstances, or the memory request/limit needs to be increased.