argoproj / gitops-engine

Democratizing GitOps
https://pkg.go.dev/github.com/argoproj/gitops-engine?tab=subdirectories
Apache License 2.0

'Apply/ReplaceResource' in resource_ops.go may leak files to '/dev/shm' since the kubectl 'apply/replace' commands never time out #572

Open jgwest opened 4 months ago

jgwest commented 4 months ago

gitops-engine directly calls kubectl command code to create/apply/replace/delete K8s resources on the cluster. This ensures that gitops-engine consumers (such as Argo CD) interact with those K8s resources in a way that is compatible with kubectl.

However, at present, gitops-engine does not specify a timeout value for 'kubectl create/apply/replace' commands.

This means that in rare cases (such as cluster/network issues), the kubectl operation will remain running forever, waiting for an I/O operation that may never complete.

Normally this would just be a small memory leak (i.e. not necessarily the end of the world). However, in order to call the kubectl command code, gitops-engine writes manifest files to '/dev/shm', which are then passed to kubectl via the '-f' file option.

This means that those long-running I/O operations also leak K8s manifest files to '/dev/shm': each manifest file must remain in '/dev/shm' for as long as its I/O operation is in progress, so an operation that never completes never releases its file. '/dev/shm' appears limited to 64MB, which can fill quickly.

The proposed solution (PR attached) is to add a long default timeout to calls to kubectl's apply command.

Related: https://github.com/argoproj/gitops-engine/issues/568