gitops-engine directly calls kubectl command code to create/apply/replace/delete K8s resources on the cluster. This ensures that the logic used by gitops-engine consumers (such as Argo CD) interacts with those K8s resources in a way that is compatible with kubectl.
However, at present, gitops-engine does not specify a timeout value for 'kubectl create/apply/replace' commands.
This means that in rare cases (such as cluster/network issues), the kubectl operation can remain running forever, waiting for an I/O operation that may never complete.
Normally this would just be a small memory leak (i.e. not necessarily the end of the world). However, in order to call the kubectl command code, gitops-engine writes manifest files to '/dev/shm', which are then passed to kubectl via the '-f' file option.
This means that those long-running I/O operations are also leaking K8s manifest files into /dev/shm: each manifest file must remain in '/dev/shm' while its I/O operation is in progress. '/dev/shm' appears to be limited to 64MB, which can fill quickly.
When examining the contents of /dev/shm from users who have reported this issue, we see a large number of miscellaneous manifests that are hours or days old (dating back to the last Pod restart).
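To make the failure mode concrete, here is a minimal Go sketch of the pattern described above. The names and the use of an external 'kubectl' process are illustrative assumptions (gitops-engine calls the kubectl command code directly rather than shelling out), but the cleanup dependency is the same: the manifest file in '/dev/shm' can only be removed once the apply call returns.

```go
package main

import (
	"os"
	"os/exec"
)

// applyManifest sketches the leak mechanism: the manifest is written to a temp
// file under /dev/shm and handed to kubectl via '-f'. The deferred os.Remove
// only runs when kubectl returns, so a call that blocks forever also pins the
// manifest file in /dev/shm.
func applyManifest(manifest []byte) error {
	f, err := os.CreateTemp("/dev/shm", "manifest-*.yaml")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name()) // never reached if the apply below hangs

	if _, err := f.Write(manifest); err != nil {
		f.Close()
		return err
	}
	f.Close()

	// Stand-in for the kubectl apply call; no timeout is applied, so a stuck
	// network/API-server I/O operation blocks here indefinitely.
	return exec.Command("kubectl", "apply", "-f", f.Name()).Run()
}

func main() {
	_ = applyManifest([]byte("apiVersion: v1\nkind: Namespace\nmetadata:\n  name: demo\n"))
}
```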
The proposed solution (PR attached) is to add a long default timeout to calls to kubectl's apply command.
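As a rough illustration of that direction, the sketch below bounds the kubectl invocation with a long default timeout so a stuck operation eventually fails and the caller's cleanup of the '/dev/shm' manifest can run. The 3-minute value and the use of exec.CommandContext are assumptions made for this example, not the actual change in the attached PR, which applies the timeout to the kubectl command code that gitops-engine invokes directly.

```go
package main

import (
	"context"
	"os/exec"
	"time"
)

// Hypothetical long default timeout for kubectl operations.
const defaultKubectlTimeout = 3 * time.Minute

func applyWithTimeout(manifestPath string) error {
	ctx, cancel := context.WithTimeout(context.Background(), defaultKubectlTimeout)
	defer cancel()

	// When the deadline expires, the context terminates the kubectl process and
	// the call returns an error, so the manifest file no longer stays pinned in
	// /dev/shm indefinitely.
	return exec.CommandContext(ctx, "kubectl", "apply", "-f", manifestPath).Run()
}

func main() {
	_ = applyWithTimeout("/dev/shm/manifest-example.yaml")
}
```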
Related: https://github.com/argoproj/gitops-engine/issues/568