Closed jwtrim closed 1 month ago
Hello,
Same issue here with the "datadog-lib-python-init" container: we get an OOMKilled as described by @jwtrim.
Datadog Agent and Cluster Agent version 7.57.0
Hi, we are facing the same issue here with the Java, .NET, Python, and JS init containers.
$ agent version
Cluster Agent 7.56.0 - Commit: f7e1780 - Serialization version: v5.0.124 - Go version: go1.22.5
admission.datadoghq.com/java-lib.version: v1.38.1
admission.datadoghq.com/js-lib.version: v5.21.0
admission.datadoghq.com/python-lib.version: v2.11.0
admission.datadoghq.com/dotnet-lib.version: v2.56.0
admission.datadoghq.com/ruby-lib.version: v2.2.0
It would be great to have the option to customize the memory limits.
Seeing this issue as well. Would like an option to customize the memory limits.
This happened with all of our Python injections. For us, it was related to upgrading the GKE cluster from 1.29.xxx to 1.30.xxx. It took us days to figure out that the issue was related to the GKE update. We performed a node pool rollback and the init container OOMs stopped instantly.
Hello all, to avoid the OOMs, we've increased the default requests/limits to 100Mi for now with agent 7.57.2, to conform with Alpine recommended base values: https://github.com/DataDog/datadog-agent/blob/main/CHANGELOG.rst#bug-fixes.
The initContainer resources can also be configured manually using the DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_INIT_RESOURCES_MEMORY environment variable on the cluster agent (https://github.com/DataDog/datadog-agent/blob/596053e0d87db92237f887e9302c088650698893/pkg/clusteragent/admission/mutate/autoinstrumentation/auto_instrumentation.go#L666):
env:
  - name: DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_INIT_RESOURCES_MEMORY
    value: "50Mi"
While the feature is in beta, we are working on optimizing the memory usage before general availability.
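For anyone deploying through the Helm chart, here is a minimal sketch of how that variable might be passed to the cluster agent. It assumes the chart's `clusterAgent.env` list is the right place for it in your setup, and the `128Mi` value is purely illustrative, not a recommendation:

```yaml
# values.yaml (sketch, assuming the datadog Helm chart's clusterAgent.env list)
clusterAgent:
  env:
    # Memory request/limit applied to the injected auto-instrumentation init containers
    - name: DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_INIT_RESOURCES_MEMORY
      value: "128Mi"  # illustrative value; pick one that fits your workloads
```

Pods that were already mutated keep their old init container resources, so they would likely need to be recreated for a new value to apply.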
Can confirm that upgrading the agent worked.
Deploying the 7.57.2 agent resolved the issue. Thanks!
Will be closing this issue as it's not related to the Helm chart, but rather to the Agent and this beta feature. As Fanny mentioned, the team is aware and working on the memory optimisation, and the default resources have been increased in the meantime.
Describe what happened:
Java cronjob fails due to the datadog-lib-java-init container terminating with OOMKilled.
Describe what you expected:
The DD init container does not terminate with OOMKilled.
Steps to reproduce the issue:
The issue seems to be intermittent, but adding auto-instrumentation to a Java cronjob and repeatedly triggering the job eventually reproduces it.
Additional environment details (Operating System, Cloud provider, etc):