DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

Unable to specify lifecycle hook in agent container #290

Status: Open. rr-binh-nguyen opened this issue 3 years ago

rr-binh-nguyen commented 3 years ago

Describe what happened: It is not possible to specify a lifecycle hook in the agent pod. Reference: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/. This feature is needed to implement https://docs.datadoghq.com/agent/guide/secrets-management/?tab=linux#providing-an-executable

Describe what you expected: The ability to run custom actions via a lifecycle hook in the agent pod

Steps to reproduce the issue: The agent template does not support lifecycle hooks.

Additional environment details (Operating System, Cloud provider, etc): Cloud provider: AWS; OS: Ubuntu 20
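For context, the hook requested here would look roughly like the JSON Patch below if applied by hand to the agent DaemonSet, outside the chart. This is a minimal sketch under stated assumptions: the helper name `poststart_patch`, the container index, and the download command with its `example.com` URL are all hypothetical, and it assumes the agent image ships `curl`. Kubernetes does not guarantee that a postStart hook finishes (or even starts) before the container's entrypoint, and a failing hook gets the container killed, so the agent could try to resolve secrets before the backend binary exists.

```python
import json
from typing import Any, Dict, List


def poststart_patch(container_index: int, shell_cmd: str) -> List[Dict[str, Any]]:
    """Build a JSON Patch (RFC 6902) that adds a postStart exec hook to one container.

    The "add" op replaces any lifecycle block the container already has.
    The hook runs `shell_cmd` through /bin/sh inside the container; it is not
    ordered before the container's entrypoint.
    """
    return [
        {
            "op": "add",
            "path": f"/spec/template/spec/containers/{container_index}/lifecycle",
            "value": {"postStart": {"exec": {"command": ["/bin/sh", "-c", shell_cmd]}}},
        }
    ]


# Hypothetical download command: placeholder URL, assumes curl exists in the image.
cmd = (
    "curl -fsSL https://example.com/my-secret-backend -o /my-secret-backend"
    " && chmod +x /my-secret-backend"
)
print(json.dumps(poststart_patch(0, cmd)))
```

The printed JSON could be passed to `kubectl patch daemonset <agent-daemonset> --type=json -p '...'`, with `<agent-daemonset>` standing for whatever name the Helm release produced. A later `helm upgrade` may revert a manual patch like this, which is one reason to want the option in the chart itself.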

clamoriniere commented 3 years ago

Hello @rr-binh-nguyen

Thanks for opening this issue.

The recommended way to include a "secret backend" implementation in the agent deployment is to build a "custom" agent image on top of the official datadog/agent image.

For example:

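# Start from the official Datadog Agent image
FROM gcr.io/datadoghq/agent:7.28.1

# Add the secret backend executable and mark it as executable
COPY ./my-secret-backend /my-secret-backend
RUN chmod +x /my-secret-backend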

This simplifies deployment and removes any startup dependency on an external component that would otherwise be needed to download the implementation.
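As an illustration of what `./my-secret-backend` might contain, here is a minimal sketch of the stdin/stdout exchange described in the Datadog "providing an executable" guide: the Agent writes `{"version": "1.0", "secrets": [...]}` to the executable and expects a JSON object mapping each handle to `{"value": ..., "error": ...}`. The storage side is purely hypothetical: it assumes one file per secret in a directory named by a made-up `MY_SECRETS_DIR` variable. It also assumes a `python3` interpreter is reachable through the shebang inside the image; if it is not, the same logic would have to ship as a standalone binary.

```python
#!/usr/bin/env python3
"""Hypothetical secret backend for the Datadog Agent (sketch).

The Agent writes {"version": "1.0", "secrets": [...]} to stdin and reads
back a JSON object mapping each handle to {"value": ..., "error": ...}.
"""
import io
import json
import os
import sys
from typing import Callable, Dict, List, Optional

# Hypothetical storage: one file per secret handle, mounted in this directory.
SECRETS_DIR = os.environ.get("MY_SECRETS_DIR", "/etc/my-secrets")


def read_from_dir(handle: str) -> str:
    """Return the contents of SECRETS_DIR/<handle> without the trailing newline."""
    # Refuse path separators and dot entries so a handle cannot escape SECRETS_DIR.
    if os.sep in handle or handle in ("", ".", ".."):
        raise ValueError(f"invalid secret handle: {handle!r}")
    with open(os.path.join(SECRETS_DIR, handle), encoding="utf-8") as f:
        return f.read().rstrip("\n")


def resolve(
    handles: List[str], lookup: Callable[[str], str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Resolve each handle, reporting failures per secret rather than aborting."""
    result: Dict[str, Dict[str, Optional[str]]] = {}
    for handle in handles:
        try:
            result[handle] = {"value": lookup(handle), "error": None}
        except Exception as exc:  # the error text is passed back to the Agent
            result[handle] = {"value": None, "error": str(exc)}
    return result


def main(stdin: io.TextIOBase = sys.stdin, stdout: io.TextIOBase = sys.stdout) -> int:
    """Read the Agent's request from stdin and write the response to stdout."""
    payload = json.load(stdin)
    if payload.get("version") != "1.0":
        print(f"unsupported payload version: {payload.get('version')!r}", file=sys.stderr)
        return 1
    json.dump(resolve(payload.get("secrets", []), read_from_dir), stdout)
    return 0


# When installed as /my-secret-backend, the entry point would be:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Per-secret errors are returned rather than raised so that one missing secret does not hide the values the Agent could still resolve.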

The other solutions:

Please let us know whether the proposed solution works for you.

Thanks, Cedric

rr-binh-nguyen commented 3 years ago

Hi @clamoriniere, thanks for your prompt suggestion. For our use case, we try to minimize the number of custom images we need to maintain. The extra volume solution would work too. However, we found a lifecycle hook to be the simplest to implement. Our secret backend binary is very small, so it barely affects the container's startup time. Thanks

ricoleabricot commented 2 months ago

Bumping this topic: can't we add lifecycle values to the Helm chart? Moreover, a PR was already opened for it :)

clamoriniere commented 2 months ago

Hi @ricoleabricot

Could you explain the use case that requires configuring a container lifecycle hook? Thanks in advance.

ricoleabricot commented 2 months ago

Hello @clamoriniere, thanks for your prompt reply 😄 I'd like to try a preStop hook (or a higher terminationGracePeriodSeconds) so that the agent is the last pod removed from the node. Currently the agent is terminated before our apps finish terminating gracefully, so they get errors when sending traces/logs and we lose data (all pods send to the agent on their own node, which is itself being terminated because of the scale-down).
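A rough sketch of the preStop delay described above, expressed as a JSON Patch for the agent DaemonSet. It relies on two documented Kubernetes behaviours: each container receives SIGTERM only after its own preStop hook returns, and the pod's terminationGracePeriodSeconds countdown already includes the hook's run time, so the grace period must exceed the sleep. Assumptions, all hypothetical: the helper name `prestop_delay_patch`, the container indexes (in the chart's default layout the trace-agent likely runs in its own container and would need the same hook to keep accepting traces), the 10-second stop margin, and `/bin/sh` plus `sleep` being present in the agent image.

```python
import json
from typing import Any, Dict, List, Sequence


def prestop_delay_patch(
    container_indexes: Sequence[int], delay_s: int, stop_margin_s: int = 10
) -> List[Dict[str, Any]]:
    """JSON Patch (RFC 6902) delaying SIGTERM for selected containers of the agent pod.

    Each listed container gets a preStop hook that sleeps `delay_s` seconds.
    terminationGracePeriodSeconds becomes delay_s + stop_margin_s, because the
    grace-period countdown includes the preStop hook; processes still running
    when it expires are force-killed.
    """
    if delay_s <= 0 or stop_margin_s <= 0:
        raise ValueError("delay_s and stop_margin_s must be positive")
    hook = {"preStop": {"exec": {"command": ["/bin/sh", "-c", f"sleep {delay_s}"]}}}
    ops: List[Dict[str, Any]] = [
        {"op": "add", "path": f"/spec/template/spec/containers/{i}/lifecycle", "value": hook}
        for i in container_indexes
    ]
    # Pod-level setting shared by all containers in the agent pod.
    ops.append(
        {
            "op": "add",
            "path": "/spec/template/spec/terminationGracePeriodSeconds",
            "value": delay_s + stop_margin_s,
        }
    )
    return ops


# Assumption: containers 0 and 1 are the core agent and the trace-agent.
print(json.dumps(prestop_delay_patch([0, 1], delay_s=60)))
```

Sleeping in preStop keeps the agent's listeners open while application pods drain; only after the sleep does the agent get SIGTERM and start its own shutdown, which is what the stop margin leaves room for.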

nijave commented 1 month ago

I'd also like to try to delay Datadog agent termination. Every time our nodes autoscale, the Datadog pods get killed immediately and application pods produce a burst of errors since they can't send telemetry (they all have a drain delay of around 30 seconds).

It would be good if we could delay the termination of the Datadog agent by ~60 seconds to give other pods a chance to send telemetry while shutting down.
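The timing above can be sanity-checked with a little arithmetic: the agent's preStop sleep should cover the slowest application drain plus time for that application's last telemetry to be flushed, and the agent pod's grace period should add room for the agent's own shutdown on top. This sketch assumes every pod on the node starts terminating at roughly the same moment, which a node drain does not guarantee; the helper name, the 30-second flush margin and the 10-second agent stop time are illustrative choices, not measured values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AgentShutdownPlan:
    prestop_sleep_s: int  # how long the agent containers' preStop hook sleeps
    grace_period_s: int   # terminationGracePeriodSeconds for the agent pod


def plan_agent_shutdown(
    app_drain_s: Iterable[int], flush_margin_s: int, agent_stop_s: int
) -> AgentShutdownPlan:
    """Keep the agent alive until the slowest app on the node has drained and flushed.

    Assumes all pods on the node receive their termination signal at about
    the same time.
    """
    drains = list(app_drain_s)
    if not drains:
        raise ValueError("need at least one application drain delay")
    # Apps drain, then send their final telemetry before the agent stops listening.
    sleep_s = max(drains) + flush_margin_s
    # The grace period must outlast the preStop sleep plus the agent's own shutdown.
    return AgentShutdownPlan(sleep_s, sleep_s + agent_stop_s)


# Apps draining for ~30 s plus a 30 s flush margin give the ~60 s delay asked for.
plan = plan_agent_shutdown([30, 25, 30], flush_margin_s=30, agent_stop_s=10)
print(plan)  # AgentShutdownPlan(prestop_sleep_s=60, grace_period_s=70)
```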