DataDog / datadog-operator

Kubernetes Operator for Datadog Resources
Apache License 2.0

Support GKE Autopilot #663

Open leosunmo opened 1 year ago

leosunmo commented 1 year ago

Currently (v0.8.1), deploying a DatadogAgent on GKE Autopilot is rejected by the GKE Warden admission webhook with the following error:

datadog-operator {"level":"ERROR","ts":"2022-12-13T15:31:07Z","logger":"controller-runtime.manager.controller.datadogagent","msg":"Reconciler error","reconciler group":"datadoghq.com","reconciler kind":"DatadogAgent","name":"datadog","namespace":"datadog","error":"admission webhook \"gkepolicy.common-webhooks.networking.gke.io\" denied the request: GKE Warden rejected the request because it violates one or more constraints.\nViolations details: {\"[denied by autogke-no-write-mode-hostpath]\":[\"hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. 
Allowed path prefixes for hostPath volumes are: [/var/log/].\",\"hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].\"]}\nRequested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'."}

It should either be clearly documented somewhere that GKE Autopilot reduces the features supported (and specifically which features), or preferably a workaround should be developed.

Here's a cleaned-up list of the denied volume mounts:

admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: [denied by autogke-no-write-mode-hostpath]:

hostPath volume procdir used in container agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume procdir used in container process-agent uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container process-agent uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume procdir used in container init-config uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume cgroups used in container init-config uses path /sys/fs/cgroup which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

hostPath volume runtimesocketdir used in container init-config uses path /var/run which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/].

Requested by user: 'system:serviceaccount:datadog:datadog-operator', groups: 'system:serviceaccounts,system:serviceaccounts:datadog,system:authenticated'.
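
The cleaned-up list above can also be produced programmatically. The following is a minimal sketch, not part of the operator: it assumes the log line is the JSON shown earlier (optionally prefixed by the pod name) and that the Warden message keeps the wording seen in this issue. The function name `denied_host_paths` and the shortened sample (two of the nine violations, with the `groups` part omitted) are illustrative only.

```python
import json
import re
from collections import defaultdict

# Shortened sample built from the log in this issue (two of the nine violations).
SAMPLE_ERROR = (
    'admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: '
    "GKE Warden rejected the request because it violates one or more constraints.\n"
    'Violations details: {"[denied by autogke-no-write-mode-hostpath]":['
    '"hostPath volume procdir used in container agent uses path /proc '
    "which is not allowed in Autopilot. "
    'Allowed path prefixes for hostPath volumes are: [/var/log/].",'
    '"hostPath volume cgroups used in container process-agent uses path /sys/fs/cgroup '
    "which is not allowed in Autopilot. "
    'Allowed path prefixes for hostPath volumes are: [/var/log/]."]}\n'
    "Requested by user: 'system:serviceaccount:datadog:datadog-operator'."
)
SAMPLE_LOG_LINE = "datadog-operator " + json.dumps(
    {"level": "ERROR", "msg": "Reconciler error", "error": SAMPLE_ERROR}
)

# Assumption: every violation message follows the wording seen in this issue.
VIOLATION_RE = re.compile(
    r"hostPath volume (?P<volume>\S+) used in container (?P<container>\S+) "
    r"uses path (?P<path>\S+) which is not allowed"
)


def denied_host_paths(log_line):
    """Return {container: [(volume, path), ...]} parsed from one operator log line."""
    # Skip any prefix (such as the pod name) before the JSON payload.
    record = json.loads(log_line[log_line.index("{"):])
    # The violations are a JSON object embedded on its own line of the error text.
    details = re.search(r"Violations details: (\{.*\})", record["error"])
    if details is None:
        return {}
    by_container = defaultdict(list)
    for messages in json.loads(details.group(1)).values():
        for message in messages:
            match = VIOLATION_RE.search(message)
            if match:
                by_container[match["container"]].append(
                    (match["volume"], match["path"])
                )
    return dict(by_container)


if __name__ == "__main__":
    for container, mounts in denied_host_paths(SAMPLE_LOG_LINE).items():
        for volume, path in mounts:
            print(f"{container}: {volume} -> {path}")
```

Grouping by container makes it easier to see that the same three hostPath volumes (procdir, cgroups, runtimesocketdir) are rejected for each of the agent, process-agent, and init-config containers.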

CharlyF commented 1 year ago

Thanks for opening this one up separately! I will add it to the card we have in our backlog. We have this feature very high in the list, and I agree we should document this better. We will keep you posted when we add support to the operator. It will likely only be in the 1.x version (which we are currently releasing).

tanqhnguyen commented 1 year ago

Any news on this one? We are encountering the same problem and can't set up our cluster properly.

adaosantos commented 1 year ago

When trying to create a ClusterAgent we see similar behavior: the Operator doesn't create the service account. The error is: error looking up service account datadog/datadog-cluster-agent: serviceaccount "datadog-cluster-agent" not found

lyona commented 1 year ago

Same problem. When can we expect this to be resolved?

mrkmcknz commented 11 months ago

Having the same issue on Autopilot; sad to see this is not working after a year.

tkoft commented 8 months ago

The Helm chart has supported Autopilot for years now. Can we please get that functionality ported here? :)

rojomisin commented 7 months ago

Observations on this issue:

Is this a Helm template issue? A permissions issue?

The documentation for GKE Autopilot covers only the Helm method, not the Operator (there is no tab for it). Does that mean the Operator does not work on Autopilot?

https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=operator&site=us5#autopilot

erabusi commented 4 months ago

I can see this in the blog post below: "If you’re using GKE Autopilot, the Helm chart is the best way to install Datadog, as the Operator is not currently supported." https://www.datadoghq.com/blog/monitor-google-kubernetes-engine/#deploying-the-datadog-agent-to-your-gke-cluster This means the Datadog Operator does not yet support GKE Autopilot clusters, so we probably need to create a feature request for this.