SumoLogic / sumologic-kubernetes-collection

Sumo Logic collection solution for Kubernetes
Apache License 2.0

Support GKE AutoPilot #1468

Open frankreno opened 3 years ago

frankreno commented 3 years ago

Add support and document additional steps to install k8s collection on GKE clusters using Autopilot.

https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot

Unlike Fargate, the nodes are not completely abstracted away and the current collection works. However, there appear to be some issues with the Prometheus Operator and its need to create services in the kube-system namespace:

Error: UPGRADE FAILED: failed to create resource: services is forbidden: User "freno@sumologic.com" cannot create resource "services" in API group "" in the namespace "kube-system": GKEAutopilot authz: the namespace "kube-system" is managed and the request's verb "create" is denied

Falco also has issues:

[denied by autogke-disallow-hostnamespaces] enabling hostNetwork is not allowed in Autopilot. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-disallow-hostnamespaces] enabling hostNetwork is not allowed in Autopilot. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume docker-socket in container falco is accessed in write mode; disallowed in Autopilot. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume containerd-socket in container falco is accessed in write mode; disallowed in Autopilot. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume lib-modules in container falco is accessed in write mode; disallowed in Autopilot. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume usr-fs used in container init-falco uses path /usr which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume etc-fs used in container init-falco uses path /etc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume dev-fs used in container falco uses path /dev which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume proc-fs used in container falco uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume boot-fs used in container falco uses path /boot which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume usr-fs used in container falco uses path /usr which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>
[denied by autogke-no-write-mode-hostpath] hostPath volume etc-fs used in container falco uses path /etc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <freno@sumologic.com> and groups: <["system:authenticated"]>

frankreno commented 3 years ago

Related issue - https://github.com/prometheus-community/helm-charts/issues/713

frankreno commented 3 years ago

This can be worked around by adding the following flags; however, it means that some of the metrics we would normally collect are not collected:

--set kube-prometheus-stack.coreDns.enabled=false --set kube-prometheus-stack.kubeControllerManager.enabled=false --set kube-prometheus-stack.kubeDns.enabled=false --set kube-prometheus-stack.kubeEtcd.enabled=false --set kube-prometheus-stack.kubeProxy.enabled=false --set kube-prometheus-stack.kubeScheduler.enabled=false
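
For reference, the flags above can be assembled from the list of components and appended to whatever Helm command is already in use. This is only a minimal sketch: the helper name `disable_flags` is hypothetical, `<release>` and `<namespace>` are placeholders, the chart reference `sumologic/sumologic` is assumed, and the command is printed for review rather than run.

```shell
# Hypothetical helper: emit one "--set <prefix><component>.enabled=false" per component.
disable_flags() {
  prefix="$1"; shift
  out=""
  for component in "$@"; do
    # Join flags with a single space, without a leading space.
    out="${out:+$out }--set ${prefix}${component}.enabled=false"
  done
  printf '%s\n' "$out"
}

# Components disabled in the workaround above.
flags="$(disable_flags kube-prometheus-stack. \
  coreDns kubeControllerManager kubeDns kubeEtcd kubeProxy kubeScheduler)"

# Print the command instead of running it (placeholders and chart name are assumptions).
echo "helm upgrade --install <release> sumologic/sumologic --namespace <namespace> $flags"
```

Keeping the component names in one list makes it easier to re-enable individual scrapers later if Autopilot relaxes its restrictions.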

ronisec commented 2 years ago

Thanks for the comment, Frank! This worked for me after turning off the nodeExporter as well (it seems that GKE Autopilot isn't very happy with providing /proc and /sys paths to the nodeExporter).

The error I was hitting:

Error: failed to create resource: admission webhook "policycontrollerv2.common-webhooks.networking.gke.io" denied the request: GKE Policy Controller rejected the request because it violates one or more policies: {"[denied by autogke-disallow-hostnamespaces]":["enabling hostPID is not allowed in Autopilot. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'.","enabling hostNetwork is not allowed in Autopilot. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'."],"[denied by autogke-no-host-port]":["container prometheus-node-exporter specifies a host port; disallowed in Autopilot. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'."],"[denied by autogke-no-write-mode-hostpath]":["hostPath volume proc used in container prometheus-node-exporter uses path /proc which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/]. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'.","hostPath volume sys used in container prometheus-node-exporter uses path /sys which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/]. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'.","hostPath volume root used in container prometheus-node-exporter uses path / which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: [/var/log/]. Requested by user: '<MY_EMAIL>', groups: 'system:authenticated'."]}

Note that our cluster was originally running GKE AP v1.20.10-gke.1600, where we were hitting issues with the mutatingwebhookconfigurations resource (probably related to this: https://github.com/jetstack/cert-manager/issues/3717). We upgraded to GKE AP v1.21.5-gke.1302 with the following command, which solved the mutatingwebhookconfigurations issues:

gcloud container clusters upgrade <cluster-name> --master

See guide: https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster#upgrading_the_cluster.

Final command that worked:

helm install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --set coreDns.enabled=false --set kubeControllerManager.enabled=false --set kubeDns.enabled=false --set kubeEtcd.enabled=false --set kubeProxy.enabled=false --set kubeScheduler.enabled=false --set nodeExporter.enabled=false
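
The same overrides could also be kept in a values file rather than a long chain of `--set` flags. A minimal sketch, assuming the file name `autopilot-values.yaml` (an arbitrary choice) and the same release, chart, and namespace as the command above; the install command is printed for review rather than run.

```shell
# Write the overrides to a values file (the file name is an assumption).
cat > autopilot-values.yaml <<'EOF'
coreDns:
  enabled: false
kubeControllerManager:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
kubeScheduler:
  enabled: false
nodeExporter:
  enabled: false
EOF

# Print the equivalent install command instead of running it.
echo "helm install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus -f autopilot-values.yaml"
```

A values file can be checked into version control, which makes it easier to see which components were turned off for Autopilot.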

TuanLikeminds commented 1 year ago

thank you so much @ronisec