Open eldarmus opened 3 weeks ago
Hi @eldarmus, I'm unable to reproduce the error logs that you're seeing and my datadog-cluster-agent ClusterRole has the same permissions. I tested on a kind cluster with kubernetes version 1.27. Can you please share the kubernetes version you're using?
@fanny-jiang thank you for looking into it. My Kubernetes version is v1.28.2 and my Datadog chart version is datadog-operator-1.8.1.
Here is my DatadogAgent manifest
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  global:
    site: us5.datadoghq.com
    credentials:
      apiSecret:
        secretName: datadog-operator-apikey
        keyName: api-key
      appSecret:
        secretName: datadog-operator-appkey
        keyName: app-key
    kubelet:
      tlsVerify: false
    clusterName: production-cluster
    tags:
      - team:production-team
      - env:production
  override:
    clusterAgent:
      image:
        name: gcr.io/datadoghq/cluster-agent:latest
    nodeAgent:
      image:
        name: gcr.io/datadoghq/agent:latest
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
  features:
    logCollection:
      enabled: true
      containerCollectAll: true
    prometheusScrape:
      enabled: true
      enableServiceEndpoints: true
    eventCollection:
      collectKubernetesEvents: true
@eldarmus I also wasn't able to reproduce the error on my kind cluster (k8s v1.29, datadog-operator-1.8.1). Are your operator pod and the DatadogAgent object in the same namespace? Could you check for a clusterrole with the name <namespace>-<dda-name>-orch-exp-dca? I think in your case, it would be datadog-datadog-orch-exp-dca. It should have this permission:
$ kubectl get -oyaml clusterrole <namespace>-<dda-name>-orch-exp-dca
[...]
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - list
  - watch
If that's there, then could you also check the clusterrolebinding <namespace>-<dda-name>-orch-exp-dca? It should reference both the clusterrole from above and the service account for the dca, <namespace>/datadog-cluster-agent. In my example, my operator and dda are in the default namespace:
$ kubectl describe clusterrolebinding <namespace>-<dda-name>-orch-exp-dca
[...]
Role:
  Kind:  ClusterRole
  Name:  <namespace>-<dda-name>-orch-exp-dca
Subjects:
  Kind            Name                   Namespace
  ----            ----                   ---------
  ServiceAccount  datadog-cluster-agent  default
With the serviceaccount from the clusterrolebinding listed in your dca pod:
$ kubectl get pod <dca-pod> -oyaml
[...]
serviceAccount: datadog-cluster-agent
serviceAccountName: datadog-cluster-agent
That should allow the dca pod to have permissions to list CRDs in the apiextensions group that was listed in your error message. Maybe also check that automountServiceAccountToken: false is not set in the serviceaccount or the dca pod. We don't set that in the operator, but perhaps it could be automatically set by some clusters or policies.
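The checks above read the RBAC objects one by one. A quicker way to confirm the effective permission is to ask the API server directly with `kubectl auth can-i` and impersonation. This is a minimal sketch, assuming the namespace `datadog` and service account `datadog-cluster-agent` from the error message; the function names `sa_user` and `can_list_crds` are made up for illustration, and the caller needs impersonation rights (for example, cluster-admin).

```shell
#!/bin/sh
# Build the RBAC username for a service account:
# system:serviceaccount:<namespace>:<name>
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the service account may list CRDs
# at cluster scope. Prints "yes" or "no".
# Arguments: <namespace> <service-account-name>
can_list_crds() {
  kubectl auth can-i list customresourcedefinitions.apiextensions.k8s.io \
    --as="$(sa_user "$1" "$2")"
}

# Example (run manually against your cluster):
#   can_list_crds datadog datadog-cluster-agent
```

A "no" here would match the "forbidden" error in the cluster agent logs and point at the ClusterRole or ClusterRoleBinding; a "yes" would suggest looking at the pod's service account or token mounting instead.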
@khewonc
1) Yes, they are both in the same namespace.
2) datadog-datadog-orch-exp-dca: customresourcedefinitions is not listed in the ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog
    app.kubernetes.io/version: ""
    operator.datadoghq.com/managed-by-store: "true"
  name: datadog-datadog-orch-exp-dca
rules:
- apiGroups:
  - ""
  resourceNames:
  - kube-system
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-cluster-id
  resources:
  - configmaps
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  - replicasets
  - daemonsets
  - statefulsets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  - cronjobs
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  - persistentvolumeclaims
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - roles
  - rolebindings
  - clusterroles
  - clusterrolebindings
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling.k8s.io
  resources:
  - verticalpodautoscalers
  verbs:
  - list
  - watch
3) datadog-datadog-orch-exp-dca clusterrolebinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog
    app.kubernetes.io/version: ""
    operator.datadoghq.com/managed-by-store: "true"
  name: datadog-datadog-orch-exp-dca
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-datadog-orch-exp-dca
subjects:
- kind: ServiceAccount
  name: datadog-cluster-agent
  namespace: datadog
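Since the operator-managed ClusterRole shown above lacks the customresourcedefinitions rule, one possible stopgap while the root cause is investigated is to grant that rule through a separate ClusterRole and binding. This is only a sketch and not the operator's own fix: the function name `grant_crd_read` and the default role name `dca-crd-read` are hypothetical, and it assumes the service account `datadog/datadog-cluster-agent` from the binding above.

```shell
#!/bin/sh
# Hypothetical stopgap: create a separate ClusterRole granting list/watch
# on CRDs and bind it to the Cluster Agent service account. The
# operator-managed role and binding are left untouched.
# Arguments: <namespace> <service-account-name> [role-name]
grant_crd_read() {
  namespace="$1"
  sa="$2"
  role_name="${3:-dca-crd-read}"  # illustrative name, not from the operator

  # Only create the binding if the role was created successfully.
  kubectl create clusterrole "$role_name" \
    --verb=list,watch \
    --resource=customresourcedefinitions.apiextensions.k8s.io &&
  kubectl create clusterrolebinding "$role_name" \
    --clusterrole="$role_name" \
    --serviceaccount="${namespace}:${sa}"
}

# Example (run manually against your cluster):
#   grant_crd_read datadog datadog-cluster-agent
```

Keeping the extra rule in its own objects means it can be deleted cleanly once the operator generates the expected permission itself.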
Describe what happened: We observed these logs in datadog-cluster-agent pod
2024-06-14 11:18:32 UTC | CLUSTER | ERROR | (apimachinery@v0.28.6/pkg/util/runtime/runtime.go:115 in logError) | pkg/mod/k8s.io/client-go@v0.28.6/tools/cache/reflector.go:229: Failed to watch *v1.CustomResourceDefinition: failed to list *v1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:datadog:datadog-cluster-agent" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
2024-06-14 11:18:36 UTC | CLUSTER | WARN | (pkg/collector/corechecks/cluster/orchestrator/collector_bundle.go:332 in Run) | check:orchestrator | Collector apiextensions.k8s.io/v1/customresourcedefinitions is skipped: couldn't sync informer apiextensions.k8s.io/v1/customresourcedefinitions in 1m5.000980622s
2024-06-14 11:18:46 UTC | CLUSTER | WARN | (pkg/collector/corechecks/cluster/orchestrator/collector_bundle.go:332 in Run) | check:orchestrator | Collector apiextensions.k8s.io/v1/customresourcedefinitions is skipped: couldn't sync informer apiextensions.k8s.io/v1/customresourcedefinitions in 1m5.000980622s
2024-06-14 11:18:56 UTC | CLUSTER | WARN | (pkg/collector/corechecks/cluster/orchestrator/collector_bundle.go:332 in Run) | check:orchestrator | Collector apiextensions.k8s.io/v1/customresourcedefinitions is skipped: couldn't sync informer apiextensions.k8s.io/v1/customresourcedefinitions in 1m5.000980622s
here is the definition of datadog-cluster-agent ClusterRole
Describe what you expected: No error logs
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc): Datadog-operator chart version: datadog-operator-1.7.1