Closed Gowiem closed 3 years ago
I have the same issue
I have the same issue:

```shell
root@datadog-rx29v:/# env | grep DD_KUBERNETES_KUBELET_HOST
DD_KUBERNETES_KUBELET_HOST=172.50.0.90
root@datadog-rx29v:/# curl $DD_KUBERNETES_KUBELET_HOST:10255/healthz
curl: (7) Failed to connect to 172.50.0.90 port 10255: Connection refused
```
@assinnata @nmadmon I'm being told by DD support that this is likely related to permissions, which I suspected but didn't have a good way to test. I'll be testing that out today, and if that ends up being the case I'll let you folks know.
@Gowiem, did you manage to find the root cause?
@nmadmon @assinnata I did. It ended up being an RBAC permissions issue. Here are the notes from DD support that helped me figure that out:
Could you please run the following command inside the Datadog Pod?

```shell
TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && \
  curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v -k -H "Authorization: Bearer $TOKEN"
```
If this works, let's try adding the SSL certificates:

```shell
TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && \
  curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $TOKEN"
```
If this doesn't work, this issue might be an authorization issue.
Use the following Agent RBAC when deploying the Agent as a sidecar in AWS EKS Fargate:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
      - nodes/spec
      - nodes/stats
      - nodes/proxy
      - nodes/pods
      - nodes/healthz
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent
subjects:
  - kind: ServiceAccount
    name: datadog-agent
    namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-agent
  namespace: default
```
Could you connect directly to the host and run:

```shell
ps aux | grep kubelet | grep -v grep
```
Is `--authentication-token-webhook` set?
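To check for that flag, you can grep the kubelet's command line. A minimal sketch — the `sample` process line below is illustrative; on a real node you would pipe the `ps aux | grep kubelet | grep -v grep` output into the same `grep`:

```shell
# Hypothetical kubelet command line, standing in for real ps output.
sample='/usr/bin/kubelet --authentication-token-webhook=true --authorization-mode=Webhook'

# Extract the flag (and its value, if present) from the process line.
echo "$sample" | grep -o 'authentication-token-webhook[^ ]*'
# → authentication-token-webhook=true
```

If the flag is present (or set to `true`), the kubelet accepts bearer-token authentication, which is what the `Authorization: Bearer $TOKEN` curl above relies on.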
Good luck with it!
I am getting this error:

> Unable to detect the kubelet URL automatically: impossible to reach Kubelet with host: 172.31.33.128. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
Setting the `DD_KUBELET_TLS_VERIFY` env var to `"false"` in the agent did the trick for me.
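For reference, a minimal sketch of where that env var goes in the agent sidecar spec (the container name and image tag are illustrative, not taken from this thread):

```yaml
# Illustrative agent sidecar container snippet; name and tag are placeholders.
containers:
  - name: datadog-agent
    image: datadog/agent:7
    env:
      - name: DD_KUBELET_TLS_VERIFY
        value: "false"  # skip TLS verification when reaching the kubelet
```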
Output of the info page (if this is a bug)
Describe what happened:
I'm trying to run Datadog as a sidecar on my EKS Fargate Nodes/Pods, but I keep hitting the seemingly common "cannot connect to kubelet" class of errors - this is the latest iteration:
The important bit, and the one that keeps repeating, is:

```
Get "http://:10255/pods": dial tcp :10255: connect: connection refused
```
I followed this tutorial and the documentation to get this set up, but there is very little documentation on EKS + Fargate.
This is a similar issue to datadog/integrations#2582 && datadog/datadog-agent#2582 (and a bunch of others).
It is worth noting that I do have the Datadog agent running successfully on my normal EKS worker nodes, but I have yet to have any success with Fargate. I would appreciate a pointer in the right direction, or on what I can do to further debug this. For example, I believe I have RBAC set up correctly (yaml below), but how can I test that? Thanks!
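One way to sanity-check an RBAC binding like this is `kubectl auth can-i` with service-account impersonation. A sketch, assuming the agent runs under a service account named `datadog-agent` in the `default` namespace (adjust both to your setup):

```shell
# Ask the API server whether the agent's service account is granted the
# kubelet-related resources the ClusterRole is supposed to allow.
kubectl auth can-i get nodes/proxy \
  --as=system:serviceaccount:default:datadog-agent
kubectl auth can-i get nodes/stats \
  --as=system:serviceaccount:default:datadog-agent
```

Each command prints `yes` if the ClusterRoleBinding is effective for that resource, and `no` otherwise, which narrows the problem down to either RBAC or network reachability.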
Describe what you expected:
I expected the pod to run without errors and be able to reach the kubelet.
Steps to reproduce the issue:
Here is my datadog agent sidecar helm template:
The underlying app's service account has the following RBAC permissions bound to it, and the service account directory is mounted:
Additional environment details (Operating System, Cloud provider, etc):
Kubernetes Version: 1.17
EKS Platform: eks.3