DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

kube-state-metrics publishes on 8080 by default, but datadog-agent looks on 8081 #1523

Closed benbc closed 6 years ago

benbc commented 6 years ago

I'm using the DataDog agent (v6.1.0) and kube-state-metrics (v1.2.0) on GKE (v1.8.9). I'm using the Kubernetes manifests provided by both projects with minimal modifications.

I see this error in my logs:

[ AGENT ] 2018-03-26 13:14:23 UTC | ERROR | (runner.go:276 in work) | Error running check kubernetes_state: [{"message": "HTTPConnectionPool(host='10.60.2.18', port=8081): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efcc08619d0>: Failed to establish a new connection: [Errno 111] Connection refused',))", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/bin/agent/dist/checks/__init__.py\", line 332, in run\n self.check(copy.deepcopy(self.instances[0]))\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubernetes_state/kubernetes_state.py\", line 196, in check\n self.process(endpoint, send_histograms_buckets=send_buckets, instance=instance)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 350, in process\n for metric in self.scrape_metrics(endpoint):\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 314, in scrape_metrics\n response = self.poll(endpoint)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py\", line 467, in poll\n response = requests.get(endpoint, headers=headers, stream=True, timeout=1, cert=cert, verify=verify)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py\", line 72, in get\n return request('get', url, params=params, **kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py\", line 58, in request\n return session.request(method=method, url=url, **kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py\", line 508, in request\n resp = self.send(prep, **send_kwargs)\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py\", line 618, in send\n r = adapter.send(request, **kwargs)\n File 
\"/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/adapters.py\", line 508, in send\n raise ConnectionError(e, request=request)\nConnectionError: HTTPConnectionPool(host='10.60.2.18', port=8081): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efcc08619d0>: Failed to establish a new connection: [Errno 111] Connection refused',))\n"}]

It looks like the check is trying to scrape the /metrics URL on port 8081. However, kube-state-metrics exposes that URL on port 8080 by default. (The IP address of the container is correct.)
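One way to confirm which port actually accepts connections is a quick TCP probe run from somewhere that can reach the pod IP (for example, from inside the agent container). The following is only a minimal sketch: the helper name `reachable_ports` is hypothetical, and it checks TCP reachability only, not whether /metrics is served.

```python
import socket


def reachable_ports(host, ports, timeout=1.0):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError (e.g. ECONNREFUSED, errno 111)
            # when nothing is listening on the port or it cannot be reached.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports


if __name__ == "__main__":
    # IP taken from the error message above; both candidate ports are probed.
    print(reachable_ports("10.60.2.18", [8080, 8081]))
```

If only 8080 shows up in the result, that would be consistent with the "Connection refused" error the check reports for 8081.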

I've spent a while reading your docs and spelunking in a couple of your codebases and I can't work out how that default gets set or how to override it.

So I suppose I have a couple of questions:

  1. How do I change the port that the check uses?
  2. Is there some reason for it not to use the default port by default?

Thanks -Ben

benbc commented 6 years ago

Sorry for spamming you today. :-)

mfpierre commented 6 years ago

Hi @benbc, thanks for the report. It seems that the KSM project recently added port 8081 to its list of exposed ports, and this is breaking the autodiscovery template: when a container exposes multiple ports, we take the highest port number. I've made a PR to fix the official AD template: https://github.com/DataDog/integrations-core/pull/1308. In the meantime, you can either edit the AD template or try using annotations on the KSM pod.
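To illustrate the port selection described above, here is a simplified sketch of how the port template variables could resolve. It is not the agent's actual implementation: the function name `resolve_port_variables` is hypothetical, and it assumes that ports are sorted in ascending order, that %%port%% maps to the highest one, and that %%port_N%% maps to the N-th one.

```python
import re
from typing import Sequence


def resolve_port_variables(template: str, exposed_ports: Sequence[int]) -> str:
    """Substitute %%port%% and %%port_N%% in an autodiscovery-style template."""
    ports = sorted(exposed_ports)  # assumption: ascending numeric order

    def substitute(match):
        index = match.group(1)
        if index is None:
            # Bare %%port%%: assumed to pick the highest exposed port.
            return str(ports[-1])
        # %%port_N%%: pick the N-th port in ascending order.
        return str(ports[int(index)])

    return re.sub(r"%%port(?:_(\d+))?%%", substitute, template)


if __name__ == "__main__":
    exposed = [8080, 8081]  # ports exposed by the KSM container in this issue
    print(resolve_port_variables("http://10.60.2.18:%%port%%/metrics", exposed))
    print(resolve_port_variables("http://10.60.2.18:%%port_0%%/metrics", exposed))
```

Under these assumptions, the first template resolves to port 8081 (the failing case in this issue) and the second to port 8080.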

benbc commented 6 years ago

@mfpierre Thank you.

I'm sure I can use the annotation approach. The diff for your fix will help with that. I was unsure, reading your docs, which pod the annotation should go on (datadog-agent or kube-state-metrics), but your comment above makes that clear, so now I have everything I need.

What is the easiest way for me to trace your linked fix in https://github.com/DataDog/integrations-core to a published version of the datadog/datadog-agent Docker image?

mfpierre commented 6 years ago

@benbc I just merged the PR; it should go out with the next agent release.

pdecat commented 6 years ago

Here is the proper way to fix this issue with annotations for Datadog agent version 5:

diff --git a/kube-state-metrics/kube-state-metrics-deployment.yaml b/kube-state-metrics/kube-state-metrics-deployment.yaml
index 2e8dc47..92417e4 100644
--- a/kube-state-metrics/kube-state-metrics-deployment.yaml
+++ b/kube-state-metrics/kube-state-metrics-deployment.yaml
@@ -14,6 +14,10 @@ spec:
     metadata:
       labels:
         k8s-app: kube-state-metrics
+      annotations:
+        service-discovery.datadoghq.com/kube-state-metrics.check_names: '["kubernetes_state"]'
+        service-discovery.datadoghq.com/kube-state-metrics.init_configs: '[{}]'
+        service-discovery.datadoghq.com/kube-state-metrics.instances: '[{"kube_state_url": "http://%%host%%:%%port_0%%/metrics"}]'
     spec:                                         
       serviceAccountName: kube-state-metrics      
       containers:                                 

Datadog agent version 6 users should probably only need to replace service-discovery.datadoghq.com with ad.datadoghq.com.

Edit: note the use of %%port_0%% instead of a hard-coded value.
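For anyone generating manifests programmatically, the same three annotations can be built for either agent version by swapping the prefix. This is just a sketch: the helper `ksm_annotations` is hypothetical, and the version 6 prefix is assumed from the comment above rather than verified here.

```python
import json


def ksm_annotations(prefix="ad.datadoghq.com", container="kube-state-metrics"):
    """Build the pod annotations from the diff above for a given prefix.

    Use prefix="service-discovery.datadoghq.com" for agent version 5.
    """
    base = f"{prefix}/{container}"
    # Each annotation value is a JSON-encoded string, as in the diff.
    return {
        f"{base}.check_names": json.dumps(["kubernetes_state"]),
        f"{base}.init_configs": json.dumps([{}]),
        f"{base}.instances": json.dumps(
            [{"kube_state_url": "http://%%host%%:%%port_0%%/metrics"}]
        ),
    }


if __name__ == "__main__":
    for key, value in ksm_annotations().items():
        print(f"{key}: '{value}'")
```

The resulting dictionary goes under spec.template.metadata.annotations of the kube-state-metrics deployment, not on the datadog-agent pod.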