renchap opened this issue 1 year ago
Hi, thanks for reaching out. To run a cluster check, the redisdb configuration YAML needs to be added to the cluster agent (DCA), since the DCA is the component that schedules checks onto the node agents or the cluster check runners (CLC runners). We'll be sure to have this documented when the operator goes GA to help prevent further confusion. One other small nit in your YAML configuration: cluster_checks: true needs to be cluster_check: true, per https://docs.datadoghq.com/containers/cluster_agent/clusterchecks/?tab=operator#configuration-from-configuration-files. Here's an example snippet to run a cluster check in a node agent:
override:
  clusterAgent:
    extraConfd:
      configDataMap:
        redisdb.yaml: |-
          cluster_check: true
          init_config:
          instances:
            - host: XXX
              port: 1234
              username: default
              password: XXX
              ssl: true
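The scheduling behavior described above can be pictured with a toy model: the DCA holds the cluster-check configs and hands each one to exactly one node agent or CLC runner. This is only an illustrative sketch, not Datadog's actual dispatching code; the least-loaded balancing strategy and all names here are assumptions.

```python
# Toy model of the dispatch step: the DCA assigns each cluster-check config
# to exactly one runner. The least-loaded strategy is an illustrative
# assumption, not Datadog's actual implementation.
def dispatch(configs, runners):
    assignments = {r: [] for r in runners}
    for cfg in configs:
        # pick the runner currently holding the fewest checks
        target = min(assignments, key=lambda r: len(assignments[r]))
        assignments[target].append(cfg)
    return assignments

print(dispatch(["redisdb", "http_check"], ["node-a", "node-b"]))
# {'node-a': ['redisdb'], 'node-b': ['http_check']}
```

The key point the model captures is that each check runs on one node only, which is why the agent clusterchecks output below lists the check under a single hostname.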
In the DCA, we can see which node the check is scheduled on:
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# agent clusterchecks
[...]
===== Checks on <hostname> =====
=== redisdb check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/redisdb.yaml
Instance ID: redisdb:1bd7f42364a4def9
empty_default_hostname: true
host: XXX
password: XXX
port: 1234
ssl: true
username: default
~
===
And if we check that node's agent status, the check should run. In this case, the error is expected since XXX:1234 isn't a valid address, but it at least shows that the configs we want are being used:
redisdb (4.5.2)
---------------
Instance ID: redisdb:1bd7f42364a4def9 [ERROR]
Configuration Source: file:/etc/datadog-agent/conf.d/redisdb.yaml
Total Runs: 4
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 4
Average Execution Time : 30ms
Last Execution Date : 2023-02-03 20:59:13 UTC (1675457953000)
Last Successful Execution Date : Never
Error: Error -5 connecting to XXX:1234. No address associated with hostname.
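As an aside, the "-5" in that error is getaddrinfo's EAI_NODATA code on glibc/Linux ("No address associated with hostname"), i.e. a plain DNS resolution failure for the placeholder host. Python re-exports the platform constant, so the value can differ on other operating systems:

```python
# The "-5" in the check error maps to getaddrinfo's EAI_NODATA on glibc/Linux
# ("No address associated with hostname"): the placeholder host simply does
# not resolve. The numeric value is platform-specific.
import socket

print(socket.EAI_NODATA)  # -5 on glibc/Linux
```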
The same idea applies for CLC runners: configure the check on the DCA, and the DCA will schedule the check on the CLC runners. Using this config:
features:
  clusterChecks:
    useClusterChecksRunners: true
override:
  clusterAgent:
    extraConfd:
      configDataMap:
        redisdb.yaml: |-
          cluster_check: true
          init_config:
          instances:
            - host: XXX
              port: 1234
              username: default
              password: XXX
              ssl: true
We can confirm that the checks are going to the CLC runners:
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# agent clusterchecks
[...]
===== Checks on datadog-cluster-checks-runner-xxxxxxxxxx-xxxxx =====
[...]
=== redisdb check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/redisdb.yaml
Instance ID: redisdb:1bd7f42364a4def9
empty_default_hostname: true
host: XXX
password: XXX
port: 1234
ssl: true
username: default
~
===
Running agent status in that CLC runner:
root@datadog-cluster-checks-runner-xxxxxxxxxx-xxxxx:/# agent status
[...]
redisdb (4.5.2)
---------------
Instance ID: redisdb:1bd7f42364a4def9 [ERROR]
Configuration Source: file:/etc/datadog-agent/conf.d/redisdb.yaml
Total Runs: 37
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 37
Average Execution Time : 8ms
Last Execution Date : 2023-02-03 21:22:34 UTC (1675459354000)
Last Successful Execution Date : Never
Error: Error -5 connecting to XXX:1234. No address associated with hostname.
When adding an extraConfd to the node agent, you'll find the config at /etc/datadog-agent/conf.d/. As an example, for the config you posted to reproduce the issue, the file would be at /etc/datadog-agent/conf.d/redisdb.yaml:
root@datadog-agent-xxxxx:/# cat /etc/datadog-agent/conf.d/redisdb.yaml
cluster_checks: true
init_config:
instances:
  - host: XXX
    port: 1234
    username: default
    password: XXX
    ssl: true
Hope that helps clear up some of the issues you were seeing with the check configs.
Thanks for having a look at this.
I made your change but I am not seeing the same results as you.
Here is my config:
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    clusterName: xxx
    site: datadoghq.eu
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
    kubelet:
      tlsVerify: false
  features:
    liveContainerCollection:
      enabled: true
    liveProcessCollection:
      enabled: true
    oomKill:
      enabled: true
    prometheusScrape:
      enabled: false
    clusterChecks:
      enabled: true
      useClusterChecksRunners: true
  override:
    clusterAgent:
      extraConfd:
        configDataMap:
          redisdb.yml: |-
            cluster_check: true
            init_config:
            instances:
              - host: xxx
                port: xx
                username: default
                password: xxx
                ssl: true
I am running the latest 1.0.0-rc.8 operator.
Here is the output of agent status on the DCA pod:
=========
Collector
=========
Running Checks
==============
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
Total Runs: 19
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 3, Total: 25
Service Checks: Last Run: 5, Total: 70
Average Execution Time : 1.466s
Last Execution Date : 2023-02-03 22:14:48 UTC (1675462488000)
Last Successful Execution Date : 2023-02-03 22:14:48 UTC (1675462488000)
kubernetes_state_core
---------------------
Instance ID: kubernetes_state_core:b13e6c9d52886e07 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_state_core.d/kubernetes_state_core.yaml.default
Total Runs: 18
Metric Samples: Last Run: 1,600, Total: 22,298
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 8, Total: 112
Average Execution Time : 8ms
Last Execution Date : 2023-02-03 22:14:38 UTC (1675462478000)
Last Successful Execution Date : 2023-02-03 22:14:38 UTC (1675462478000)
orchestrator
------------
Instance ID: orchestrator:e1ef8faec3fcbfc1 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/orchestrator.d/orchestrator.yaml
Total Runs: 28
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 34ms
Last Execution Date : 2023-02-03 22:14:46 UTC (1675462486000)
Last Successful Execution Date : 2023-02-03 22:14:46 UTC (1675462486000)
No sign of redis here.
But this does not seem normal?
root@datadog-cluster-agent-7f7cc745c8-sdfzt:~# ls /conf.d/
redisdb.yml
root@datadog-cluster-agent-7f7cc745c8-sdfzt:~# ls /etc/datadog-agent/conf.d/
kubernetes_apiserver.d kubernetes_state_core.d orchestrator.d
Hi, thanks for trying out the new configuration. Could you also try using redisdb.yaml as the key in the override section instead of redisdb.yml?
override:
  clusterAgent:
    extraConfd:
      configDataMap:
        redisdb.yaml: |-
          cluster_check: true
          init_config:
          instances:
            - host: xxx
              port: xx
              username: default
              password: xxx
              ssl: true
Like you said, the location of the redisdb.yml file looked odd in the DCA. I would have expected it to be copied from /conf.d/ to /etc/datadog-agent/conf.d/. My guess is this happened because this line in the DCA entrypoint script copies over *.yaml files, but not *.yml files.
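The mismatch is easy to reproduce in isolation: a copy step that globs only on "*.yaml" silently skips ".yml" files. The file names below are illustrative, and this only models the glob behavior, not the actual entrypoint script.

```python
# Why redisdb.yml was never copied: a glob pattern of "*.yaml" matches
# redisdb.yaml but silently skips redisdb.yml, so the .yml config never
# reaches /etc/datadog-agent/conf.d/. File names are illustrative.
import fnmatch

files = ["redisdb.yml", "redisdb.yaml", "kubernetes_apiserver.d"]
copied = [f for f in files if fnmatch.fnmatch(f, "*.yaml")]
print(copied)  # ['redisdb.yaml']
```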
With redisdb.yml, the file only exists at /conf.d/:
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# ls /conf.d/
redisdb.yml
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# ls /etc/datadog-agent/conf.d/
kubernetes_apiserver.d kubernetes_state_core.d orchestrator.d
But changing it to redisdb.yaml allows the file to be copied over to the correct place:
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# ls /conf.d/
redisdb.yaml
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# ls /etc/datadog-agent/conf.d/
kubernetes_apiserver.d kubernetes_state_core.d orchestrator.d redisdb.yaml
Double-checking with agent clusterchecks, I see that the config is now picked up:
root@datadog-cluster-agent-xxxxxxxxxx-xxxxx:/# agent clusterchecks
[...]
===== Checks on datadog-cluster-checks-runner-xxxxxxxxxx-xxxxx =====
[...]
=== redisdb check ===
Configuration provider: file
Configuration source: file:/etc/datadog-agent/conf.d/redisdb.yaml
Instance ID: redisdb:eb960cd1e44d41c4
empty_default_hostname: true
host: xxx
password: xxx
port: xx
ssl: true
tags:
- kube_cluster_name:kind-test
- cluster_name:kind-test
username: default
~
===
Thanks, it indeed works with a .yaml extension! It may be a good idea to handle both extensions :)
Glad to hear that worked, and thanks for the feedback. I'll add a card to the backlog to better support .yml extensions in the check configs.
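One way the backlog item could go is to treat .yml and .yaml interchangeably when collecting check configs. A minimal sketch, where is_check_config is a hypothetical helper and not an existing Datadog function:

```python
# Sketch of extension-agnostic matching: accept both .yaml and .yml when
# deciding which files to copy into conf.d. is_check_config is a
# hypothetical helper, not part of the actual Datadog codebase.
from pathlib import Path

def is_check_config(name: str) -> bool:
    return Path(name).suffix in {".yaml", ".yml"}

names = ["redisdb.yml", "redisdb.yaml", "README.md"]
print([n for n in names if is_check_config(n)])
# ['redisdb.yml', 'redisdb.yaml']
```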
Describe what happened:
I am trying to configure cluster checks to monitor external managed services. Due to the issues described in #689, I need to use the v2alpha1 DatadogAgent resource. I am using a spec containing this:

When <runner> is clusterAgent, I can see that /conf.d/redisdb.yml exists in the datadog-cluster-agent pod (with no effect, but that's another issue). If <runner> is nodeAgent or clusterChecksRunner (with features.clusterChecks.useClusterChecksRunners enabled), then /conf.d/redisdb.yml is not created in those containers.

Describe what you expected:
I expect my redisdb.yml configuration file to be created in the containers and picked up by the running agent.

Steps to reproduce the issue:
Use this DatadogAgent file with Operator 1.0.0-rc7. Then enter a datadog-agent pod, and /conf.d/ will be empty.

Additional environment details (Operating System, Cloud provider, etc):
Operator version: 1.0.0-rc7
Cloud provider: Exoscale (using their managed k8s offering)
This is related to Datadog support ticket #1081449