DataDog / datadog-operator

Kubernetes Operator for Datadog Resources
Apache License 2.0

OpenShift Kubernetes Operator/Agent implementation #792

Closed OurFriendIrony closed 1 year ago

OurFriendIrony commented 1 year ago

Describe what happened: I'm setting up a ROSA (OpenShift on AWS) cluster, which is essentially Kubernetes with various additional deployments on top.
As this is a fresh install, I am following the Getting Started documentation, but I am currently unable to get the agent/cluster-agent to run successfully. The steps seem very simple and I have not deviated from the guide, so I'm at a loss as to the cause of the issue.

Describe what you expected: Following the "Getting Started" documentation, I expect 1 operator, 1 cluster-agent and 3 agent pods with status Running. The intention is to have a single Datadog operator, with an agent located in 3 different namespaces, but so far I have been unable to get it running even in a single namespace. Any suggestions would be appreciated.

Steps to reproduce the issue: In the OpenShift cluster, in the default namespace, run:

helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install my-datadog-operator datadog/datadog-operator
kubectl create secret generic datadog-secret --from-literal api-key=XXXX --from-literal app-key=XXXX 
kubectl apply -f agent-v1.yml

using the following agent configuration (agent-v1.yml):

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    apm:
      enabled: true
    logCollection:
      enabled: true

This produces the following agent error:

2023-05-19 12:46:10 UTC | CORE | WARN | (pkg/util/log/log.go:618 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-05-19 12:46:10 UTC | CORE | INFO | (pkg/util/log/log.go:590 in func1) | Features detected from environment: kubernetes
2023-05-19 12:46:10 UTC | CORE | INFO | (pkg/runtime/runtime.go:27 in func1) | runtime: final GOMAXPROCS value is: 4
2023-05-19 12:46:10 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:248 in startAgent) | Starting Datadog Agent v7.43.1
2023-05-19 12:46:11 UTC | CORE | ERROR | (cmd/agent/subcommands/run/command.go:309 in startAgent) | Error while getting hostname, exiting: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file
2023-05-19 12:46:11 UTC | CORE | INFO | (pkg/logs/logs.go:149 in Stop) | Stopping logs-agent
2023-05-19 12:46:11 UTC | CORE | INFO | (pkg/logs/logs.go:158 in Stop) | logs-agent stopped
2023-05-19 12:46:11 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:537 in stopAgent) | See ya!
Error: Error while getting hostname, exiting: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file

I then adjusted the spec as shown below and reapplied it:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    nodeAgent:
      env:
        - name: DD_HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
  global:
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    apm:
      enabled: true
    logCollection:
      enabled: true
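
A possible next step, not verified on this cluster: the agent logs further down repeatedly say "Please check if your setup requires kubelet_tls_verify = false". Assuming the v2alpha1 DatadogAgent schema exposes that setting as `global.kubelet.tlsVerify` (an assumption based on the operator's configuration reference, not something confirmed here), the spec above could be extended with a fragment along these lines:

```yaml
spec:
  global:
    # Assumption: this maps to the Agent's kubelet_tls_verify setting.
    # Disabling verification weakens TLS checks against the kubelet,
    # so treat it as a diagnostic step rather than a final configuration.
    kubelet:
      tlsVerify: false
    # credentials, override and features stay as in the spec above
```

If this helps at all, it would only address the "impossible to reach Kubelet" errors. The "permission denied" errors in the same logs are a separate matter.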

Pod status

kubectl get pods
NAME                                     READY   STATUS             RESTARTS      AGE
datadog-agent-42hhl                      2/3     CrashLoopBackOff   5 (70s ago)   4m16s
datadog-agent-fqnzd                      2/3     CrashLoopBackOff   5 (72s ago)   4m16s
datadog-agent-z5pst                      2/3     CrashLoopBackOff   5 (49s ago)   4m16s
datadog-cluster-agent-795d5c48d6-fd5bz   1/1     Running            0             21h
my-datadog-operator-7c4d476bdc-lxsgm     1/1     Running            0             22h

I can see the following in the cluster-agent logs, which looks somewhat promising:

2023-05-19 12:25:24 UTC | CLUSTER | INFO | (pkg/clusteragent/clusterchecks/dispatcher_nodes.go:123 in expireNodes) | Expiring out node ip-10-222-99-228.eu-west-1.compute.internal, last status report 38 seconds ago
2023-05-19 12:25:24 UTC | CLUSTER | INFO | (pkg/clusteragent/clusterchecks/dispatcher_nodes.go:123 in expireNodes) | Expiring out node ip-10-222-98-169.eu-west-1.compute.internal, last status report 38 seconds ago
2023-05-19 12:25:24 UTC | CLUSTER | INFO | (pkg/clusteragent/clusterchecks/dispatcher_nodes.go:123 in expireNodes) | Expiring out node ip-10-222-99-83.eu-west-1.compute.internal, last status report 38 seconds ago
2023-05-19 12:25:24 UTC | CLUSTER | WARN | (pkg/clusteragent/clusterchecks/dispatcher_nodes.go:152 in expireNodes) | No nodes reporting, cluster checks will not run
2023-05-19 12:26:24 UTC | CLUSTER | INFO | (pkg/forwarder/transaction/transaction.go:382 in internalProcess) | Successfully posted payload to "https://7-43-1-app.agent.datadoghq.com/api/v1/check_run"

The agent logs are as follows:

Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
2023-05-19 12:47:57 UTC | CORE | WARN | (pkg/util/log/log.go:618 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/util/log/log.go:590 in func1) | Features detected from environment: kubernetes
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/runtime/runtime.go:27 in func1) | runtime: final GOMAXPROCS value is: 4
2023-05-19 12:47:57 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:248 in startAgent) | Starting Datadog Agent v7.43.1
2023-05-19 12:47:57 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:311 in startAgent) | Hostname is: ip-10-222-98-169.eu-west-1.compute.internal
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "containerd" could not start. error: component workloadmeta-containerd is disabled: Agent is not running on containerd
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "docker" could not start. error: component workloadmeta-docker is disabled: Agent is not running on Docker
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "cloudfoundry-vm" could not start. error: component workloadmeta-cloudfoundry-vm is disabled: Agent is not running on CloudFoundry
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "ecs" could not start. error: component workloadmeta-ecs is disabled: Agent is not running on ECS EC2
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "cloudfoundry-container" could not start. error: component workloadmeta-cloudfoundry-container is disabled: Agent is not running on CloudFoundry
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "ecs_fargate" could not start. error: component workloadmeta-ecs_fargate is disabled: Agent is not running on Fargate
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:402 in startCandidates) | workloadmeta collector "podman" could not start. error: component workloadmeta-podman is disabled: Podman not detected
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/workloadmeta/store.go:153 in Start) | workloadmeta store initialized successfully
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/collector/python/init.go:324 in resolvePythonExecPath) | Using '/opt/datadog-agent/embedded' as Python home
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/collector/python/init.go:391 in Initialize) | Initializing rtloader with Python 3 /opt/datadog-agent/embedded
2023-05-19 12:47:57 UTC | CORE | INFO | (pkg/tagger/collectors/workloadmeta_main.go:115 in stream) | workloadmeta tagger collector started
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:134 in LogMessage) | - | (ddyaml.py:143) | monkey patching yaml.load...
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:134 in LogMessage) | - | (ddyaml.py:147) | monkey patching yaml.load_all...
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:134 in LogMessage) | - | (ddyaml.py:151) | monkey patching yaml.dump_all... (affects all yaml dump operations)
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/collector/collector.go:56 in NewCollector) | Embedding Python 3.8.16 (default, Mar  7 2023, 12:42:17) [GCC 4.9.4]
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/providers/config_reader.go:170 in read) | Searching for configuration files at: /etc/datadog-agent/conf.d
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/providers/config_reader.go:246 in collectEntry) | Skipping 'auto_conf.yaml' for integration 'kubernetes_state'
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/providers/config_reader.go:170 in read) | Searching for configuration files at: /opt/datadog-agent/bin/agent/dist/conf.d
2023-05-19 12:47:58 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:174 in read) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/providers/config_reader.go:170 in read) | Searching for configuration files at:
2023-05-19 12:47:58 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:174 in read) | Skipping, open : no such file or directory
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/config/autodiscovery/autodiscovery.go:95 in DiscoverComponentsFromEnv) | Adding KubeContainer provider from environment
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/config/autodiscovery/autodiscovery.go:117 in DiscoverComponentsFromEnv) | Adding Kubelet listener from environment
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/api/security/security.go:188 in getClusterAgentAuthToken) | Using configured cluster_agent.auth_token
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/util/clusteragent/clusteragent.go:129 in init) | Successfully connected to the Datadog Cluster Agent 7.43.1+commit.9e9c790
2023-05-19 12:47:58 UTC | CORE | ERROR | (pkg/autodiscovery/providers/endpointschecks.go:120 in getNodename) | Cannot get kubeUtil object: temporary failure in kubeutil, will retry later: impossible to reach Kubelet with host: 10.222.98.169. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
2023-05-19 12:47:58 UTC | CORE | ERROR | (pkg/autodiscovery/providers/endpointschecks.go:50 in NewEndpointsChecksConfigProvider) | Cannot get node name: temporary failure in kubeutil, will retry later: impossible to reach Kubelet with host: 10.222.98.169. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
2023-05-19 12:47:58 UTC | CORE | ERROR | (cmd/agent/common/autodiscovery.go:125 in setupAutoDiscovery) | Error while adding config provider endpointschecks: temporary failure in kubeutil, will retry later: impossible to reach Kubelet with host: 10.222.98.169. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:310 in initListenerCandidates) | environment listener successfully started
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:310 in initListenerCandidates) | kubelet listener successfully started
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/listeners/workloadmeta.go:134 in Listen) | ad-kubeletlistener initialized successfully
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/listeners/environment.go:65 in createServices) | Listener created kubelet service from environment
2023-05-19 12:47:58 UTC | CORE | INFO | (pkg/autodiscovery/listeners/environment.go:72 in createServices) | Listener created container service from environment
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/api/security/security.go:249 in saveAuthToken) | Wrote auth token
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/api/security/security.go:143 in fetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2023-05-19 12:47:59 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:362 in startAgent) | GUI server port -1 specified: not starting the GUI.
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/forwarder/forwarder.go:235 in NewDefaultForwarder) | Retry queue storage on disk is disabled
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/aggregator/time_sampler.go:50 in NewTimeSampler) | Creating TimeSampler #0
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/forwarder/forwarder.go:357 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://7-43-1-app.agent.datadoghq.com" (1 api key(s))
2023-05-19 12:47:59 UTC | CORE | INFO | (cmd/system-probe/config/config.go:121 in Merge) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2023-05-19 12:47:59 UTC | CORE | WARN | (pkg/secrets/secrets.go:50 in Init) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/util/version_history.go:57 in logVersionHistoryToFile) | Cannot read file: /opt/datadog-agent/run/version-history.json, will create a new one. open /opt/datadog-agent/run/version-history.json: no such file or directory
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/util/version_history.go:103 in logVersionHistoryToFile) | Cannot write json file: /opt/datadog-agent/run/version-history.json open /opt/datadog-agent/run/version-history.json: permission denied
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/runner/runner.go:92 in ensureMinWorkers) | Runner 1 added 4 workers (total: 4)
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/dogstatsd/server.go:270 in NewServer) | can't listen: listen unixgram /var/run/datadog/statsd/dsd.socket: bind: permission denied
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/dogstatsd/listeners/udp.go:95 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:383 in CheckConnectivity) | Checking HTTP connectivity...
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:389 in CheckConnectivity) | Sending HTTP connectivity request to https://agent-http-intake.logs.datadoghq.com/api/v2/logs...
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:394 in CheckConnectivity) | HTTP connectivity successful
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/logs.go:121 in start) | Starting logs-agent...
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/logs.go:131 in start) | logs-agent started
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/logs/internal/launchers/container/launcher.go:92 in run) | Starting Container launcher
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/autodiscovery/config_poller.go:168 in collectOnce) | file provider: collected 56 new configurations, removed 0
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check container with an interval of 15s
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check cpu with an interval of 15s
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/util/cloudproviders/cloudproviders.go:50 in DetectCloudProvider) | Cloud provider AWS detected
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check disk:67cc0574430a16ba with an interval of 15s
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check file_handle with an interval of 15s
2023-05-19 12:47:59 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check io with an interval of 15s
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:41 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: impossible to reach Kubelet with host: 10.222.98.169. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
2023-05-19 12:47:59 UTC | CORE | WARN | (pkg/collector/python/check.go:275 in Configure) | could not get a 'kubelet' check instance with the new api: Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 177, in __init__
    cadvisor_instance = self._create_cadvisor_prometheus_instance(inst)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/prometheus.py", line 83, in _create_cadvisor_prometheus_instance
    'prometheus_url': instance.get('cadvisor_metrics_endpoint', urljoin(endpoint, CADVISOR_METRICS_PATH)),
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in urljoin
    return '/'.join(arg.strip('/') for arg in args)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in <genexpr>
    return '/'.join(arg.strip('/') for arg in args)
AttributeError: 'NoneType' object has no attribute 'strip'
2023-05-19 12:47:59 UTC | CORE | WARN | (pkg/collector/python/check.go:276 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/collector/python/loader.go:242 in addExpvarConfigureError) | py.loader: could not configure check 'kubelet (7.5.2)': could not invoke 'kubelet' python check constructor. New constructor API returned:
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 177, in __init__
    cadvisor_instance = self._create_cadvisor_prometheus_instance(inst)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/prometheus.py", line 83, in _create_cadvisor_prometheus_instance
    'prometheus_url': instance.get('cadvisor_metrics_endpoint', urljoin(endpoint, CADVISOR_METRICS_PATH)),
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in urljoin
    return '/'.join(arg.strip('/') for arg in args)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in <genexpr>
    return '/'.join(arg.strip('/') for arg in args)
AttributeError: 'NoneType' object has no attribute 'strip'
Deprecated constructor API returned:
__init__() got an unexpected keyword argument 'agentConfig'
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/collector/scheduler.go:201 in getChecks) | Unable to load a check from instance of config 'kubelet': JMX Check Loader: check is not a jmx check, or unable to determine if it's so; Python Check Loader: could not configure check instance for python check kubelet: could not invoke 'kubelet' python check constructor. New constructor API returned:
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 177, in __init__
    cadvisor_instance = self._create_cadvisor_prometheus_instance(inst)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/prometheus.py", line 83, in _create_cadvisor_prometheus_instance
    'prometheus_url': instance.get('cadvisor_metrics_endpoint', urljoin(endpoint, CADVISOR_METRICS_PATH)),
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in urljoin
    return '/'.join(arg.strip('/') for arg in args)
  File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py", line 199, in <genexpr>
    return '/'.join(arg.strip('/') for arg in args)
AttributeError: 'NoneType' object has no attribute 'strip'
Deprecated constructor API returned:
__init__() got an unexpected keyword argument 'agentConfig'; Core Check Loader: Check kubelet not found in Catalog
2023-05-19 12:47:59 UTC | CORE | ERROR | (pkg/collector/scheduler.go:248 in GetChecksFromConfigs) | Unable to load the check: unable to load any check from config 'kubelet'
2023-05-19 12:48:00 UTC | CORE | WARN | (pkg/logs/auditor/auditor.go:184 in func2) | open /opt/datadog-agent/run/registry.json: permission denied
2023-05-19 12:48:00 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:container | Running check...
2023-05-19 12:48:00 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:container | Done running check
2023-05-19 12:48:00 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check kubernetes_apiserver with an interval of 15s
2023-05-19 12:48:00 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check load with an interval of 15s
2023-05-19 12:48:00 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check memory with an interval of 15s
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check network:4b0649b7e11f0772 with an interval of 15s
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/util/cloudproviders/cloudproviders.go:75 in GetCloudProviderNTPHosts) | Using NTP servers from AWS cloud provider: ["169.254.169.123"]
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check ntp:3c427a42a70bbf8 with an interval of 15m0s
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:92 in Enter) | Scheduling check uptime with an interval of 15s
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:204 in LoadAndRun) | Started config provider "file"
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:202 in LoadAndRun) | Started config provider "cluster-checks", polled every 10s
2023-05-19 12:48:01 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:204 in LoadAndRun) | Started config provider "kubernetes-container-allinone"
2023-05-19 12:48:01 UTC | CORE | WARN | (cmd/agent/common/misconfig/global.go:15 in ToLog) | misconfig: proc mount: failed to open /host/proc/1/mounts - proc fs inspection may not work: open /host/proc/1/mounts: permission denied
2023-05-19 12:48:02 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:ntp | Running check...
2023-05-19 12:48:02 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:ntp | Done running check
2023-05-19 12:48:03 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:uptime | Running check...
2023-05-19 12:48:03 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:uptime | Done running check
2023-05-19 12:48:04 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:memory | Running check...
2023-05-19 12:48:04 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:memory | Done running check
2023-05-19 12:48:05 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:kubernetes_apiserver | Running check...
2023-05-19 12:48:05 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:kubernetes_apiserver | Done running check
2023-05-19 12:48:06 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:file_handle | Running check...
2023-05-19 12:48:06 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:file_handle | Done running check
2023-05-19 12:48:07 UTC | CORE | WARN | (pkg/util/cloudproviders/gce/gce_tags.go:50 in getCachedTags) | unable to get tags from gce and cache is empty: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/?recursive=true": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-05-19 12:48:07 UTC | CORE | INFO | (pkg/metadata/host/host_tags.go:130 in GetHostTags) | Unable to get host tags from source: gce - using cached host tags
2023-05-19 12:48:07 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:cpu | Running check...
2023-05-19 12:48:07 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:cpu | Done running check
2023-05-19 12:48:11 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:network | Running check...
2023-05-19 12:48:11 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:network | Done running check
2023-05-19 12:48:12 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:load | Running check...
2023-05-19 12:48:12 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:load | Done running check
2023-05-19 12:48:13 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:io | Running check...
2023-05-19 12:48:13 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:io | Done running check
2023-05-19 12:48:14 UTC | CORE | INFO | (pkg/forwarder/transaction/transaction.go:377 in internalProcess) | Successfully posted payload to "https://7-43-1-app.agent.datadoghq.com/intake/", the agent will only log transaction success every 500 transactions
2023-05-19 12:48:14 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:disk | Running check...
2023-05-19 12:48:14 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:disk | Done running check
2023-05-19 12:48:15 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:container | Running check...
2023-05-19 12:48:15 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:container | Done running check
2023-05-19 12:48:16 UTC | CORE | INFO | (pkg/metadata/host/host_tags.go:130 in GetHostTags) | Unable to get host tags from source: kubernetes - using cached host tags
2023-05-19 12:48:17 UTC | CORE | INFO | (pkg/metadata/host/host.go:141 in getNetworkMeta) | could not get network metadata: could not detect network ID
2023-05-19 12:48:17 UTC | CORE | INFO | (pkg/serializer/serializer.go:413 in sendMetadata) | Sent metadata payload, size (raw/compressed): 8960/2645 bytes.
2023-05-19 12:48:18 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:uptime | Running check...
2023-05-19 12:48:18 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:uptime | Done running check
2023-05-19 12:48:19 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:memory | Running check...
2023-05-19 12:48:19 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:memory | Done running check
2023-05-19 12:48:20 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:kubernetes_apiserver | Running check...
2023-05-19 12:48:20 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:kubernetes_apiserver | Done running check
2023-05-19 12:48:21 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:file_handle | Running check...
2023-05-19 12:48:21 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:file_handle | Done running check
2023-05-19 12:48:22 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:cpu | Running check...
2023-05-19 12:48:22 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:cpu | Done running check
2023-05-19 12:48:23 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-host metadata-agent_checks metadata-inventories metadata-resources]
2023-05-19 12:48:23 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-inventories metadata-resources metadata-host metadata-agent_checks]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-inventories metadata-resources metadata-host metadata-agent_checks]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-agent_checks metadata-host metadata-inventories metadata-resources]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-inventories metadata-resources metadata-host metadata-agent_checks]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-inventories metadata-resources metadata-host metadata-agent_checks]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-inventories metadata-resources metadata-host metadata-agent_checks]
2023-05-19 12:48:24 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-resources metadata-host metadata-agent_checks metadata-inventories]
2023-05-19 12:48:25 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-agent_checks metadata-host metadata-inventories metadata-resources]
2023-05-19 12:48:25 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-host metadata-agent_checks metadata-inventories metadata-resources]
2023-05-19 12:48:26 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-host metadata-agent_checks metadata-inventories metadata-resources]
2023-05-19 12:48:26 UTC | CORE | INFO | (pkg/api/healthprobe/healthprobe.go:74 in healthHandler) | Healthcheck failed on: [metadata-resources metadata-host metadata-agent_checks metadata-inventories]
2023-05-19 12:48:26 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:network | Running check...
2023-05-19 12:48:26 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:network | Done running check
2023-05-19 12:48:27 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:load | Running check...
2023-05-19 12:48:27 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:load | Done running check
2023-05-19 12:48:28 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:io | Running check...
2023-05-19 12:48:28 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:io | Done running check
2023-05-19 12:48:29 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:disk | Running check...
2023-05-19 12:48:29 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:disk | Done running check
2023-05-19 12:48:30 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:container | Running check...
2023-05-19 12:48:30 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:container | Done running check
2023-05-19 12:48:33 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:uptime | Running check...
2023-05-19 12:48:33 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:uptime | Done running check
2023-05-19 12:48:34 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:memory | Running check...
2023-05-19 12:48:34 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:memory | Done running check
2023-05-19 12:48:35 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:kubernetes_apiserver | Running check...
2023-05-19 12:48:35 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:kubernetes_apiserver | Done running check
2023-05-19 12:48:36 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:file_handle | Running check...
2023-05-19 12:48:36 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:file_handle | Done running check
2023-05-19 12:48:37 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:cpu | Running check...
2023-05-19 12:48:37 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:cpu | Done running check
2023-05-19 12:48:41 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:network | Running check...
2023-05-19 12:48:41 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:network | Done running check
2023-05-19 12:48:42 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:load | Running check...
2023-05-19 12:48:42 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:load | Done running check
2023-05-19 12:48:43 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:io | Running check...
2023-05-19 12:48:43 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:io | Done running check
2023-05-19 12:48:44 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:disk | Running check...
2023-05-19 12:48:44 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:disk | Done running check
2023-05-19 12:48:45 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:container | Running check...
2023-05-19 12:48:45 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:container | Done running check
2023-05-19 12:48:48 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:uptime | Running check...
2023-05-19 12:48:48 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:uptime | Done running check
2023-05-19 12:48:49 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:memory | Running check...
2023-05-19 12:48:49 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:memory | Done running check
2023-05-19 12:48:50 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:kubernetes_apiserver | Running check...
2023-05-19 12:48:50 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:kubernetes_apiserver | Done running check
2023-05-19 12:48:51 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:file_handle | Running check...
2023-05-19 12:48:51 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:file_handle | Done running check
2023-05-19 12:48:52 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:cpu | Running check...
2023-05-19 12:48:52 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:cpu | Done running check
2023-05-19 12:48:56 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:network | Running check...
2023-05-19 12:48:56 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:network | Done running check
2023-05-19 12:48:57 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:load | Running check...
2023-05-19 12:48:57 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:load | Done running check
2023-05-19 12:48:58 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:io | Running check...
2023-05-19 12:48:58 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:io | Done running check
2023-05-19 12:48:59 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:disk | Running check...
2023-05-19 12:48:59 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:disk | Done running check
2023-05-19 12:49:00 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:container | Running check...
2023-05-19 12:49:00 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:container | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:03 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:uptime | Running check...
2023-05-19 12:49:03 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:uptime | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:04 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:memory | Running check...
2023-05-19 12:49:04 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:memory | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:05 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:kubernetes_apiserver | Running check...
2023-05-19 12:49:05 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:kubernetes_apiserver | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:06 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:file_handle | Running check...
2023-05-19 12:49:06 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:file_handle | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:07 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:cpu | Running check...
2023-05-19 12:49:07 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:cpu | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:11 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:network | Running check...
2023-05-19 12:49:11 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:network | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:12 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:load | Running check...
2023-05-19 12:49:12 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:load | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:13 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:io | Running check...
2023-05-19 12:49:13 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:io | Done running check, next runs will be logged every 500 runs
2023-05-19 12:49:14 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:disk | Running check...
2023-05-19 12:49:14 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:disk | Done running check, next runs will be logged every 500 runs
2023-05-19 12:52:57 UTC | CORE | INFO | (pkg/serializer/serializer.go:413 in sendMetadata) | Sent metadata payload, size (raw/compressed): 2581/989 bytes.
2023-05-19 12:53:17 UTC | CORE | INFO | (pkg/serializer/serializer.go:437 in SendProcessesMetadata) | Sent processes metadata payload, size: 367 bytes.
2023-05-19 12:58:17 UTC | CORE | INFO | (pkg/serializer/serializer.go:413 in sendMetadata) | Sent metadata payload, size (raw/compressed): 2423/997 bytes.
2023-05-19 12:58:17 UTC | CORE | INFO | (pkg/serializer/serializer.go:437 in SendProcessesMetadata) | Sent processes metadata payload, size: 368 bytes.
2023-05-19 13:02:57 UTC | CORE | INFO | (pkg/serializer/serializer.go:413 in sendMetadata) | Sent metadata payload, size (raw/compressed): 2581/988 bytes.
2023-05-19 13:03:02 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:38 in CheckStarted) | check:ntp | Running check...
2023-05-19 13:03:02 UTC | CORE | INFO | (pkg/collector/worker/check_logger.go:57 in CheckFinished) | check:ntp | Done running check
2023-05-19 13:03:17 UTC | CORE | INFO | (pkg/serializer/serializer.go:437 in SendProcessesMetadata) | Sent processes metadata payload, size: 367 bytes.

Additional environment details (Operating System, Cloud provider, etc):

Provider: ROSA (Red Hat OpenShift on AWS)
Kubernetes: v1.25.8+37a9a08
Namespace: default
Datadog Plan: Datadog Pro (app.datadoghq.com)

levan-m commented 1 year ago

From the logs and the initial hostname error, it seems the Agent is unable to connect to the Kubelet. Could you please check whether these troubleshooting steps resolve your issue: https://docs.datadoghq.com/agent/troubleshooting/hostname_containers/?tab=awsecsonec2#kubernetes-hostname-errors

OurFriendIrony commented 1 year ago

Hi @levan-m,

Thanks for the response. I posted two configurations; I believe the first returns the hostname error you referenced. I used one of the steps in the troubleshooting guide to design the second configuration, which produces different errors.

Below I have introduced "tlsVerify: false" (one of the other suggestions in the troubleshooting guide), which produces the issues shown further down.

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    kubelet:
      tlsVerify: false
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    apm:
      enabled: true
    logCollection:
      enabled: true

The cluster-agent reports no ERROR; its last log line is: 2023-06-05 11:29:30 UTC | CLUSTER | INFO | (pkg/clusteragent/clusterchecks/handler.go:198 in leaderWatch) | Found leadership status after 14 tries

The regular agents are in CrashLoopBackOff. I've dumped the non-INFO logs below:

Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
2023-06-05 11:28:15 UTC | CORE | WARN | (pkg/util/log/log.go:618 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-06-05 11:28:19 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:174 in read) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
2023-06-05 11:28:19 UTC | CORE | WARN | (pkg/autodiscovery/providers/config_reader.go:174 in read) | Skipping, open : no such file or directory
2023-06-05 11:28:19 UTC | CORE | WARN | (pkg/secrets/secrets.go:50 in Init) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-06-05 11:28:19 UTC | CORE | ERROR | (pkg/util/version_history.go:103 in logVersionHistoryToFile) | Cannot write json file: /opt/datadog-agent/run/version-history.json open /opt/datadog-agent/run/version-history.json: permission denied
2023-06-05 11:28:19 UTC | CORE | ERROR | (pkg/dogstatsd/server.go:270 in NewServer) | can't listen: listen unixgram /var/run/datadog/statsd/dsd.socket: bind: permission denied
2023-06-05 11:28:20 UTC | CORE | WARN | (cmd/agent/common/misconfig/global.go:15 in ToLog) | misconfig: proc mount: failed to open /host/proc/1/mounts - proc fs inspection may not work: open /host/proc/1/mounts: permission denied
2023-06-05 11:28:21 UTC | CORE | WARN | (pkg/logs/auditor/auditor.go:184 in func2) | open /opt/datadog-agent/run/registry.json: permission denied
2023-06-05 11:28:24 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:datadog_cluster_agent | Error running check: [{"message": "HTTPConnectionPool(host='10.128.2.229', port=5000): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdbe916e040>: Failed to establish a new connection: [Errno 113] No route to host'))", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nOSError: [Errno 113] No route to host\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 703, in urlopen\n    httplib_response = self._make_request(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 398, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 239, in request\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1256, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1302, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1251, in endheaders\n 
   self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1011, in _send_output\n    self.send(msg)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 951, in send\n    self.connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 205, in connect\n    conn = self._new_conn()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 186, in _new_conn\n    raise NewConnectionError(\nurllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fdbe916e040>: Failed to establish a new connection: [Errno 113] No route to host\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 489, in send\n    resp = conn.urlopen(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 787, in urlopen\n    retries = retries.increment(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py\", line 592, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.128.2.229', port=5000): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdbe916e040>: Failed to establish a new connection: [Errno 113] No route to host'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1122, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py\", line 142, 
in check\n    self.process(scraper_config)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 573, in process\n    for metric in self.scrape_metrics(scraper_config):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 500, in scrape_metrics\n    response = self.poll(scraper_config)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 837, in poll\n    response = self.send_request(endpoint, scraper_config, headers)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py\", line 863, in send_request\n    return http_handler.get(endpoint, stream=True, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 355, in get\n    return self._request('get', url, options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 419, in _request\n    response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 425, in make_request_aia_chasing\n    response = request_method(url, **new_options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 73, in get\n    return request(\"get\", url, params=params, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 587, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 701, 
in send\n    r = adapter.send(request, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 565, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='10.128.2.229', port=5000): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdbe916e040>: Failed to establish a new connection: [Errno 113] No route to host'))\n"}]
2023-06-05 11:28:26 UTC | CORE | WARN | (pkg/util/cloudproviders/gce/gce_tags.go:50 in getCachedTags) | unable to get tags from gce and cache is empty: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/?recursive=true": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
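
For anyone reproducing this, a filtered dump like the one above can be produced with something along these lines. This is only a sketch: the `non_info` helper name is made up for illustration, it assumes the log format shown above (timestamp, component, level, separated by ` | `), and `<agent-pod>` is a placeholder for a real pod name.

```shell
# Keep only log lines whose level field is not INFO.
# Assumes lines shaped like: "<timestamp> | CORE | <LEVEL> | (<file>) | <message>".
# Lines that do not match that shape (e.g. kubectl notices) are kept as-is.
non_info() {
  grep -v -E '^[^|]*\|[^|]*\| INFO \|'
}

# Hypothetical usage; "agent" is the container name reported in the output above:
# kubectl logs <agent-pod> -c agent | non_info
```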

Any ideas for things to look into would be much appreciated.

OurFriendIrony commented 1 year ago

The issue is not in the operator but in the installation method used. On OpenShift, the operator needs to be installed through an Operator Lifecycle Manager (OLM) Subscription rather than via Helm. Installing it that way resulted in a clean deployment.
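
For reference, a Subscription-based install might look roughly like the sketch below. The exact values used in this cluster were not recorded here, so the Subscription name, namespace, channel, package name, and catalog source are all assumptions and should be checked against the catalog before applying.

```shell
# Write a hypothetical OLM Subscription manifest to a local file.
# Every value marked "assumed" is a guess, not taken from this thread.
cat > datadog-operator-subscription.yaml <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: datadog-operator            # assumed name
  namespace: openshift-operators    # assumed: namespace for cluster-wide operators
spec:
  channel: stable                   # assumed channel
  name: datadog-operator-certified  # assumed package name
  source: certified-operators       # assumed catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF

# The remaining steps need cluster access, so they are left commented out:
# helm uninstall my-datadog-operator   # remove the Helm release from the original steps
# oc get packagemanifests -n openshift-marketplace | grep -i datadog   # verify package, channel, source
# oc apply -f datadog-operator-subscription.yaml
```

The same result can be reached from the OperatorHub page in the OpenShift web console, which creates the Subscription for you.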