DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

DataDog Agent pods are going into CrashLoopBackOff state #18157

Open Lalithya-1211 opened 1 year ago

Lalithya-1211 commented 1 year ago

Agent Environment

Datadog Agent version: 3.7.3

Agent Status:

NAME                                     READY   STATUS             RESTARTS          AGE
datadog-26qnn                            2/3     CrashLoopBackOff   256 (119s ago)    21h
datadog-c5brf                            2/3     CrashLoopBackOff   256 (3m17s ago)   21h
datadog-cluster-agent-6c6b9dc8d8-g89lv   1/1     Running            0                 21h
datadog-n82bg                            2/3     CrashLoopBackOff   256 (53s ago)     21h
datadog-rfwtx                            2/3     CrashLoopBackOff   256 (2m22s ago)   21h
datadog-rxr8w                            2/3     CrashLoopBackOff   256 (104s ago)    21h

Agent Logs:

2023-07-13 07:16:43 UTC | CORE | WARN | (pkg/util/log/log.go:618 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-07-13 07:16:43 UTC | CORE | INFO | (pkg/util/log/log.go:590 in func1) | 3 Features detected from environment: kubernetes,cri,containerd
2023-07-13 07:16:43 UTC | CORE | ERROR | (comp/core/log/logger.go:100 in Errorf) | Dogstatsd: unable to determine default hostname: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file
2023-07-13 07:16:43 UTC | CORE | INFO | (comp/forwarder/defaultforwarder/default_forwarder.go:238 in NewDefaultForwarder) | Retry queue storage on disk is disabled
2023-07-13 07:16:43 UTC | CORE | INFO | (pkg/runtime/runtime.go:27 in func1) | runtime: final GOMAXPROCS value is: 4
2023-07-13 07:16:43 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:317 in startAgent) | Starting Datadog Agent v7.46.0
2023-07-13 07:16:43 UTC | CORE | ERROR | (cmd/agent/subcommands/run/command.go:378 in startAgent) | Error while getting hostname, exiting: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file
2023-07-13 07:16:43 UTC | CORE | INFO | (pkg/logs/logs.go:107 in Stop) | Stopping logs-agent
2023-07-13 07:16:43 UTC | CORE | INFO | (pkg/logs/logs.go:116 in Stop) | logs-agent stopped
2023-07-13 07:16:43 UTC | CORE | INFO | (cmd/agent/subcommands/run/command.go:584 in stopAgent) | See ya!
Error: Error while getting hostname, exiting: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file

Process agent logs:

2023-07-12 12:57:45 UTC | PROCESS | ERROR | (pkg/process/runner/runner.go:150 in runCheck) | Unable to run check 'pod': temporary failure in kubeutil, will retry later: impossible to reach Kubelet with host: 172.16.208.5. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made
2023-07-12 12:57:45 UTC | PROCESS | ERROR | (comp/forwarder/defaultforwarder/worker.go:195 in process) | Error while processing transaction: error while sending transaction, rescheduling it: Post "https://process.https/api/v1/discovery": dial tcp: lookup process.https: no such host
2023-07-12 12:57:46 UTC | PROCESS | ERROR | (comp/forwarder/defaultforwarder/default_forwarder.go:680 in func3) | timed out waiting for responses, received 0/1
2023-07-12 12:57:46 UTC | PROCESS | ERROR | (comp/forwarder/defaultforwarder/worker.go:191 in process) | Too many errors for endpoint 'https://process.https:/api/v1/container': retrying later
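For triaging a pod list like the one above, here is a minimal shell sketch that picks out the pods whose STATUS column is CrashLoopBackOff so their logs can be pulled one by one. The helper name `crashlooping_pods` is hypothetical, and the sketch assumes the default `kubectl get pods` column order (NAME READY STATUS RESTARTS AGE). The container name `agent` in the usage comment is an assumption about the Helm chart's DaemonSet, so confirm it with `kubectl describe pod` first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose STATUS (third column) is CrashLoopBackOff.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
crashlooping_pods() {
  # NR > 1 skips the header row; awk splits on runs of whitespace, so the
  # "256 (119s ago)" RESTARTS value does not shift the STATUS column.
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (assumes kubectl points at it). The container
# name "agent" is an assumption; --previous shows the last crashed run's logs.
#   for pod in $(kubectl get pods | crashlooping_pods); do
#     kubectl logs "$pod" -c agent --previous | grep -i hostname
#   done
```

Since the cluster agent reports 1/1 Running, filtering on STATUS rather than READY keeps the output limited to the node Agent pods that are actually restarting.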

Describe what happened: After running the following command, the pods do not come up; they go into CrashLoopBackOff:

"helm install datadog --set datadog.site='datadoghq.com' --set datadog.apiKey=7102**** datadog/datadog".

Describe what you expected: The pods should be up and running.

Steps to reproduce the issue: "helm install datadog --set datadog.site='datadoghq.com' --set datadog.apiKey=7102**** datadog/datadog"

Additional environment details (Operating System, Cloud provider, etc.): OS: CentOS; Cloud: Azure

carlosroman commented 1 year ago

Hi, it looks like the Agent is not able to determine the hostname (similar to #14152):

2023-07-13 07:16:43 UTC | CORE | ERROR | (comp/core/log/logger.go:100 in Errorf) | Dogstatsd: unable to determine default hostname: unable to reliably determine the host name. You can define one in the agent config file or in your hosts file

I'd look into this troubleshooting guide and see if it helps fix your issue.
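The process-agent log in the original report also says the Kubelet at 172.16.208.5 could not be reached and suggests checking whether the setup requires kubelet_tls_verify = false. If the Agent's hostname lookup depends on reaching the Kubelet (a plausible link on Kubernetes, not confirmed in this thread), that failure could explain the hostname error. Here is a minimal sketch of reinstalling with verification disabled, assuming the chart exposes this setting as the `datadog.kubelet.tlsVerify` value (check the values.yaml of your chart version). The function name `datadog_reinstall` and the `HELM` and `DD_API_KEY` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upgrade (or install) the "datadog" release with Kubelet
# TLS verification turned off. Assumes the chart value datadog.kubelet.tlsVerify
# maps to the Agent's kubelet_tls_verify setting.
# HELM may be overridden (e.g. HELM=echo) to print the command instead of running it.
# DD_API_KEY must hold the API key; the script refuses to run without it.
datadog_reinstall() {
  "${HELM:-helm}" upgrade --install datadog datadog/datadog \
    --set datadog.site='datadoghq.com' \
    --set "datadog.apiKey=${DD_API_KEY:?DD_API_KEY must be set}" \
    --set datadog.kubelet.tlsVerify=false
}

# Usage (assumes helm targets the affected cluster and the datadog repo is added):
#   DD_API_KEY=<your key> datadog_reinstall
```

`helm upgrade --install` is used so the same command works whether or not the release from the original `helm install` still exists. Disabling verification skips certificate checks on the Kubelet connection, so treat it as a way to confirm the cause rather than a final configuration.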