llyons opened this issue 1 year ago
Hello, thanks for submitting the issue. A few questions that would help us debug it:
default datadog-5dsrp 3/4 CrashLoopBackOff 6 (77s ago) 7m21s
It looks like one container isn't running. Could you describe the pod, identify the container that is not running, and share the logs from that container (CRITICAL and ERROR at the very least)? For example:
kubectl logs datadog-5dsrp -c system-probe
Could you also confirm the API key and app key were created in us3, since you are using site: us3.datadoghq.com?
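For reference, the site is normally set in the chart values under datadog.site; a minimal sketch, assuming the standard datadog/datadog chart layout:

datadog:
  site: us3.datadoghq.com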
The API key and app key are both in us3.datadoghq.com. We do have some data showing up in us3; however, 2 of the pods are not running.
default   datadog-cluster-agent-586d86b7d6-f5252        1/1   Running            0             66m
default   datadog-frc6x                                 3/4   CrashLoopBackOff   4 (25s ago)   2m4s
default   datadog-kube-state-metrics-5c77dcd6d5-97gvq   1/1   Running            0             66m
default   datadog-tlnq4                                 3/4   CrashLoopBackOff   4 (31s ago)   2m4s
Getting logs for the agent container, I see no errors:
kubectl logs datadog-frc6x -c agent ---> no errors
kubectl logs datadog-tlnq4 -c agent ---> no errors
The system-probe logs show some errors:
kubectl logs datadog-frc6x -c system-probe
faccessat2 seems blocked by the seccomp profile of an old version of docker.
clone3 seems blocked by the seccomp profile of an old version of docker.
load a seccomp profile to force ENOSYS.
2023-06-27 15:42:11 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:618 in func1) | Unknown key in config file: runtime_security_config.syscall_monitor.enabled
2023-06-27 15:42:11 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:618 in func1) | Unknown key in config file: runtime_security_config.activity_dump.cgroup_wait_list_size
2023-06-27 15:42:11 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:618 in func1) | Unknown key in config file: runtime_security_config.network.enabled
2023-06-27 15:42:11 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:618 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (pkg/config/environment_detection.go:123 in detectFeatures) | 3 Features detected from environment: kubernetes,cri,containerd
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (pkg/runtime/runtime.go:27 in func1) | runtime: final GOMAXPROCS value is: 4
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (comp/core/log/logger.go:87 in Infof) | starting system-probe v7.45.0
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (pkg/network/tracer/utils_linux.go:34 in IsTracerSupportedByOS) | running on platform: centos
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/network_tracer.go:60 in func3) | enabling universal service monitoring (USM)
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (pkg/network/tracer/tracer.go:126 in newTracer) | detected kernel version 3.10.0, will use kprobes from kernel version < 4.1.0
2023-06-27 15:42:11 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:65 in Register) | error creating module network_tracer: Universal Service Monitoring (USM) requires a Linux kernel version of 4.14.0 or higher. We detected 3.10.0
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:55 in Register) | module tcp_queue_length_tracer disabled
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:55 in Register) | module oom_kill_probe disabled
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:55 in Register) | module event_monitor disabled
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:55 in Register) | module process disabled
2023-06-27 15:42:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:55 in Register) | module dynamic_instrumentation disabled
2023-06-27 15:42:11 UTC | SYS-PROBE | CRITICAL | (comp/core/log/logger.go:108 in Criticalf) | error while starting api server, exiting: failed to create system probe: no module could be loaded
Error: error while starting api server, exiting: failed to create system probe: no module could be loaded
BTW, here are the exact versions of CentOS we have.
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1160.90.1.el7.x86_64
Architecture: x86-64
It looks like I might need to downgrade.... Can you help me with what or how I might need to change our values.yaml to make this happen?
thanks
I tried to add this into the values.yaml and it didn't change the agent version. I am not sure which agent version will work anyway, but I'm still trying.
clusterAgent:
  enabled: true
  image:
    name: cluster-agent
    tag: 7.21.1
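A side note, hedged against your chart version: the snippet above only pins the Cluster Agent image. In the datadog/datadog chart the node agent (the pod that runs system-probe) is configured separately under agents.image, roughly like this (the tag is a placeholder; verify the keys against your chart's values.yaml):

agents:
  image:
    name: agent
    tag: "7.45.0"  # placeholder; set the version you actually want to run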
So I was told that I should disable serviceMonitoring in values.yaml:
serviceMonitoring:
  enabled: false
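For context, in the datadog/datadog chart this flag normally sits under the top-level datadog key, so the full values.yaml entry would look roughly like this (a sketch; double-check against your chart version):

datadog:
  serviceMonitoring:
    enabled: false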
and now the pods are all running.
Not sure what we lose by turning this off.
Hello, sorry for the delay in responding to the issue. I suppose you already answered your question based on the findings above.
Universal Service Monitoring (USM), controlled by the serviceMonitoring.enabled property, isn't compatible with your current environment running Linux kernel 3.10.0 / CentOS Linux 7.
These are the prerequisites from the USM doc (a quick kernel check is sketched after the list):
Your service must be running on one of the following supported platforms:
Linux kernel 4.14 and greater
CentOS or RHEL 8.0 and greater
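A quick way to check the kernel version, either on a node directly or across the cluster (standard commands, nothing Datadog-specific):

# On a node
uname -r

# Across all nodes; the KERNEL-VERSION column is what USM cares about
kubectl get nodes -o wide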
Hence the error log:
2023-06-27 15:42:11 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:65 in Register) | error creating module network_tracer: Universal Service Monitoring (USM) requires a Linux kernel version of 4.14.0 or higher. We detected 3.10.0
Regarding what you lose, this doc provides a good overview of USM. In a nutshell, with USM you gain visibility into your stacks without instrumenting code.
Please let me know if you have any questions.
Describe what happened:
We are in a trial with Datadog to determine if we should move forward with Datadog as a monitoring solution. We are trying to install the Datadog Kubernetes agent on an on-premise k8s cluster.
We are running Kubernetes 1.27.3 on CentOS 7 Linux machines.
We have tried both the Datadog Operator manifest approach and the Helm chart.
We are focusing on the Helm chart approach.
our values.yaml file is this.
We set up a Datadog secret as follows:
kubectl create secret generic datadog-secret --from-literal api-key=5d299a1b5a9e758e0b3.......... --from-literal app-key=e43c096eed3adfa18...........
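If you prefer the chart to read the keys from that secret rather than passing them on the command line, the datadog/datadog chart exposes *ExistingSecret values; a sketch, assuming the secret name above:

datadog:
  apiKeyExistingSecret: datadog-secret
  appKeyExistingSecret: datadog-secret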
We executed the helm install like this:
helm install datadog -f values.yaml --set datadog.apiKey=5d299a1b5a9e758e0b3............. datadog/datadog --set targetSystem=linux
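For completeness, the repo setup that usually precedes this install (standard Helm commands; the URL is Datadog's public chart repository):

helm repo add datadog https://helm.datadoghq.com
helm repo update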
kubectl get po -A
default   datadog-5dsrp                                 3/4   CrashLoopBackOff   6 (77s ago)     7m21s
default   datadog-cluster-agent-586d86b7d6-f5252        1/1   Running            0               9m16s
default   datadog-hx7qb                                 3/4   CrashLoopBackOff   6 (3m21s ago)   9m16s
default   datadog-kube-state-metrics-5c77dcd6d5-97gvq   1/1   Running            0               9m16s
We are getting a number of errors, with none of the agents coming up. (Sorry, the logs generated from doing kubectl logs datadog-2rgn9 -c agent are very large.)
Describe what you expected:
Expected all the agents to start up and run. Here is the output of the helm install
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):
On-prem k8s cluster
CentOS 7 machines, Kubernetes 1.27.3, 1 control plane, 2 Linux workers