coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0

Only containerD is shown in application #5

Open sumeet-zuora opened 1 year ago

sumeet-zuora commented 1 year ago

As per the documentation, after installing Coroot and the agent, Prometheus was attached properly, but the only visible application is containerd. Any help is appreciated.

apetruhin commented 1 year ago

@Schaudhari7565, please attach the agent's logs.

sumeet-zuora commented 1 year ago

Attached are the logs from one of the agents: corootnodeagent-n4cbb.txt

sumeet-zuora commented 1 year ago

Here is the manifest for the agent:

---
# Source: corootnodeagent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: coroot
  labels:
    chart: "corootnodeagent-1.0.0"
    release: "corootnodeagent"
    heritage: "Helm"
  name: corootnodeagent
spec:
  selector:
    matchLabels:
      app: corootnodeagent
      group: observability
      provider: tools
  template:
    metadata:
      annotations:
        prometheus.io/port: "80"
        prometheus.io/scrape: "true"
      labels:
        app: corootnodeagent
        group: observability
        provider: tools
    spec:
      imagePullSecrets:
        - name: regcred
      tolerations:
        - operator: Exists
      # the agent inspects host processes, so it needs the host PID namespace
      hostPID: true
      containers:
        - name: corootnodeagent
          image: "ghcr.io/coroot/coroot-node-agent:latest"
          imagePullPolicy: "IfNotPresent"
          # must match the cgroupfs hostPath mount below
          args: ["--cgroupfs-root", "/host/sys/fs/cgroup"]
          ports:
            - name: http
              containerPort: 80
          securityContext:
            # privileged mode is required to load eBPF programs
            privileged: true
          volumeMounts:
            - mountPath: /host/sys/fs/cgroup
              name: cgroupfs
              readOnly: true
            # debugfs is used for eBPF tracing (kprobes/tracepoints)
            - mountPath: /sys/kernel/debug
              name: debugfs
              readOnly: false
      volumes:
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroupfs
        - hostPath:
            path: /sys/kernel/debug
          name: debugfs
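
The prometheus.io/* annotations in this manifest only take effect if the scraper is configured to honor them. Below is a minimal sketch of an annotation-based scrape job that works for both Prometheus and VictoriaMetrics' vmagent; the job name is an assumption, and the relabeling follows the common convention rather than anything Coroot-specific.

scrape_configs:
  - job_name: kubernetes-pods            # job name is an assumption
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # rewrite the target address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
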
sumeet-zuora commented 1 year ago

Also, I am using VictoriaMetrics instead of Prometheus. Not sure if that breaks anything, but the connection did work as expected.

apetruhin commented 1 year ago

At first glance, nothing unusual. Please show me how it looks in Coroot: the main page and the settings page of the project.

apetruhin commented 1 year ago

Also, Coroot logs would help.

sumeet-zuora commented 1 year ago

Ahh, I was missing kube-state-metrics. Seems like progress: no more logs other than compaction. Does it take some time for the UI to show the services?

W0929 19:03:37.575715       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
W0929 19:03:37.575736       1 containers.go:65] unknown pod: coroot/corootnodeagent-cwk82, seems like no kube-state-metrics installed
W0929 19:03:37.576552       1 containers.go:65] unknown pod: pomerium/pomerium-proxy-587b77dd7c-zj899, seems like no kube-state-metrics installed
W0929 19:03:37.576582       1 containers.go:65] unknown pod: pomerium/pomerium-authenticate-6f5c68ff6b-p4vzb, seems like no kube-state-metrics installed
W0929 19:03:37.576603       1 containers.go:65] unknown pod: vertical-pod-autoscaler-ecc/vertical-pod-autoscaler-updater-f6c6c88d6-tq648, seems like no kube-state-metrics installed
W0929 19:03:37.577216       1 containers.go:65] unknown pod: kong-internal/kong-kong-internal-948b64c4b-26zzp, seems like no kube-state-metrics installed
W0929 19:03:37.577245       1 containers.go:65] unknown pod: vertical-pod-autoscaler/vertical-pod-autoscaler-recommender-577b8847df-nc84r, seems like no kube-state-metrics installed
W0929 19:03:37.577644       1 containers.go:65] unknown pod: kube-system/cilium-j8stt, seems like no kube-state-metrics installed
W0929 19:03:37.577675       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.577696       1 containers.go:65] unknown pod: zodiac/zookeeper-0, seems like no kube-state-metrics installed
W0929 19:03:37.578267       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0929 19:03:37.578308       1 containers.go:65] unknown pod: kube-system/cilium-ldzf5, seems like no kube-state-metrics installed
W0929 19:03:37.578330       1 containers.go:65] unknown pod: zodiac/elastic-master-2, seems like no kube-state-metrics installed
W0929 19:03:37.578518       1 containers.go:65] unknown pod: kube-system/cilium-bs7md, seems like no kube-state-metrics installed
I0929 19:03:37.584400       1 constructor.go:64] got 13 nodes, 1500 services, 1390 applications
I0929 19:03:39.063600       1 compaction.go:92] compaction iteration started
I0929 19:03:49.064250       1 compaction.go:92] compaction iteration started
I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l
I0929 19:03:59.158375       1 compaction.go:92] compaction iteration started
I0929 19:04:09.064052       1 compaction.go:92] compaction iteration started
I0929 19:04:19.064213       1 compaction.go:92] compaction iteration started
sumeet-zuora commented 1 year ago

Still the same after almost 15 minutes; only containerd is visible.

[screenshot]

apetruhin commented 1 year ago

It can take some time (depending on the cluster size) for the cache updater to download the kube-state-metrics metrics for the first time. Do you have more lines like this in the Coroot logs?

I0929 19:03:57.011050       1 updater.go:53] worker iteration for 2tt6kt9l

Or maybe some errors?

sumeet-zuora commented 1 year ago

Still nothing, and no errors during startup. The only messages I see are:

I0930 07:03:55.449972       1 main.go:29] version: 0.4.0
I0930 07:03:55.450088       1 db.go:39] using sqlite database
I0930 07:03:55.795158       1 cache.go:130] cache loaded from disk in 339.678568ms
I0930 07:03:55.795491       1 compaction.go:81] compaction worker started
I0930 07:03:55.795534       1 main.go:77] listening on 0.0.0.0:8080
I0930 07:03:56.796094       1 updater.go:53] worker iteration for 2tt6kt9l
I0930 07:04:05.795815       1 compaction.go:92] compaction iteration started
I0930 08:15:05.809959       1 compaction.go:155] compaction task 3c4b3c56d9bf3ed9c6fb8ca80b6e51d3 [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 12.387516ms
I0930 08:15:05.811276       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664511240-120-30.db
I0930 08:15:05.811322       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664514840-120-30.db
I0930 08:15:05.811344       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664518440-120-30.db
I0930 08:15:05.811370       1 compaction.go:144] deleting chunk after compaction: /data/cache/2tt6kt9l/2tt6kt9l-ad52fcad143b8b1451800115bbe853fe-1664522040-120-30.db
I0930 08:15:05.811410       1 compaction.go:155] compaction task ad52fcad143b8b1451800115bbe853fe [1664511240,1664514840,1664518440,1664522040]:3600 -> 1664511240:14400 done in 1.410773ms
I0930 08:15:15.795574       1 compaction.go:92] compaction iteration started
I0930 08:15:25.796272       1 compaction.go:92] compaction iteration started
I0930 08:15:26.200839       1 updater.go:53] worker iteration for 2tt6kt9l
apetruhin commented 1 year ago

sumeet-zuora commented 1 year ago

[screenshot]

So, I found that I was not scraping kube-state-metrics from the cluster where Coroot is running; after adding the annotations I got the metrics:

[screenshot]
kube_pod_info{app_kubernetes_io_component="metrics", app_kubernetes_io_instance="kube-state-metrics", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_part_of="kube-state-metrics", app_kubernetes_io_version="2.6.0", container="kube-state-metrics", created_by_kind="DaemonSet", created_by_name="aws-node-termination-handler", datacenter="eks-12-ecc-xxxx-xxxxx", exported_namespace="aws-node-termination-handler", exported_node="ip-10-124-128-97.us-west-2.compute.internal", exported_pod="aws-node-termination-handler-4tzqm", helm_sh_chart="kube-state-metrics-4.20.1", host_ip="10.124.128.97", host_network="true", instance="10.8.30.247:8080", job="1", namespace="monitoring", node="ip-10-124-130-55.us-west-2.compute.internal", pod="kube-state-metrics-c6678766c-cbprt", pod_ip="10.124.128.97", pod_template_hash="c6678766c", priority_class="system-node-critical", uid="1385a8d7-9a21-4674-8dfa-b0cb50fe6b54"}
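
For reference, the annotations that made kube-state-metrics scrapeable can be set through the chart's values. This is a minimal sketch assuming the prometheus-community kube-state-metrics chart (the one shown in helm_sh_chart above) and its default HTTP port:

# values for the kube-state-metrics Helm chart (a sketch; verify against the chart's values.yaml)
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"   # kube-state-metrics serves its metrics on 8080 by default
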
sumeet-zuora commented 1 year ago

Something new showed up, and it keeps changing; different applications are shown automatically under monitoring.

[screenshot]

Does it take time to build the cache or something?

apetruhin commented 1 year ago

Coroot uses the metrics gathered by kube-state-metrics to join containers into applications, so this should probably fix the issue.

sumeet-zuora commented 1 year ago

So, after adding the annotations I can see the metrics in VictoriaMetrics, but it still complains about some pods missing and then suddenly detects them. It seems like it is losing connections.

W0930 18:55:57.155919       1 containers.go:65] unknown pod: logging/elasticsearch-es-client-1, seems like no kube-state-metrics installed
W0930 18:55:57.155951       1 containers.go:65] unknown pod: keda/keda-operator-675b587d7b-xcls7, seems like no kube-state-metrics installed
W0930 18:55:57.156000       1 containers.go:65] unknown pod: kube-system/cilium-7lfxm, seems like no kube-state-metrics installed
W0930 18:55:57.156039       1 containers.go:65] unknown pod: kube-system/cilium-ns58n, seems like no kube-state-metrics installed
W0930 18:55:57.156068       1 containers.go:65] unknown pod: logging/elasticsearch-es-warm-0, seems like no kube-state-metrics installed
W0930 18:55:57.156106       1 containers.go:65] unknown pod: kube-system/cilium-q5gqr, seems like no kube-state-metrics installed
W0930 18:55:57.156140       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156176       1 containers.go:65] unknown pod: kube-system/cilium-operator-69c65bf5c6-mrz6b, seems like no kube-state-metrics installed
W0930 18:55:57.156257       1 containers.go:65] unknown pod: elastic-operator/elastic-operator-1, seems like no kube-state-metrics installed
W0930 18:55:57.156292       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156336       1 containers.go:65] unknown pod: kube-system/cilium-c6kzq, seems like no kube-state-metrics installed
W0930 18:55:57.156374       1 containers.go:65] unknown pod: kube-system/kube-proxy-dxkzf, seems like no kube-state-metrics installed
W0930 18:55:57.156413       1 containers.go:65] unknown pod: kube-system/cilium-68x2b, seems like no kube-state-metrics installed
I0930 18:55:57.163314       1 constructor.go:64] got 18 nodes, 1656 services, 1450 applications
2022/09/30 18:55:57 http: panic serving 127.0.0.1:50250: runtime error: invalid memory address or nil pointer dereference
goroutine 5986 [running]:
net/http.(*conn).serve.func1()
    /usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xac2f00, 0x1203280})
    /usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/api/views/overview.Render(0xc00029cbd0)
    /go/src/api/views/overview/overview.go:107 +0xb07
github.com/coroot/coroot/api/views.Overview(...)
    /go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xa92fa0?, {0xc5ccf0, 0xc006c16380}, 0xc000241e00?)
    /go/src/api/api.go:193 +0x91
net/http.HandlerFunc.ServeHTTP(0xc006c0d000?, {0xc5ccf0?, 0xc006c16380?}, 0x0?)
    /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001c2240, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
    /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc000775860?}, {0xc5ccf0, 0xc006c16380}, 0xc006c0cc00)
    /usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000fbe460, {0xc5d398, 0xc0003dcd80})
    /usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:3071 +0x4db
I0930 18:55:57.325649       1 compaction.go:92] compaction iteration started
I0930 18:55:58.325732       1 updater.go:53] worker iteration for 2tt6kt9l

In the dropdown I can see the applications:

[screenshot]

But after selecting one, nothing is there:

[screenshot]

Also, the UI is flaky; the applications keep changing.

apetruhin commented 1 year ago

@Schaudhari7565, apologies for the delayed response. We have fixed the panic. Please update Coroot.

sumeet-zuora commented 1 year ago

I did update to the latest, 0.5.0, and still got a panic. Is this due to the large number of applications? I wanted to know if we can restrict the applications or filter based on some labels, like datacenter=eks16, to avoid reading all the metrics at the same time.
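
Coroot itself doesn't appear to expose such a filter, but one possible scrape-level workaround is to keep only the series labeled with the desired datacenter before they are stored. This is only a sketch: it assumes the datacenter label is already attached at scrape time and that the kube-state-metrics address below matches the cluster.

scrape_configs:
  - job_name: kube-state-metrics
    static_configs:
      - targets: ["kube-state-metrics.monitoring.svc:8080"]   # assumed service address
    metric_relabel_configs:
      # drop every series that does not carry datacenter="eks16"
      - source_labels: [datacenter]
        regex: eks16
        action: keep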

I1011 17:28:51.890568       1 constructor.go:68] got 46 nodes, 1557 services, 1484 applications
2022/10/11 17:28:52 http: panic serving 127.0.0.1:56478: runtime error: invalid memory address or nil pointer dereference
goroutine 20983 [running]:
net/http.(*conn).serve.func1()
    /usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0xaf80e0, 0x1260280})
    /usr/local/go/src/runtime/panic.go:844 +0x258
github.com/coroot/coroot/auditor.(*appAuditor).cpu(0xc01207ebb8)
    /go/src/auditor/cpu.go:39 +0x4a2
github.com/coroot/coroot/auditor.Audit(0xc079e1e000)
    /go/src/auditor/auditor.go:26 +0x10a
github.com/coroot/coroot/api/views/overview.Render(0xc079e1e000)
    /go/src/api/views/overview/overview.go:40 +0x9d
github.com/coroot/coroot/api/views.Overview(...)
    /go/src/api/views/views.go:20
github.com/coroot/coroot/api.(*Api).Overview(0xc079e00120?, {0xc9f470, 0xc079e0c000}, 0xc0002c3680?)
    /go/src/api/api.go:194 +0x91
net/http.HandlerFunc.ServeHTTP(0xc079e1a000?, {0xc9f470?, 0xc079e0c000?}, 0xc0c526a9c0?)
    /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000242000, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
    /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
net/http.serverHandler.ServeHTTP({0xc06c512ea0?}, {0xc9f470, 0xc079e0c000}, 0xc06c530000)
    /usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc06c528000, {0xc9fb18, 0xc00013d9b0})
    /usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:3071 +0x4db
I1011 17:28:55.345169       1 compaction.go:92] compaction iteration started
I1011 17:29:05.344864       1 compaction.go:92] compaction iteration started
I1011 17:29:15.345428       1 compaction.go:92] compaction iteration started
I1011 17:29:25.345135       1 compaction.go:92] compaction iteration started
^C
apetruhin commented 1 year ago

It is a new bug; we will fix it soon. Meanwhile, please install version 0.4.1.

sumeet-zuora commented 1 year ago

Rolled back to 0.4.1; I will monitor the logs.

apetruhin commented 1 year ago

@Schaudhari7565, we've fixed the panic bug. Please upgrade Coroot to version >=0.5.1.