coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0
311 stars, 55 forks

`proc.ReadFds()` hangs #55

Closed: keisku closed this issue 7 months ago

keisku commented 8 months ago

Description

This function hangs, specifically at `dest, err := os.Readlink(path.Join(fdDir, entry.Name()))`.

https://github.com/coroot/coroot-node-agent/blob/8e1fa825ad97ce88d587e8991cd8357c19f90dd4/proc/fd.go#L17-L40

Reproduction

$ git log -1
commit 8e1fa825ad97ce88d587e8991cd8357c19f90dd4 (HEAD -> main, origin/main, origin/HEAD, dd-trace)
Author: Nikolay Sivko <n.sivko@gmail.com>
Date:   Wed Dec 20 17:19:07 2023 +0300

    CRI-O: fix container log discovery

$ uname -a
Linux ip-10-0-133-150 6.2.0-1017-aws #17~22.04.1-Ubuntu SMP Fri Nov 17 21:07:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ docker version
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:41 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:07:41 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.26
  GitCommit:        3dd1e886e55dd695541fdcd67420c2888645a495
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ pwd
/home/ubuntu/workspace/coroot-node-agent

$ docker build . -t coroot-node-agent-dev

[+] Building 58.9s (18/18) FINISHED  docker:default
 => [internal] load .dockerignore  0.0s
 => => transferring context: 59B  0.0s
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 553B  0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye  0.6s
 => [internal] load metadata for docker.io/library/golang:1.19-bullseye  0.6s
 => [builder 1/9] FROM docker.io/library/golang:1.19-bullseye@sha256:2fdfcb03b1445f06f1cf8a342516bfd34026b527fef8427f40ea7b140168fda2  0.0s
 => [stage-1 1/3] FROM docker.io/library/debian:bullseye@sha256:71f0e09d55a4042ddee1f114a0838d99266e185bf33e71f15c15bf6b9545a9a0  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 22.23kB  0.0s
 => CACHED [builder 2/9] RUN apt update && apt install -y libsystemd-dev  0.0s
 => CACHED [builder 3/9] COPY go.mod /tmp/src/  0.0s
 => CACHED [builder 4/9] COPY go.sum /tmp/src/  0.0s
 => CACHED [builder 5/9] WORKDIR /tmp/src/  0.0s
 => CACHED [builder 6/9] RUN go mod download  0.0s
 => [builder 7/9] COPY . /tmp/src/  0.1s
 => [builder 8/9] RUN CGO_ENABLED=1 go test ./...  51.7s
 => [builder 9/9] RUN CGO_ENABLED=1 go install -mod=readonly -ldflags "-X main.version=unknown" /tmp/src  5.7s
 => CACHED [stage-1 2/3] RUN apt update && apt install -y ca-certificates && apt clean  0.0s
 => [stage-1 3/3] COPY --from=builder /go/bin/coroot-node-agent /usr/bin/coroot-node-agent  0.2s
 => exporting to image  0.3s
 => => exporting layers  0.2s
 => => writing image sha256:52fd0dd6da8116dae22bee78bb8c62f24917e5332f5d3ec880b0b68a2fc35f27  0.0s
 => => naming to docker.io/library/coroot-node-agent-dev  0.0s

$ docker run --detach --name coroot-node-agent-dev --privileged --pid host -p 8080:80 -v /sys/kernel/debug:/sys/kernel/debug:rw -v /sys/fs/cgroup:/host/sys/fs/cgroup:ro coroot-node-agent-dev --cgroupfs-root=/host/sys/fs/cgroup

I've inserted additional trace logs to pinpoint the code responsible for this issue. I also followed this doc.

See the output of `git diff`:

trace-log.patch

Logs

See the attached output of `docker logs coroot-node-agent-dev`:

[docker.log](https://github.com/coroot/coroot-node-agent/files/13790552/docker.log)

def commented 7 months ago

Thanks, @keisku, for the detailed report. The agent uses a rate limiter for logging, which may create the impression of it hanging. Other than the log, what other problems do you see with the agent?

keisku commented 7 months ago

@def Thanks! I overlooked the rate limit.

> Other than the log, what other problems do you see with the agent?

curl requests to these endpoints failed, and I didn't see the corresponding logs, so I assumed some operation had hung.

https://github.com/coroot/coroot-node-agent/blob/8e1fa825ad97ce88d587e8991cd8357c19f90dd4/main.go#L143-L145

But the actual problem was that I hadn't set `--port` for `docker run`, so my issue is solved 👍

Btw, why do we need the rate limit?

def commented 7 months ago

We added the rate limit to prevent situations such as those described in #17.

keisku commented 7 months ago

@def

Can we make the rate limit configurable with flags?

See https://github.com/coroot/coroot-node-agent/pull/56

def commented 7 months ago

@keisku, sure, I merged your PR