NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

dcgm-exporter crashes when run on Debian 12 #360

Closed stevenmcastano closed 4 months ago

stevenmcastano commented 4 months ago

What is the version?

3.3.6-3.4.2

What happened?

Installed on Debian 12 after compiling from source. When I run dcgm-exporter I get the following error message:

2024/07/15 22:43:41 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
INFO[0000] Starting dcgm-exporter
ERRO[0000] Encountered a failure.                        stacktrace="goroutine 1 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action.func1.1()\n\t/root/dcgm-exporter/pkg/cmd/app.go:283 +0x3d\npanic({0x182f360?, 0x29993c0?})\n\t/usr/local/go/src/runtime/panic.go:770 +0x132\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.initDCGM(0xc00028bc00)\n\t/root/dcgm-exporter/pkg/cmd/app.go:523 +0x9b\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.startDCGMExporter(0xc000114f40, 0xc0001a6ea0)\n\t/root/dcgm-exporter/pkg/cmd/app.go:303 +0xab\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action.func1()\n\t/root/dcgm-exporter/pkg/cmd/app.go:287 +0x5b\ngithub.com/NVIDIA/dcgm-exporter/pkg/stdout.Capture({0x1d279b0, 0xc0000307d0}, 0xc0003e3b90)\n\t/root/dcgm-exporter/pkg/stdout/capture.go:77 +0x1e6\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action(0xc000114f40)\n\t/root/dcgm-exporter/pkg/cmd/app.go:278 +0x67\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.NewApp.func1(0xc000114f40?)\n\t/root/dcgm-exporter/pkg/cmd/app.go:263 +0x13\ngithub.com/urfave/cli/v2.(*Command).Run(0xc00014c000, 0xc000114f40, {0xc000040150, 0x1, 0x1})\n\t/root/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:279 +0x97d\ngithub.com/urfave/cli/v2.(*App).RunContext(0xc0001e1200, {0x1d27898, 0x2a9aa40}, {0xc000040150, 0x1, 0x1})\n\t/root/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:337 +0x58b\ngithub.com/urfave/cli/v2.(*App).Run(0xc0003e3f30?, {0xc000040150?, 0x1?, 0x1665090?})\n\t/root/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:311 +0x2f\nmain.main()\n\t/root/dcgm-exporter/cmd/dcgm-exporter/main.go:35 +0x5f\n"

What did you expect to happen?

I would think the exporter should start.

What is the GPU model?

2080ti

What is the environment?

Debian 12 running under Proxmox 8.2

How did you deploy the dcgm-exporter and what is the configuration?

Built from source

How to reproduce the issue?

1) clone the repo
2) make binary
3) make install
4) run dcgm-exporter

Anything else we need to know?

No response

stevenmcastano commented 4 months ago

Closing this; my mistake. I didn't realize the DCGM install had failed. Once I corrected that, the exporter started as expected!