NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Compiled locally, server runs, fails #414

Closed basi-a closed 2 weeks ago

basi-a commented 2 weeks ago

Ask your question

I compiled DCGM and dcgm-exporter locally, then copied the binaries directly to the GPU server to run them.

Both are physical machines, not containers, and my local machine does not have an NVIDIA card.

/usr/bin/nv-hostengine --service-account nvidia-dcgm
./dcgm-exporter: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./dcgm-exporter)
./dcgm-exporter: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./dcgm-exporter) 

DCGM runs fine, but dcgm-exporter fails with glibc version errors. Out of caution, I cannot install the missing glibc version on the running production server.
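To make the mismatch concrete, here is a minimal Go sketch that reads loader output like the two lines above, extracts the highest `GLIBC_x.y` tag the binary asks for, and compares it with a server's glibc version. The helper names (`requiredGlibc`, `satisfies`, `parseVersion`) are made up for illustration, and the installed version `2.28` is a hypothetical placeholder, since the server's actual glibc version is not stated here (it can be read from `ldd --version`).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// glibcRe matches version tags such as GLIBC_2.34 in loader error output.
var glibcRe = regexp.MustCompile(`GLIBC_(\d+)\.(\d+)`)

// requiredGlibc returns the highest GLIBC_x.y version named in text,
// formatted as "x.y", or "" if no tag is found.
func requiredGlibc(text string) string {
	bestMajor, bestMinor := -1, -1
	for _, m := range glibcRe.FindAllStringSubmatch(text, -1) {
		major, _ := strconv.Atoi(m[1])
		minor, _ := strconv.Atoi(m[2])
		if major > bestMajor || (major == bestMajor && minor > bestMinor) {
			bestMajor, bestMinor = major, minor
		}
	}
	if bestMajor < 0 {
		return ""
	}
	return fmt.Sprintf("%d.%d", bestMajor, bestMinor)
}

// parseVersion splits "x.y" into its numeric parts.
func parseVersion(v string) (int, int, bool) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	major, errMajor := strconv.Atoi(parts[0])
	minor, errMinor := strconv.Atoi(parts[1])
	if errMajor != nil || errMinor != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// satisfies reports whether the installed glibc version is at least the
// required one. It returns false if either version cannot be parsed.
func satisfies(installed, required string) bool {
	iMaj, iMin, ok1 := parseVersion(installed)
	rMaj, rMin, ok2 := parseVersion(required)
	if !ok1 || !ok2 {
		return false
	}
	return iMaj > rMaj || (iMaj == rMaj && iMin >= rMin)
}

func main() {
	// The two loader errors reported above.
	loaderOutput := "./dcgm-exporter: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./dcgm-exporter)\n" +
		"./dcgm-exporter: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./dcgm-exporter)\n"

	required := requiredGlibc(loaderOutput)
	installed := "2.28" // hypothetical: replace with the server's real glibc version

	fmt.Printf("binary needs glibc >= %s; installed %s is sufficient: %v\n",
		required, installed, satisfies(installed, required))
}
```

The point of the comparison is that a dynamically linked binary only starts when the target's glibc is at least as new as the newest symbol version it was linked against, so the build machine's glibc should not be newer than the server's.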

I also tried compiling dcgm-exporter as a statically linked binary. It always fails during startup with the error below, although it does let me set the host:port of nv-hostengine with dcgm-exporter -r

ERRO[0000] Encountered a failure.                        stacktrace="goroutine 1 [running]:\nruntime/debug.Stack()\n\t/usr/lib/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action.func1.1()\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:283 +0x3d\npanic({0x19a41c0?, 0x2beb810?})\n\t/usr/lib/go/src/runtime/panic.go:785 +0x132\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.initDCGM(0xc0005856c0)\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:510 +0x1e0\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.startDCGMExporter(0xc0001b0640, 0xc000303970)\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:303 +0xa5\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action.func1()\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:287 +0x55\ngithub.com/NVIDIA/dcgm-exporter/pkg/stdout.Capture({0x1ec5e30, 0xc0001b94a0}, 0xc000519ba0)\n\t/home/yang/dcgm-exporter/pkg/stdout/capture.go:77 +0x1df\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.action(0xc0001b0640)\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:278 +0x5b\ngithub.com/NVIDIA/dcgm-exporter/pkg/cmd.NewApp.func1(0xc0001b0640?)\n\t/home/yang/dcgm-exporter/pkg/cmd/app.go:263 +0x13\ngithub.com/urfave/cli/v2.(*Command).Run(0xc0001bac60, 0xc0001b0640, {0xc0000400a0, 0x5, 0x5})\n\t/home/yang/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/command.go:279 +0x7e2\ngithub.com/urfave/cli/v2.(*App).RunContext(0xc000043600, {0x1ec5d18, 0x2cb6f40}, {0xc0000400a0, 0x5, 0x5})\n\t/home/yang/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:337 +0x58b\ngithub.com/urfave/cli/v2.(*App).Run(0xc000519f30?, {0xc0000400a0?, 0x1?, 0x16ffb70?})\n\t/home/yang/go/pkg/mod/github.com/urfave/cli/v2@v2.27.1/app.go:311 +0x2f\nmain.main()\n\t/home/yang/dcgm-exporter/cmd/dcgm-exporter/main.go:35 +0x5f\n"   

How can I achieve my goal, please?