Open Godson-A opened 5 months ago
This is not a bug. You need to check the Go environment settings in your Ansible script.
@nvvfedorov https://github.com/NVIDIA/dcgm-exporter/issues/321 is the same issue. I think the problem is that he is building on a node that has no GPU. If you check that issue, I had the same error log. The README.md still does not mention
that you need a GPU card to build from source.
@jz543fm, I don't think the issue is the absence of a GPU. I suspect that datacenter-gpu-manager is missing on the build machine.
The GPU is necessary for running tests.
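One way to check the Go environment settings mentioned above is a small preflight script run in the same context as the build. This is a minimal sketch, not something taken from the dcgm-exporter repository. It rests on the assumption that the `undefined: _Ctype_struct_...` errors in this report come from cgo being unavailable, which commonly happens when `CGO_ENABLED` is not `1` or no C compiler is on `PATH`. The function name `classify_cgo` is hypothetical, and the check looks for `gcc` specifically.

```shell
#!/bin/sh
# classify_cgo CGO_ENABLED_VALUE COMPILER_PATH
# Hypothetical helper: prints a one-line verdict and returns 0 only
# when both cgo prerequisites look satisfied.
classify_cgo() {
    cgo_enabled=$1
    compiler_path=$2
    if [ "$cgo_enabled" != "1" ]; then
        echo "problem: CGO_ENABLED is '${cgo_enabled}', expected '1'"
        return 1
    fi
    if [ -z "$compiler_path" ]; then
        echo "problem: no C compiler found on PATH"
        return 1
    fi
    echo "ok: cgo enabled, compiler at ${compiler_path}"
}

# Live check against the toolchain visible to *this* shell.
# Both lookups may fail (for example when go is not on PATH); the
# empty value is then reported by classify_cgo instead of aborting.
cgo_value=$(go env CGO_ENABLED 2>/dev/null || true)
cc_path=$(command -v gcc 2>/dev/null || true)
classify_cgo "$cgo_value" "$cc_path" || echo "fix the item above before running 'make binary'"
```

Running the same script once in an interactive terminal and once as an Ansible task shows whether the two contexts see the same Go toolchain and compiler.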
What is the version?
3.3.5-3.4.1
What happened?
I installed dcgm-exporter from source via the terminal on one of my GPU nodes. Initially, following the documentation, the installation was not successful. It failed at the
make binary
step. So I ran the following commands from the Makefile:
cd cmd/dcgm-exporter
sudo go build -v -ldflags "-X main.BuildVersion=3.3.5-3.4.1"
After this the output is as follows
go: downloading github.com/sirupsen/logrus v1.9.3
go: downloading go.uber.org/automaxprocs v1.5.3
go: downloading github.com/urfave/cli/v2 v2.27.1
go: downloading github.com/NVIDIA/go-dcgm v0.0.0-20240118201113-3385e277e49f
go: downloading github.com/stretchr/testify v1.8.4
go: downloading github.com/bits-and-blooms/bitset v1.13.0
go: downloading github.com/gorilla/mux v1.8.1
go: downloading github.com/prometheus/exporter-toolkit v0.11.0
go: downloading golang.org/x/sync v0.5.0
go: downloading google.golang.org/grpc v1.61.1
go: downloading k8s.io/api v0.29.2
go: downloading k8s.io/apimachinery v0.29.2
go: downloading k8s.io/client-go v0.29.2
go: downloading k8s.io/kubelet v0.29.2
go: downloading github.com/NVIDIA/go-nvml v0.12.0-2
go: downloading github.com/go-kit/log v0.2.1
go: downloading golang.org/x/sys v0.16.0
go: downloading github.com/coreos/go-systemd/v22 v22.5.0
go: downloading github.com/prometheus/common v0.47.0
go: downloading golang.org/x/crypto v0.18.0
go: downloading gopkg.in/yaml.v2 v2.4.0
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading github.com/Masterminds/semver v1.5.0
go: downloading github.com/go-logfmt/logfmt v0.6.0
go: downloading github.com/gogo/protobuf v1.3.2
go: downloading github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f
go: downloading golang.org/x/net v0.20.0
go: downloading golang.org/x/oauth2 v0.16.0
go: downloading github.com/cpuguy83/go-md2man/v2 v2.0.3
go: downloading github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673
go: downloading github.com/jpillora/backoff v1.0.0
go: downloading github.com/prometheus/client_golang v1.18.0
go: downloading github.com/google/gofuzz v1.2.0
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240102182953-50ed04b92917
go: downloading github.com/russross/blackfriday/v2 v2.1.0
go: downloading gopkg.in/inf.v0 v0.9.1
go: downloading k8s.io/klog/v2 v2.110.1
go: downloading k8s.io/utils v0.0.0-20240102154912-e7106e64919e
go: downloading sigs.k8s.io/structured-merge-diff/v4 v4.4.1
go: downloading github.com/golang/protobuf v1.5.3
go: downloading google.golang.org/protobuf v1.33.0
go: downloading sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd
go: downloading github.com/beorn7/perks v1.0.1
go: downloading github.com/cespare/xxhash/v2 v2.2.0
go: downloading github.com/prometheus/client_model v0.6.0
go: downloading github.com/prometheus/procfs v0.12.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading github.com/json-iterator/go v1.1.12
go: downloading golang.org/x/text v0.14.0
go: downloading github.com/google/gnostic-models v0.6.8
go: downloading golang.org/x/time v0.5.0
go: downloading golang.org/x/term v0.16.0
go: downloading k8s.io/kube-openapi v0.0.0-20240220201932-37d671a357a5
go: downloading sigs.k8s.io/yaml v1.4.0
go: downloading github.com/modern-go/reflect2 v1.0.2
go: downloading github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
go: downloading github.com/emicklei/go-restful/v3 v3.11.1
go: downloading github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822
go: downloading github.com/google/uuid v1.5.0
go: downloading github.com/go-openapi/jsonreference v0.20.4
go: downloading github.com/go-openapi/swag v0.22.7
go: downloading github.com/go-openapi/jsonpointer v0.20.2
go: downloading github.com/mailru/easyjson v0.7.7
go: downloading github.com/josharian/intern v1.0.0
Then I installed the binary using
sudo install binary
and I was able to curl the metrics:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4868    0  4868    0     0  4753k      0 --:--:-- --:--:-- --:--:-- 4753k
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# HELP DCGM_FI_DEV_POWER_USAGE Power draw (in W).
# HELP DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION Total energy consumption since boot (in mJ).
# HELP DCGM_FI_DEV_PCIE_REPLAY_COUNTER Total number of PCIe retries.
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# HELP DCGM_FI_DEV_MEM_COPY_UTIL Memory utilization (in %).
# HELP DCGM_FI_DEV_ENC_UTIL Encoder utilization (in %).
# HELP DCGM_FI_DEV_DEC_UTIL Decoder utilization (in %).
# HELP DCGM_FI_DEV_XID_ERRORS Value of the last XID error encountered.
# HELP DCGM_FI_DEV_FB_FREE Frame buffer memory free (in MB).
# HELP DCGM_FI_DEV_FB_USED Frame buffer memory used (in MB).
# HELP DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL Total number of NVLink bandwidth counters for all lanes
# HELP DCGM_FI_DEV_VGPU_LICENSE_STATUS vGPU License status
The issue is that when I do the same with Ansible, it does not work as expected and fails at the make binary step with the following error:
go: downloading github.com/go-openapi/swag v0.22.7
go: downloading github.com/google/uuid v1.5.0
go: downloading github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822
go: downloading github.com/go-openapi/jsonpointer v0.20.2
go: downloading github.com/mailru/easyjson v0.7.7
go: downloading github.com/josharian/intern v1.0.0
# github.com/NVIDIA/go-nvml/pkg/nvml
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:25:29: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:32:49: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:39:54: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:39:54: too many errors
# github.com/NVIDIA/go-dcgm/pkg/dcgm
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:22:13: undefined: mode
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:67:41: undefined: Field_Entity_Group
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:77:33: undefined: Device
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:82:35: undefined: DeviceStatus
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:87:39: undefined: P2PLink
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:93:24: undefined: GroupHandle
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:98:27: undefined: GroupHandle
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:98:53: undefined: ProcessInfo
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:103:38: undefined: DeviceHealth
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:108:60: undefined: policyCondition
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:108:60: too many errors
make: *** [Makefile:34: binary] Error 1
What did you expect to happen?
I expect the installation of the exporter via Ansible to succeed, since I was able to do it manually (even though make binary itself did not work as expected).
The Ansible run, however, fails with the same output shown above under "What happened?". I have also tried Ansible privilege escalation, with the same result.
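Because the manual build succeeds and the Ansible build does not, one way to narrow things down is to save the output of `go env` in both contexts and compare the two dumps. The sketch below is only an illustration of that comparison. The function name `go_env_diff` and the file names in the comments are hypothetical, and it assumes `sort`, `comm` and `mktemp` are available on the node.

```shell
#!/bin/sh
# go_env_diff FILE_A FILE_B
# Hypothetical helper: prints the lines that appear in only one of two
# saved `go env` dumps. Lines unique to FILE_B are tab-indented by comm.
#
# Producing the dumps (file names are placeholders):
#   interactive shell:  go env > /tmp/manual.env
#   Ansible task:       go env > /tmp/ansible.env
go_env_diff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    # comm needs sorted input; sorting also hides ordering differences.
    sort "$1" > "$tmp_a"
    sort "$2" > "$tmp_b"
    # -3 suppresses lines common to both files, leaving only differences.
    comm -3 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Run the comparison only when two file arguments were supplied.
if [ "$#" -eq 2 ]; then
    go_env_diff "$1" "$2"
fi
```

Differences in variables such as `CGO_ENABLED`, `CC`, `GOROOT` or `GOPATH` between the two dumps would point at the environment the Ansible task runs in rather than at the exporter source.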
What is the GPU model?
No response
What is the environment?
No response
How did you deploy the dcgm-exporter and what is the configuration?
No response
How to reproduce the issue?
No response
Anything else we need to know?
No response