NVIDIA / go-dcgm

Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
Apache License 2.0

Compilation errors of dcgm-exporter and go-dcgm #74

Open zoobab opened 1 week ago

zoobab commented 1 week ago

I am trying to compile the dcgm-exporter tool (git version 965b2de86d647d6c4c3a9ebe0d66e7ebf46045f5), and the build fails with compilation errors in go-nvml and go-dcgm:

(golang) root@9ecfb1571994:/mnt/dcgm-exporter# go version
go version go1.23.3 linux/amd64
(golang) root@9ecfb1571994:/mnt/dcgm-exporter# make binary
go generate ./...
Updating DCGM version in files from 3.3.7 to 3.3.8...
Updating DCGM Exporter version in files from 3.5.0 to 3.6.0...
cd cmd/dcgm-exporter; go build -ldflags "-X main.BuildVersion=3.3.8-3.6.0"
# github.com/NVIDIA/go-nvml/pkg/nvml
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:25:29: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:32:49: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:39:54: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-2/pkg/nvml/device.go:39:54: too many errors
# github.com/NVIDIA/go-dcgm/pkg/dcgm
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:22:13: undefined: mode
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:67:41: undefined: Field_Entity_Group
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:77:33: undefined: Device
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:82:35: undefined: DeviceStatus
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:87:39: undefined: P2PLink
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:93:24: undefined: GroupHandle
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:98:27: undefined: GroupHandle
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:98:53: undefined: ProcessInfo
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:103:38: undefined: DeviceHealth
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:108:60: undefined: policyCondition
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-dcgm@v0.0.0-20240118201113-3385e277e49f/pkg/dcgm/api.go:108:60: too many errors
make: *** [Makefile:36: binary] Error 1

Any idea how to solve these issues?

nvvfedorov commented 1 week ago

@zoobab, unfortunately there is not enough information to tell exactly what is wrong. My best guess is that the required header files (DCGM and NVML) are missing from your build environment. To build dcgm-exporter or go-dcgm, you need to have the DCGM and CUDA libraries installed. You can also try using the devcontainer: https://github.com/NVIDIA/dcgm-exporter/tree/main/.devcontainer.