NVIDIA / go-nvml

Go Bindings for the NVIDIA Management Library (NVML)
Apache License 2.0

nvml library is not getting initialized on ubuntu22.04 #116

Open sujithapallapothu opened 4 months ago

sujithapallapothu commented 4 months ago
package main

import (
        "fmt"
        "log"

        "github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
        ret := nvml.Init()
        if ret != nvml.SUCCESS {
                log.Fatalf("Unable to initialize NVML: %v", nvml.ErrorString(ret))
        }
        defer func() {
                ret := nvml.Shutdown()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to shutdown NVML: %v", nvml.ErrorString(ret))
                }
        }()

        count, ret := nvml.DeviceGetCount()
        if ret != nvml.SUCCESS {
                log.Fatalf("Unable to get device count: %v", nvml.ErrorString(ret))
        }
        // Print the count only after the call is known to have succeeded.
        fmt.Println("count", count)

        for i := 0; i < count; i++ {
                device, ret := nvml.DeviceGetHandleByIndex(i)
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get device at index %d: %v", i, nvml.ErrorString(ret))
                }

                uuid, ret := device.GetUUID()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get uuid of device at index %d: %v", i, nvml.ErrorString(ret))
                }

                fmt.Printf("%v\n", uuid)

                processInfos, ret := device.GetComputeRunningProcesses()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get process info for device at index %d: %v", i, nvml.ErrorString(ret))
                }
                fmt.Printf("Found %d processes on device %d\n", len(processInfos), i)
                for pi, processInfo := range processInfos {
                        fmt.Printf("\t[%2d] ProcessInfo: %+v\n", pi, processInfo)
                }

        }
}

When I run the Go code above, I get the following error on my Linux machine:

Error initializing NVML:ERROR_LIBRARY_NOT_FOUND

Can someone please suggest why NVML fails to initialize, even though the nvml package is imported in the Go file above and the library exists on the system?

The specs of my Linux machine are as follows:

Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
NVIDIA driver version: 550.54.14
CUDA version: 12.4
Go version: go1.21.9 linux/amd64

klueska commented 4 months ago

Do you have the NVIDIA driver installed? Where is the libnvidia-ml.so.1 library located on your system?

sujithapallapothu commented 4 months ago

Yes @klueska,

I have libnvidia-ml.so.1 on my Linux machine (Ubuntu 22.04):

root@ubuntu2204:/tmp# locate libnvidia-ml.so.1

/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

sujithapallapothu commented 4 months ago

image

elezar commented 4 months ago

@sujithapallapothu the error message: "Error initializing NVML" does not seem to exist in the go-nvml code base and is also not present in the snippet that you pasted above.

Could you give more information about your environment -- including the output of nvidia-smi?

The code you show seems to come from one of the examples included in the repository. Could you check out the latest version of main and run make examples in the root folder? You should then be able to run these examples.

sujithapallapothu commented 4 months ago

@elezar yes, you are right. I took the code from the examples and adapted it in my sample.go file, which looks like this:

    if hasNvidiaGPUs() {
        err := nvml.Init()
        if err != nvml.SUCCESS {
            fmt.Println("Error initializing NVML:", err)
            //return err

        }
        defer nvml.Shutdown()

        deviceCount, err := nvml.DeviceGetCount()
        if err != nvml.SUCCESS {
            fmt.Println("Error getting device count:", err)
            //log.Fatalf("Unable to get device count: %v", nvml.ErrorString(err))
        }
        fmt.Println("Number of NVIDIA GPUs:", deviceCount)
    } else {
        fmt.Println("No NVIDIA GPUs")
    }

Here, hasNvidiaGPUs() checks whether an NVIDIA graphics card is present. I built the code above using go build -tags netgo -ldflags '-s -extldflags "-static"' sample.go and then executed the resulting binary, which fails with Error initializing NVML:ERROR_LIBRARY_NOT_FOUND

image

More details about my environment are as follows:

Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
NVIDIA driver version: 550.54.14
CUDA version: 12.4
Go version: go1.21.9 linux/amd64

Please help me further with this.

Thank you

sujithapallapothu commented 4 months ago
image

I'm getting the error above, which comes from the go-nvml code; it seems that loading the library is failing. Do I need to set any Go environment flags while building the binary?

Please suggest @klueska @elezar

elezar commented 4 months ago

Note that when we build applications on Linux that use this library we specify:

-ldflags "-s -w '-extldflags=-Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files'"

It could be that the static flag is causing the libnvidia-ml.so.1 library to not be loaded.