NVIDIA / go-nvml

Go Bindings for the NVIDIA Management Library (NVML)
Apache License 2.0

go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:25:29: undefined: Return #49

Open djsxianglei opened 2 years ago

djsxianglei commented 2 years ago

Running go get github.com/NVIDIA/go-nvml/pkg/nvml fails with the following errors:

D:\www\go-nvml>go get github.com/NVIDIA/go-nvml/pkg/nvml
# github.com/NVIDIA/go-nvml/pkg/nvml
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:25:29: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:32:49: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:39:54: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:46:50: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:53:58: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:60:44: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:66:41: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:37: undefined: BrandType
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:48: undefined: Return
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
C:\Users\djs\go\pkg\mod\github.com\!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:48: too many errors

elezar commented 2 years ago

@djsxianglei it seems that you are running this on a Windows machine. As far as I am aware, there is platform-specific code that has not yet been updated to support Windows. We have an issue open to track this (see #1), and any contributions would be welcome.

djsxianglei commented 2 years ago

@elezar thanks. I tried it in a Linux environment.

elezar commented 1 year ago

@djsxianglei did switching to Linux solve your issues?

shaktsin commented 1 year ago

I am using a Linux container and it fails with the following error:

/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-0/pkg/dl/dl.go:34:18: could not determine kind of name for C.RTLD_DEEPBIND

klueska commented 1 year ago

RTLD_DEEPBIND should be available as of glibc 2.3.4. What version of glibc do you have in your development environment where you are trying to compile this?

shaktsin commented 1 year ago

v1.2.2

klueska commented 1 year ago

That doesn't sound like a glibc version to me, but rather a musl libc version (on which NVML is not supported).
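For anyone unsure which C library their build environment uses, here is a minimal sketch that guesses it from the output of ldd --version. The helper name classifyLibc and the substrings it looks for are assumptions based on typical glibc and musl banners; they are not part of go-nvml, and the check is a heuristic only.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// classifyLibc is a hypothetical helper: it guesses the C library from the
// text printed by `ldd --version`. The substrings are assumptions based on
// commonly seen glibc and musl banners, so treat the result as a hint.
func classifyLibc(lddOutput string) string {
	lower := strings.ToLower(lddOutput)
	switch {
	case strings.Contains(lower, "musl"):
		return "musl"
	case strings.Contains(lower, "glibc"), strings.Contains(lower, "gnu libc"):
		return "glibc"
	default:
		return "unknown"
	}
}

func main() {
	// The exit status is ignored on purpose: some ldd implementations exit
	// non-zero for --version, and only the printed banner is inspected.
	out, _ := exec.Command("ldd", "--version").CombinedOutput()
	fmt.Println("detected libc:", classifyLibc(string(out)))
}
```

A result of "musl" (for example in an Alpine-based image) would match the situation described above, where C.RTLD_DEEPBIND cannot be resolved.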

roma-glushko commented 1 year ago

I'm getting the same set of errors during the build phase of an app that uses the NVML bindings. The build happens in a Docker container (because I'm on macOS) created from this image:

# syntax=docker/dockerfile:1
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

RUN apt-get update -y -q && apt-get upgrade -y -q
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y -q curl build-essential ca-certificates git

RUN curl -s https://storage.googleapis.com/golang/go1.20.4.linux-amd64.tar.gz | tar -v -C /usr/local -xz
ENV PATH $PATH:/usr/local/go/bin

RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.53.3

WORKDIR /service

COPY go.mod go.sum main.go /service/
RUN go mod download

The exact build command then looks this way:

GOOS?=darwin
COMMIT ?= $(shell git describe --dirty --long --always)
VERSION := $(shell cat ./VERSION)
LDFLAGS_COMMON := -X main.commitSha=$(COMMIT) -X main.version=$(VERSION) -s -w

build: ## Build a binary
    @CGO_ENABLED=0 GOARCH=amd64 go build -ldflags "$(LDFLAGS_COMMON)" -o ./dist/resbeat

linux-%: image-build
    @docker run --rm -v "$(PWD)":/service -w /service -e GOOS=linux romahlushko/resbeat-build:latest make $*

# make linux-build

I end up getting this error:

[+] Building 3.9s (15/15) FINISHED                                                                                                                                                                                      
 => [internal] load build definition from build.Dockerfile                                                                                                                                                         0.1s
 => => transferring dockerfile: 638B                                                                                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                                                                                    0.0s
 => resolve image config for docker.io/docker/dockerfile:1                                                                                                                                                         2.9s
 => CACHED docker-image://docker.io/docker/dockerfile:1@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14                                                                                    0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04                                                                                                                                    0.7s
 => [1/8] FROM docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04@sha256:0e2d7e252847c334b056937e533683556926f5343a472b6b92f858a7af8ab880                                                                              0.0s
 => [internal] load build context                                                                                                                                                                                  0.0s
 => => transferring context: 81B                                                                                                                                                                                   0.0s
 => CACHED [2/8] RUN apt-get update -y -q && apt-get upgrade -y -q                                                                                                                                                 0.0s
 => CACHED [3/8] RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y -q curl build-essential ca-certificates git                                                                         0.0s
 => CACHED [4/8] RUN curl -s https://storage.googleapis.com/golang/go1.20.4.linux-amd64.tar.gz | tar -v -C /usr/local -xz                                                                                          0.0s
 => CACHED [5/8] RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.53.3                                                              0.0s
 => CACHED [6/8] WORKDIR /service                                                                                                                                                                                  0.0s
 => CACHED [7/8] COPY go.mod go.sum main.go /service/                                                                                                                                                              0.0s
 => CACHED [8/8] RUN go mod download                                                                                                                                                                               0.0s
 => exporting to image                                                                                                                                                                                             0.0s
 => => exporting layers                                                                                                                                                                                            0.0s
 => => writing image sha256:aa35910e75093c11c5c1bf04c44f1b0418b84905a1ee2f2981731b81e26a46d3                                                                                                                       0.0s
 => => naming to docker.io/romahlushko/resbeat-build                                                                                                                                                               0.0s

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

# github.com/NVIDIA/go-nvml/pkg/nvml
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:25:29: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:32:49: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:39:54: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:39:54: too many errors
make: *** [Makefile:12: build] Error 1

The error occurs when I compile with CGO_ENABLED=0; otherwise, different errors occur:

./resbeat: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./resbeat)
./resbeat: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./resbeat)

Ultimately, I want to make this NVML integration completely optional in the app, so that the app can be run in environments without a GPU or the NVIDIA libraries while supporting more capabilities when those pieces are present. What is the best way to achieve that, besides having ifs that guard the calls to the NVML bindings?

elezar commented 1 year ago

@roma-glushko the following is an example of a Go app that we build which consumes go-nvml: https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/container/Dockerfile.ubuntu

We build this on macOS regularly. Note that we also provide the following build flags to ensure that the executable does not complain about missing symbols:

https://github.com/NVIDIA/k8s-device-plugin/blob/8b4160169defedbc95beb2f56f1cb660b510d28a/Makefile#L58-L59

roma-glushko commented 1 year ago

@elezar thank you, Evan! This is probably what I needed. Let me try it myself and get back to you.

P.S. You may want to reference this somewhere in the README as a vetted example of using the go-nvml library. That would be helpful 🙌

roma-glushko commented 1 year ago

@elezar Hey Evan, I tried adding those additional env vars, but they don't seem to help me build the app on Mac:

// this is the new command I have ended up trying:
CGO_LDFLAGS_ALLOW="-Wl,--unresolved-symbols=ignore-in-object-files"  GOOS=darwin GOARCH=amd64 \
                go build -ldflags "-s -w -X main.commitSha=1.0.2-8-ga481f406f6f1016-dirty -X main.version=1.0.3" -o ./dist/resbeat

# github.com/NVIDIA/go-nvml/pkg/dl
../../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/dl/dl.go:34:18: could not determine kind of name for C.RTLD_DEEPBIND
make: *** [build] Error 1

I feel like those additional flags CGO_LDFLAGS_ALLOW="-Wl,--unresolved-symbols=ignore-in-object-files" did not really change the situation for some reason.

Then I ran another test and pulled the repo you referenced. This is what I saw when running the make commands (this is with GOOS=darwin):

[Screenshot 2023-07-23 at 14 23 59]

With GOOS=linux (the default in the makefile), I'm getting this error:

[Screenshot 2023-07-23 at 14 31 45]

So I'm really wondering how you build and run apps that import the go-nvml bindings on a Mac.

elezar commented 1 year ago

We have not tested go-nvml on Mac and usually build applications in a Docker container. You should be able to build applications that consume go-nvml on a Mac by wrapping the imported code in Linux-only files. This assumes that the go-nvml functionality is not required on Mac. Note that the Device Plugin that you are trying to build does not do this and also imports other Linux-only packages.
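A minimal sketch of that layout, under stated assumptions: the application talks to a small interface, one file guarded by a //go:build linux constraint would implement it on top of go-nvml, and a second file guarded by //go:build !linux would supply a stub. Only the stub side is shown so the example stays stdlib-only and compiles anywhere. The names GPUStats, ErrNVMLUnavailable, stubStats and newGPUStats are hypothetical and do not come from go-nvml or from any project mentioned in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// GPUStats is a hypothetical interface the rest of the app depends on, so
// that no other file has to import go-nvml directly.
type GPUStats interface {
	DeviceCount() (int, error)
}

// ErrNVMLUnavailable is a hypothetical sentinel error returned by the stub.
var ErrNVMLUnavailable = errors.New("NVML is not available on this platform")

// stubStats is what a file starting with `//go:build !linux` would contain.
// A sibling file starting with `//go:build linux` (not shown, assumed) would
// define the same constructor backed by the real bindings.
type stubStats struct{}

func (stubStats) DeviceCount() (int, error) { return 0, ErrNVMLUnavailable }

// newGPUStats would be defined once per platform file; this is the stub one.
func newGPUStats() GPUStats { return stubStats{} }

func main() {
	n, err := newGPUStats().DeviceCount()
	if errors.Is(err, ErrNVMLUnavailable) {
		// The app keeps running and simply skips its GPU features.
		fmt.Println("GPU metrics disabled:", err)
		return
	}
	fmt.Println("GPU devices:", n)
}
```

With this split, the go-nvml import only appears in the Linux-only file, so a macOS build never tries to compile the bindings.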

roma-glushko commented 1 year ago

@elezar, Thank you!

I actually ended up doing exactly this: https://github.com/roma-glushko/resbeat/pull/26/files#diff-746b3f26090da044e5e670226993d9ccfa645b3e9ee9aad0fd6605a860dbb034R1

Plus, I added a mock for non-linux environments: https://github.com/roma-glushko/resbeat/pull/26/files#diff-f06cf2c8d2d31474e8be3143614a45cf20827660ce264237d311c637c4ce30b6R2

Hope this will be helpful for someone else!

asm582 commented 4 months ago

Hello, I get a similar error when building on a Linux system:

/bin/controller-gen-v0.14.0 object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
# github.com/NVIDIA/go-nvml/pkg/nvml
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:25:29: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:32:49: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:39:54: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:39:54: too many errors

I set the following variables:

GOOS?=linux
GOARCH?=arm64
CGO_ENABLED?=0
CLI_VERSION_PACKAGE := main
COMMIT ?= $(shell git describe --dirty --long --always --abbrev=15)
CGO_LDFLAGS_ALLOW := "-Wl,--unresolved-symbols=ignore-in-object-files"
LDFLAGS_COMMON := "-s -w -X $(CLI_VERSION_PACKAGE).commitSha=$(COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(VERSION)"

I run make build with the below args:

.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
    @CGO_LDFLAGS_ALLOW=$(CGO_LDFLAGS_ALLOW) CGO_ENABLED=$(CGO_ENABLED) GOOS=$(GOOS) GOARCH=$(GOARCH) \
        go build -ldflags $(LDFLAGS_COMMON) -o bin/manager cmd/main.go

Any pointers?

elezar commented 4 months ago

@asm582 does setting CGO_ENABLED?=0 not disable cgo? Since this package provides bindings for the C-based libnvidia-ml.so library, cgo is required.
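As background, the Go toolchain skips source files that import "C" when cgo is disabled, which is a likely explanation for the "undefined" errors above. The following is a small preflight sketch, not part of go-nvml: it reads build.Default from the standard go/build package, which reflects the CGO_ENABLED and GOOS values seen by the running program (not necessarily those of a later go build invocation). The helper name nvmlBuildable is hypothetical, and the Linux-only rule is taken from the discussion in this thread.

```go
package main

import (
	"fmt"
	"go/build"
)

// nvmlBuildable is a hypothetical check that encodes two points made in this
// thread: the bindings require cgo, and they are only expected to build on
// Linux. It returns nil when both conditions hold.
func nvmlBuildable(cgoEnabled bool, goos string) error {
	if goos != "linux" {
		return fmt.Errorf("GOOS=%s: the bindings are only expected to build on linux", goos)
	}
	if !cgoEnabled {
		return fmt.Errorf("cgo is disabled: files importing \"C\" are skipped, so their declarations are undefined")
	}
	return nil
}

func main() {
	// build.Default picks up CGO_ENABLED and GOOS from the environment of
	// this process, falling back to the toolchain defaults.
	if err := nvmlBuildable(build.Default.CgoEnabled, build.Default.GOOS); err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("preflight ok: cgo enabled on linux")
}
```

Running it with CGO_ENABLED=0 set in the environment should report the cgo failure on a Linux host.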

asm582 commented 4 months ago

OK, I set CGO_ENABLED?=1.

Now I get the error below:

make build
/home/openstack/asmalvan/instaslice2/bin/controller-gen-v0.14.0 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/openstack/asmalvan/instaslice2/bin/controller-gen-v0.14.0 object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
# runtime/cgo
gcc_arm64.S: Assembler messages:
gcc_arm64.S:30: Error: no such instruction: `stp x29,x30,[sp,'
gcc_arm64.S:34: Error: too many memory references for `mov'
gcc_arm64.S:36: Error: no such instruction: `stp x19,x20,[sp,'
gcc_arm64.S:39: Error: no such instruction: `stp x21,x22,[sp,'
gcc_arm64.S:42: Error: no such instruction: `stp x23,x24,[sp,'
gcc_arm64.S:45: Error: no such instruction: `stp x25,x26,[sp,'
gcc_arm64.S:48: Error: no such instruction: `stp x27,x28,[sp,'
gcc_arm64.S:52: Error: too many memory references for `mov'
gcc_arm64.S:53: Error: too many memory references for `mov'
gcc_arm64.S:54: Error: too many memory references for `mov'
gcc_arm64.S:56: Error: no such instruction: `blr x20'
gcc_arm64.S:57: Error: no such instruction: `blr x19'
gcc_arm64.S:59: Error: no such instruction: `ldp x27,x28,[sp,'
gcc_arm64.S:62: Error: no such instruction: `ldp x25,x26,[sp,'
gcc_arm64.S:65: Error: no such instruction: `ldp x23,x24,[sp,'
gcc_arm64.S:68: Error: no such instruction: `ldp x21,x22,[sp,'
gcc_arm64.S:71: Error: no such instruction: `ldp x19,x20,[sp,'
gcc_arm64.S:74: Error: no such instruction: `ldp x29,x30,[sp],'
make: *** [Makefile:91: build] Error 1

klueska commented 4 months ago

Can you provide a minimal reproducer that we can run ourselves? Without the ability to reproduce this (or at least see the exact, full code being compiled), we are not going to be able to help much.

asm582 commented 4 months ago

Thanks for all your help. The project now builds and I can deploy the container. On the Ubuntu machine, I did away with all the compile flags that were added earlier. The build step in the Makefile is shown below; it is the same one provided by the kubebuilder scaffolding:

.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
    go build -o bin/manager cmd/main.go

To build the container image, I used the Dockerfile from the DRA repo with modifications taken from the kubebuilder scaffolding:

ARG GOLANG_VERSION=1.22.2

FROM nvidia/cuda:12.4.1-base-ubuntu22.04 as build

RUN apt-get update && \
    apt-get install -y wget make git gcc \
    && \
    rm -rf /var/lib/apt/lists/*

#TODO: Remove arch discovery
RUN set -eux; \
    \
    arch="$(uname -m)"; \
    case "${arch##*-}" in \
        x86_64 | amd64) ARCH='amd64' ;; \
        ppc64el | ppc64le) ARCH='ppc64le' ;; \
        aarch64) ARCH='arm64' ;; \
        *) echo "unsupported architecture" ; exit 1 ;; \
    esac; \
       wget -nv -O - https://storage.googleapis.com/golang/go1.22.2.linux-amd64.tar.gz \
    | tar -C /usr/local -xz

ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH

WORKDIR /workspace

# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum

RUN go mod download

# Copy the go source
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY internal/controller/ internal/controller/

RUN go build -o bin/manager cmd/main.go

FROM nvidia/cuda:12.4.1-base-ubuntu22.04

# Remove CUDA libs(compat etc) in favor of libs installed by the NVIDIA driver
RUN rm -f cuda-*.deb
RUN apt-get --purge -y autoremove cuda-*

ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

WORKDIR /

COPY --from=build /workspace/bin/manager .

# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
        rm -f /etc/apt/sources.list.d/cuda.list && \
        apt-get update && apt-get upgrade -y ${CVE_UPDATES} && \
        rm -rf /var/lib/apt/lists/*; \
    fi

ENTRYPOINT ["/manager"]

I hope someone finds this helpful!