Open djsxianglei opened 2 years ago
@djsxianglei it seems as if you are running this on a Windows machine. As far as I am aware, there is platform-specific code which has not yet been updated to support Windows. We do have an issue open to track this (see #1) and any contributions would be welcome.
@elezar thanks. I tried it in a Linux environment.
@djsxianglei did switching to Linux solve your issues?
I am using a Linux container and it fails with the following error:
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-0/pkg/dl/dl.go:34:18: could not determine kind of name for C.RTLD_DEEPBIND
RTLD_DEEPBIND should be available as of glibc 2.3.4. What version of glibc do you have in your development environment where you are trying to compile this?
v1.2.2
That doesn't sound like a glibc version to me, but rather a musl libc version (on which NVML is not supported).
I'm hitting the same set of errors during the build phase of an app that uses the NVML bindings. The build happens in a Docker container (because I'm on macOS) created from this image:
# syntax=docker/dockerfile:1
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
RUN apt-get update -y -q && apt-get upgrade -y -q
RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y -q curl build-essential ca-certificates git
RUN curl -s https://storage.googleapis.com/golang/go1.20.4.linux-amd64.tar.gz | tar -v -C /usr/local -xz
ENV PATH $PATH:/usr/local/go/bin
RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.53.3
WORKDIR /service
COPY go.mod go.sum main.go /service/
RUN go mod download
The exact build command then looks like this:
GOOS?=darwin
COMMIT ?= $(shell git describe --dirty --long --always)
VERSION := $(shell cat ./VERSION)
LDFLAGS_COMMON := -X main.commitSha=$(COMMIT) -X main.version=$(VERSION) -s -w
build: ## Build a binary
@CGO_ENABLED=0 GOARCH=amd64 go build -ldflags "$(LDFLAGS_COMMON)" -o ./dist/resbeat
linux-%: image-build
@docker run --rm -v "$(PWD)":/service -w /service -e GOOS=linux romahlushko/resbeat-build:latest make $*
# make linux-build
I'm ending up getting this error:
[+] Building 3.9s (15/15) FINISHED
=> [internal] load build definition from build.Dockerfile 0.1s
=> => transferring dockerfile: 638B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:1 2.9s
=> CACHED docker-image://docker.io/docker/dockerfile:1@sha256:39b85bbfa7536a5feceb7372a0817649ecb2724562a38360f4d6a7782a409b14 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04 0.7s
=> [1/8] FROM docker.io/nvidia/cuda:12.2.0-devel-ubuntu22.04@sha256:0e2d7e252847c334b056937e533683556926f5343a472b6b92f858a7af8ab880 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 81B 0.0s
=> CACHED [2/8] RUN apt-get update -y -q && apt-get upgrade -y -q 0.0s
=> CACHED [3/8] RUN DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y -q curl build-essential ca-certificates git 0.0s
=> CACHED [4/8] RUN curl -s https://storage.googleapis.com/golang/go1.20.4.linux-amd64.tar.gz | tar -v -C /usr/local -xz 0.0s
=> CACHED [5/8] RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.53.3 0.0s
=> CACHED [6/8] WORKDIR /service 0.0s
=> CACHED [7/8] COPY go.mod go.sum main.go /service/ 0.0s
=> CACHED [8/8] RUN go mod download 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:aa35910e75093c11c5c1bf04c44f1b0418b84905a1ee2f2981731b81e26a46d3 0.0s
=> => naming to docker.io/romahlushko/resbeat-build 0.0s
==========
== CUDA ==
==========
CUDA Version 12.2.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
# github.com/NVIDIA/go-nvml/pkg/nvml
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:25:29: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:32:49: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:39:54: undefined: Return
/root/go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/nvml/device.go:39:54: too many errors
make: *** [Makefile:12: build] Error 1
The error occurs when I'm compiling with CGO_ENABLED=0; otherwise, other errors occur:
./resbeat: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./resbeat)
./resbeat: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./resbeat)
At the end of the day, I want to make this NVML integration completely optional in the app, so the app can be run in environments without GPU/NVIDIA libraries while supporting more capabilities if those pieces are present. So what is the best way to achieve that, besides having ifs that guard the calls to the NVML bindings?
@roma-glushko the following is an example of a Golang app that we build which consumes go-nvml: https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/container/Dockerfile.ubuntu
We build this on macOS regularly. Note that we also provide the following build flags:
To ensure that this executable does not complain about missing symbols.
@elezar thank you, Evan! This is probably what I needed. Let me try it myself and get back to you.
P.S. You may consider referencing this somewhere in the readme as a vetted example of using nvml-go library. That should be helpful 🙌
@elezar Hey Evan, I have tried adding that additional env var, but it doesn't seem to help me build the app on Mac:
// this is the new command I have ended up trying:
CGO_LDFLAGS_ALLOW="-Wl,--unresolved-symbols=ignore-in-object-files" GOOS=darwin GOARCH=amd64 \
go build -ldflags "-s -w -X main.commitSha=1.0.2-8-ga481f406f6f1016-dirty -X main.version=1.0.3" -o ./dist/resbeat
# github.com/NVIDIA/go-nvml/pkg/dl
../../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-1/pkg/dl/dl.go:34:18: could not determine kind of name for C.RTLD_DEEPBIND
make: *** [build] Error 1
I feel like the additional flags CGO_LDFLAGS_ALLOW="-Wl,--unresolved-symbols=ignore-in-object-files" did not really change the situation for some reason.
Then I ran another test and pulled the repo you referenced. This is what I see when trying to run the make commands (this is with GOOS=darwin):
With GOOS=linux (the default in the Makefile), I'm getting this error:
So I'm really wondering how you build and run apps with the nvml-go bindings imported on Mac.
We have not tested go-nvml on Mac and usually build applications in a Docker container. You should be able to build applications that consume go-nvml on a Mac by wrapping the imported code in Linux-only files. This assumes that the go-nvml functionality is not required on Mac. Note that the Device Plugin that you are trying to build does not do this and also imports other Linux-only packages.
@elezar, Thank you!
I actually ended up doing exactly this: https://github.com/roma-glushko/resbeat/pull/26/files#diff-746b3f26090da044e5e670226993d9ccfa645b3e9ee9aad0fd6605a860dbb034R1
Plus, I added a mock for non-linux environments: https://github.com/roma-glushko/resbeat/pull/26/files#diff-f06cf2c8d2d31474e8be3143614a45cf20827660ce264237d311c637c4ce30b6R2
Hope this will be helpful for someone else!
Hello, I get a similar error when building it on a Linux system:
/bin/controller-gen-v0.14.0 object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
# github.com/NVIDIA/go-nvml/pkg/nvml
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:320:10: undefined: _Ctype_struct_nvmlUnit_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:358:10: undefined: _Ctype_struct_nvmlEventSet_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:505:10: undefined: _Ctype_struct_nvmlGpuInstance_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:548:10: undefined: _Ctype_struct_nvmlComputeInstance_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/types_gen.go:552:10: undefined: _Ctype_struct_nvmlGpmSample_st
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:22:19: undefined: MemoryErrorType
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:25:29: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:32:49: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:39:54: undefined: Return
../../go/pkg/mod/github.com/!n!v!i!d!i!a/go-nvml@v0.12.0-3/pkg/nvml/device.go:39:54: too many errors
I set the below env variables:
GOOS?=linux
GOARCH?=arm64
CGO_ENABLED?=0
CLI_VERSION_PACKAGE := main
COMMIT ?= $(shell git describe --dirty --long --always --abbrev=15)
CGO_LDFLAGS_ALLOW := "-Wl,--unresolved-symbols=ignore-in-object-files"
LDFLAGS_COMMON := "-s -w -X $(CLI_VERSION_PACKAGE).commitSha=$(COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(VERSION)"
I run make build with the below args:
.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
@CGO_LDFLAGS_ALLOW=$(CGO_LDFLAGS_ALLOW) CGO_ENABLED=$(CGO_ENABLED) GOOS=$(GOOS) GOARCH=$(GOARCH) \
go build -ldflags $(LDFLAGS_COMMON) -o bin/manager cmd/main.go
Any pointers?
@asm582 does setting CGO_ENABLED?=0 not disable cgo? Since this package represents bindings for the C-based libnvidia-ml.so library, cgo is required.
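One way to see why disabling cgo tends to surface as "undefined" errors rather than a clear message: Go's build rules drop any file that imports "C" when cgo is off, so types declared in such files vanish while the pure-Go files that use them are still compiled. The sketch below uses the standard go/build package on a hypothetical two-file package written to a temporary directory. The file names and the Return type are stand-ins chosen to mirror the log above, not go-nvml's actual sources.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// classify writes a two-file package to a temporary directory and reports how
// go/build sorts the files for the given cgo setting.
func classify(cgoEnabled bool) (compiled, ignored []string, err error) {
	dir, err := os.MkdirTemp("", "cgodemo")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	files := map[string]string{
		// Stands in for a file that declares a type via cgo.
		"types.go": "package demo\n\n// typedef int ret_t;\nimport \"C\"\n\ntype Return C.ret_t\n",
		// Stands in for a pure-Go file that uses that type.
		"device.go": "package demo\n\nfunc Init() Return { return 0 }\n",
	}
	for name, src := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(src), 0o644); err != nil {
			return nil, nil, err
		}
	}

	ctxt := build.Default
	ctxt.CgoEnabled = cgoEnabled
	pkg, err := ctxt.ImportDir(dir, 0)
	if err != nil {
		return nil, nil, err
	}
	// With cgo enabled the cgo file is listed in CgoFiles; with cgo disabled
	// it is moved to IgnoredGoFiles and never reaches the compiler.
	return append(pkg.GoFiles, pkg.CgoFiles...), pkg.IgnoredGoFiles, nil
}

func main() {
	for _, enabled := range []bool{true, false} {
		compiled, ignored, err := classify(enabled)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("CgoEnabled=%v compiled=%v ignored=%v\n", enabled, compiled, ignored)
	}
}
```

With cgo disabled, device.go is still compiled while types.go is ignored, which is the same shape of failure as the "undefined: Return" lines in the build output.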
OK, I set CGO_ENABLED?=1 and now I get the error below:
make build
/home/openstack/asmalvan/instaslice2/bin/controller-gen-v0.14.0 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/openstack/asmalvan/instaslice2/bin/controller-gen-v0.14.0 object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
# runtime/cgo
gcc_arm64.S: Assembler messages:
gcc_arm64.S:30: Error: no such instruction: `stp x29,x30,[sp,'
gcc_arm64.S:34: Error: too many memory references for `mov'
gcc_arm64.S:36: Error: no such instruction: `stp x19,x20,[sp,'
gcc_arm64.S:39: Error: no such instruction: `stp x21,x22,[sp,'
gcc_arm64.S:42: Error: no such instruction: `stp x23,x24,[sp,'
gcc_arm64.S:45: Error: no such instruction: `stp x25,x26,[sp,'
gcc_arm64.S:48: Error: no such instruction: `stp x27,x28,[sp,'
gcc_arm64.S:52: Error: too many memory references for `mov'
gcc_arm64.S:53: Error: too many memory references for `mov'
gcc_arm64.S:54: Error: too many memory references for `mov'
gcc_arm64.S:56: Error: no such instruction: `blr x20'
gcc_arm64.S:57: Error: no such instruction: `blr x19'
gcc_arm64.S:59: Error: no such instruction: `ldp x27,x28,[sp,'
gcc_arm64.S:62: Error: no such instruction: `ldp x25,x26,[sp,'
gcc_arm64.S:65: Error: no such instruction: `ldp x23,x24,[sp,'
gcc_arm64.S:68: Error: no such instruction: `ldp x21,x22,[sp,'
gcc_arm64.S:71: Error: no such instruction: `ldp x19,x20,[sp,'
gcc_arm64.S:74: Error: no such instruction: `ldp x29,x30,[sp],'
make: *** [Makefile:91: build] Error 1
Can you provide a minimal reproducer that we can run ourselves? Without the ability to reproduce this ourselves (or at least see the exact code being compiled) we are not going to be able to help much.
Thanks for all your help, the project builds and I can deploy the container. On the Ubuntu machine, I did away with all the compile flags that were added earlier. The build step in the Makefile is as below, which is also what the kubebuilder scaffolding provides:
.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager cmd/main.go
To build the container image, I used the Dockerfile from the DRA repo with modifications added from the kubebuilder scaffolding:
ARG GOLANG_VERSION=1.22.2
FROM nvidia/cuda:12.4.1-base-ubuntu22.04 as build
RUN apt-get update && \
apt-get install -y wget make git gcc \
&& \
rm -rf /var/lib/apt/lists/*
#TODO: Remove arch discovery
RUN set -eux; \
\
arch="$(uname -m)"; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
*) echo "unsupported architecture" ; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go1.22.2.linux-amd64.tar.gz \
| tar -C /usr/local -xz
ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
RUN go mod download
# Copy the go source
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY internal/controller/ internal/controller/
RUN go build -o bin/manager cmd/main.go
FROM nvidia/cuda:12.4.1-base-ubuntu22.04
# Remove CUDA libs(compat etc) in favor of libs installed by the NVIDIA driver
RUN rm -f cuda-*.deb
RUN apt-get --purge -y autoremove cuda-*
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
WORKDIR /
COPY --from=build /workspace/bin/manager .
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
rm -f /etc/apt/sources.list.d/cuda.list && \
apt-get update && apt-get upgrade -y ${CVE_UPDATES} && \
rm -rf /var/lib/apt/lists/*; \
fi
ENTRYPOINT ["/manager"]
I hope someone finds this helpful!
go get github.com/NVIDIA/go-nvml/pkg/nvml error
D:\www\go-nvml>go get github.com/NVIDIA/go-nvml/pkg/nvml
github.com/NVIDIA/go-nvml/pkg/nvml
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:25:29: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:32:49: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:39:54: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:46:50: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:53:58: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:60:44: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:66:41: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:37: undefined: BrandType
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:48: undefined: Return
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\types_gen.go:9:10: undefined: _Ctype_struct_nvmlDevice_st
C:\Users\djs\go\pkg\mod\github.com!n!v!i!d!i!a\go-nvml@v0.11.6-0\pkg\nvml\device.go:71:48: too many errors