Open enihcam opened 10 months ago
Isn't the GPU a device? Say /dev/gpu?
Could you try
ctr=$(buildah from --device /dev/gpu ...)
buildah run $ctr ...
Sorry for the late reply. I tried the following:
ctr=$(buildah --device /dev/nvidia0 from for.example.com/gpu_image_for_test)
buildah run $ctr /bin/bash
Then I ran nvidia-smi inside it, which gave no output at all.
By the way, this container runs inside another container, in vfs + chroot mode.
Could you try
buildah --device=nvidia.com/gpu=all from ...
stat nvidia.com/gpu=all: no such file or directory
What version of buildah are you using?
~ # buildah version
Version: 1.33.7
Go Version: go1.21.9 (Red Hat 1.21.9-1.module+el8.8.0+632+2dde9914)
Image Spec: 1.1.0-rc.5
Runtime Spec: 1.1.0
CNI Spec: 1.0.0
libcni Version: v1.1.2
image Version: 5.29.2
Git Commit:
Built: Tue Jun 18 11:12:42 2024
OS/Arch: linux/amd64
BuildPlatform: linux/amd64
~ # env | grep BUILDAH
BUILDAH_FORMAT=docker
BUILDAH_ISOLATION=chroot
~ # env | grep STORAGE
STORAGE_DRIVER=vfs
Any chance you can update the version?
$ buildah -v
buildah version 1.36.0 (image-spec 1.1.0, runtime-spec 1.2.0)
tmp $ buildah version
Version: 1.36.0
Go Version: go1.22.3
Image Spec: 1.1.0
Runtime Spec: 1.2.0
CNI Spec: 1.0.0
libcni Version:
image Version: 5.31.0
Git Commit:
Built: Mon May 27 09:11:54 2024
OS/Arch: linux/amd64
BuildPlatform: linux/amd64
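The two version outputs above differ (1.33.7 versus 1.36.0), and later comments in this thread tie the CDI device handling to 1.36 and newer. As a rough sketch, the check could be scripted as below. The helper names `buildah_version_field` and `version_at_least` are made up for illustration, and the 1.36.0 threshold is taken from this thread rather than from release notes.

```shell
# Hypothetical helpers, not part of buildah.

# Print the value of the "Version:" line from `buildah version` output on stdin.
# Lines such as "Go Version:" or "image Version:" are skipped because their
# first field is not exactly "Version:".
buildah_version_field() {
    awk '$1 == "Version:" { print $2; exit }'
}

# Succeed if dotted version $1 is >= $2 (numeric major.minor.patch comparison).
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}
```

Possible usage: `version_at_least "$(buildah version | buildah_version_field)" 1.36.0 || echo "buildah may be too old for nvidia.com/gpu=all"`.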
$ git show 7658d9ed7e02ec5cf90cc397f78a5755599b0a32
commit 7658d9ed7e02ec5cf90cc397f78a5755599b0a32
Author: Daniel J Walsh <dwalsh@redhat.com>
Date: Mon Mar 25 11:55:50 2024 -0400
Support nvidia.com/gpus as devices
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
diff --git a/pkg/parse/parse_unix.go b/pkg/parse/parse_unix.go
index ff8ce854e..d3f3dc14c 100644
--- a/pkg/parse/parse_unix.go
+++ b/pkg/parse/parse_unix.go
@@ -7,6 +7,7 @@ import (
"fmt"
"os"
"path/filepath"
+ "strings"
"github.com/containers/buildah/define"
"github.com/opencontainers/runc/libcontainer/devices"
@@ -18,6 +19,12 @@ func DeviceFromPath(device string) (define.ContainerDevices, error) {
if err != nil {
return nil, err
}
+ if strings.HasPrefix(src, "nvidia.com") {
+ device := define.BuildahDevice{Source: src, Destination: dst}
+ devs = append(devs, device)
+ return devs, nil
+ }
+
srcInfo, err := os.Stat(src)
if err != nil {
return nil, fmt.Errorf("getting info of source device %s: %w", src, err)
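To illustrate what the diff above changes: a source beginning with nvidia.com is returned as a named device before the os.Stat call, while anything else is still treated as a filesystem path. Without that early return, nvidia.com/gpu=all would presumably reach os.Stat, which matches the "no such file or directory" error reported earlier. The following is only a shell analogue of that branch, not buildah code; `classify_device` is a hypothetical name, and the src[:dst[:perm]] layout of the argument is an assumption.

```shell
# Hypothetical helper mirroring the prefix check in the diff; not buildah code.
# Prints "cdi" for sources starting with nvidia.com, "path" for everything else.
classify_device() {
    src=${1%%:*}    # assumed layout src[:dst[:perm]]; keep the part before the first ':'
    case "$src" in
        nvidia.com*) echo cdi ;;
        *)           echo path ;;
    esac
}
```

For example, `classify_device nvidia.com/gpu=all` takes the first branch, whereas `classify_device /dev/nvidia0` falls through to the path case, which is where a stat would happen.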
Yes 1.36 has the patch.
https://github.com/containers/buildah/blob/release-1.36/pkg/parse/parse_unix.go
It seems like the patch is missing. Could you confirm? Thanks.
Hello all, any update here? I don't see parse_unix.go having the patch that was mentioned.
@rhatdan your input is needed.
Does the container have access to the necessary CDI configuration in its /etc/cdi directory, either volume-mounted from the host where nvidia-ctk cdi generate was run to generate it, or via some other mechanism?
Any workaround before the PR is merged?
I think the current expectation is that, if the data in /etc/cdi is provided to the container, we won't need this PR, since the CDI logic in 1.36 (and 1.37) already gets a crack at device specifications.
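A minimal sketch of what that could look like in practice, assuming the NVIDIA Container Toolkit is installed on the host and the host's /etc/cdi is mounted into the outer container. The function names are hypothetical, and nothing runs until a function is called.

```shell
# Hypothetical helpers; defining them has no side effects.

# Succeed if DIR (default /etc/cdi) contains at least one CDI spec file.
has_cdi_specs() {
    dir=${1:-/etc/cdi}
    for f in "$dir"/*.yaml "$dir"/*.json; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# On the host: write the NVIDIA CDI spec where CDI-aware tools look for it.
generate_cdi_spec() {
    nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
}

# Inside the outer container: request all GPUs by CDI name for image $1.
from_with_gpu() {
    has_cdi_specs /etc/cdi || { echo "no CDI specs found in /etc/cdi" >&2; return 1; }
    buildah from --device nvidia.com/gpu=all "$1"
}
```

How the outer container gets /etc/cdi is not stated in this thread; a read-only bind mount such as `-v /etc/cdi:/etc/cdi:ro` on the outer container is one option. The generated spec refers to device nodes and driver files by host path, so those would likely need to be visible in the outer container as well.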
Description
Failed to discover NVIDIA GPU in the running container started by buildah (vfs + chroot).
Steps to reproduce the issue:
buildah
export STORAGE_DRIVER=vfs
and isolation export BUILDAH_ISOLATION=chroot
buildah
and run with buildah
Describe the results you received:
Describe the results you expected: PyTorch finds the GPU and runs the code successfully.
Output of rpm -q buildah or apt list buildah:
Output of buildah version:
Output of podman version if reporting a podman build issue:
Output of cat /etc/release:
Output of uname -a:
Output of cat /etc/containers/storage.conf: