
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

vainfo/GPU access from container #18497

Closed: alexmaras closed this issue 1 year ago

alexmaras commented 1 year ago

Issue Description

vainfo doesn't work inside a rootless container without --security-opt label=disable or sudo setenforce 0.

I'm using an AMD GPU. On the host system, vainfo shows all the correct supported codecs.

Steps to reproduce the issue


  1. Create a Dockerfile with:
    FROM fedora:38
    RUN dnf -y install libva-utils
  2. Build it as testbench (podman build -t testbench .) and run podman container run --group-add keep-groups --device /dev/dri/renderD128:/dev/dri/renderD128 testbench vainfo. vainfo fails.
  3. Instead, run podman container run --security-opt label=disable --device /dev/dri/renderD128:/dev/dri/renderD128 testbench vainfo. vainfo succeeds.

Describe the results you received

error: XDG_RUNTIME_DIR is invalid or not set in the environment.
error: can't connect to X server!
libva info: VA-API version 1.18.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_18
amdgpu: amdgpu_bo_cpu_map failed. (-13)
radeonsi: can't create radeon_winsys_ctx
radeonsi: Failed to create a context.
amdgpu: amdgpu_bo_cpu_map failed. (-13)
radeonsi: can't create radeon_winsys_ctx
radeonsi: Failed to create a context.
libva error: /usr/lib64/dri/radeonsi_drv_video.so init failed
libva info: va_openDriver() returns 2
vaInitialize failed with error code 2 (resource allocation failed),exit
Trying display: wayland
Trying display: x11
Trying display: drm

Describe the results you expected

error: XDG_RUNTIME_DIR is invalid or not set in the environment.
error: can't connect to X server!
libva info: VA-API version 1.18.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_18
libva info: va_openDriver() returns 0
Trying display: wayland
Trying display: x11
Trying display: drm
vainfo: VA-API version: 1.18 (libva 2.18.2)
vainfo: Driver version: Mesa Gallium driver 23.0.3 for AMD Radeon Graphics (renoir, LLVM 16.0.1, DRM 3.49, 6.2.13-300.fc38.x86_64)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 94.79
    systemPercent: 1.52
    userPercent: 3.69
  cpus: 12
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: iot
    version: "38"
  eventLogger: journald
  hostname: atlas
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.2.13-300.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 4925906944
  memTotal: 32981196800
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 2h 52m 22.00s (Approximately 0.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/wireguard-containers/.config/containers/storage.conf
  containerStore:
    number: 52
    paused: 0
    running: 7
    stopped: 45
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/wireguard-containers/.local/share/containers/storage
  graphRootAllocated: 415445573632
  graphRootUsed: 43316490240
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 40
  runRoot: /tmp/containers-user-1001/containers
  transientStore: false
  volumePath: /home/wireguard-containers/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.0
  Built: 1681486942
  BuiltTime: Fri Apr 14 23:42:22 2023
  GitCommit: ""
  GoVersion: go1.20.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Running locally on bare metal, in Fedora 38 (IoT edition)

Additional information

Running with setenforce 0 also works.

I have a limited understanding of strace, but I ran:

podman container run --cap-add=SYS_PTRACE --security-opt unmask=/sys/dev/char --security-opt unmask=/sys/devices --group-add keep-groups --device /dev/dri/renderD128:/dev/dri/renderD128 testbench strace vainfo 2> strace-out

and attached the output as strace-out.txt.

rhatdan commented 1 year ago

What AVCs are you seeing?

sudo ausearch -m avc -ts recent

rhatdan commented 1 year ago

error: XDG_RUNTIME_DIR is invalid or not set in the environment.

This indicates that you logged in as root and then used su to switch to a non-root user.

# machinectl shell USERNAME@

This will set up a proper session, as if you had logged in directly.
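To tell whether XDG_RUNTIME_DIR is missing or merely invalid in a given shell, a quick check can help. This is a minimal sketch built around a hypothetical helper, check_xdg_runtime_dir, that loosely follows the XDG Base Directory spec (the directory must exist, be owned by the current user, and have mode 0700). It is not the exact check that vainfo or its display backends perform.

```python
import os
import stat


def check_xdg_runtime_dir(env=None):
    """Hypothetical helper: return None if XDG_RUNTIME_DIR looks valid, else a reason.

    Loosely follows the XDG Base Directory spec: the directory must exist,
    be owned by the current user, and have access mode 0700.
    """
    env = os.environ if env is None else env
    path = env.get("XDG_RUNTIME_DIR")
    if not path:
        return "XDG_RUNTIME_DIR is not set"
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path} does not exist"
    if not stat.S_ISDIR(st.st_mode):
        return f"{path} is not a directory"
    if st.st_uid != os.getuid():
        return f"{path} is not owned by uid {os.getuid()}"
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o700:
        return f"{path} has mode {oct(mode)}, expected 0o700"
    return None


if __name__ == "__main__":
    problem = check_xdg_runtime_dir()
    print(problem or "XDG_RUNTIME_DIR looks valid")
```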

alexmaras commented 1 year ago

For ausearch, I get:

----
time->Sun May  7 22:08:07 2023
type=AVC msg=audit(1683468487.052:3337): avc:  denied  { map } for  pid=699282 comm="vainfo" path="/dev/dri/renderD128" dev="devtmpfs" ino=947 scontext=system_u:system_r:container_t:s0:c260,c628 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
----
time->Sun May  7 22:08:07 2023
type=AVC msg=audit(1683468487.055:3338): avc:  denied  { map } for  pid=699282 comm="vainfo" path="/dev/dri/renderD128" dev="devtmpfs" ino=947 scontext=system_u:system_r:container_t:s0:c260,c628 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
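Read field by field, each record says that a process running as container_t was denied the map permission on a chr_file labeled dri_device_t, and permissive=0 means the denial was enforced. A minimal sketch that pulls those fields out, assuming the single-line record format shown above (parse_avc is a hypothetical helper, not part of the audit tools):

```python
import re

# Matches key=value pairs where the value is either "quoted" or a bare token.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Split one type=AVC audit record into its denied permissions and fields."""
    perms_match = re.search(r"denied\s+\{([^}]*)\}", line)
    perms = perms_match.group(1).split() if perms_match else []
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
    # A context is user:role:type:level; the SELinux type is the third component.
    fields["stype"] = fields["scontext"].split(":")[2]
    fields["ttype"] = fields["tcontext"].split(":")[2]
    return perms, fields


if __name__ == "__main__":
    sample = ('type=AVC msg=audit(1683468487.052:3337): avc:  denied  { map } for  '
              'pid=699282 comm="vainfo" path="/dev/dri/renderD128" dev="devtmpfs" ino=947 '
              'scontext=system_u:system_r:container_t:s0:c260,c628 '
              'tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0')
    perms, fields = parse_avc(sample)
    print(perms, fields["stype"], fields["ttype"], fields["tclass"])
    # prints: ['map'] container_t dri_device_t chr_file
```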

The XDG_RUNTIME_DIR error happens inside the container, where I am running as "root" in rootless Podman. machinectl doesn't exist as a command in the container or on the host. The issue also happens in another container (Jellyfin); I'm just using this very basic Dockerfile as an example setup. Do you expect setting XDG_RUNTIME_DIR in the container to change anything? I see the same error in the output when vainfo does work, and I've confirmed that hardware encoding works perfectly in the container once vainfo responds correctly with labeling disabled.

I also just found the section of the man page that mentions sudo setsebool -P container_use_devices=true. I tried it just in case, and nothing changed; ausearch shows the same output.

Also, for reference, this is integrated graphics on an AMD Ryzen 4650G. I'm not sure whether that's related.

gabriel-speziali commented 1 year ago

I think I am having a similar issue with my Plex container (when the Plex Transcoder process runs inside the container). I am adding the report here, but if it is a separate issue I can move it to a new one.

Issue Description

Steps to reproduce the issue

Here is a copy of my Quadlet .container file (it runs as a systemd service, but the issue persists when using podman run):

[Container]
Image=docker.io/plexinc/pms-docker:latest
Label=io.containers.autoupdate=registry

Timezone=local

Environment=ADVERTISE_IP=(redacted, not relevant) 

# PUID/PGID set to match host user, keeping userns for file system permissions 
Environment=PLEX_UID=1000
Environment=PLEX_GID=1000
UserNS=keep-id

# Start container as root (needed for s6-init)
User=0:0

# Config volumes
Volume=plex-config:/config:Z
Volume=plex-transcode:/transcode:Z

# Add video device
AddDevice=/dev/dri/:/dev/dri/:rwm

[Install]
# Start by default on boot
WantedBy=default.target

Describe the results you received

When attempting a transcode, hardware transcoding fails and Plex falls back to software transcoding.

Results of ausearch:

type=AVC msg=audit(1683913468.909:656): avc: denied { map } for pid=64216 comm=506C6578205472616E73636F646572 path="/dev/dri/renderD128" dev="devtmpfs" ino=497 scontext=system_u:system_r:container_t:s0:c324,c701 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
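The comm value in this record is not garbled: when a string field contains a space (or other characters such as double quotes or control characters), the kernel audit subsystem logs it as unquoted hex rather than as a quoted string, and ausearch -i decodes it for display. A minimal sketch of that decoding, using a hypothetical helper decode_audit_field that is meant only for string fields such as comm or path (numeric fields like pid must not be passed to it):

```python
def decode_audit_field(value):
    """Hypothetical helper: decode a string field from an audit record.

    Quoted values were logged verbatim; unquoted values of string fields
    were hex-encoded. bytes.fromhex raises ValueError on non-hex input.
    """
    if value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return bytes.fromhex(value).decode("utf-8", errors="replace")


print(decode_audit_field("506C6578205472616E73636F646572"))  # prints: Plex Transcoder
```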

Results of audit2allow with that AVC passed in:

#============= container_t ==============
allow container_t dri_device_t:chr_file map;
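The rule above maps directly onto the AVC fields: the source type comes from scontext, the target type from tcontext, then the object class and the denied permission. A simplified sketch of that mapping (hypothetical helper avc_to_allow_rule; this is not audit2allow itself, which also handles multiple denials, booleans, and constraint violations):

```python
import re

AVC_RE = re.compile(
    r"denied\s+\{(?P<perms>[^}]*)\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def avc_to_allow_rule(line):
    """Build a type-enforcement allow rule from one AVC denial (simplified)."""
    m = AVC_RE.search(line)
    if m is None:
        raise ValueError("not an AVC denial record")
    # A context is user:role:type:level; the type is the third component.
    stype = m["scontext"].split(":")[2]
    ttype = m["tcontext"].split(":")[2]
    perm_list = m["perms"].split()
    # Several permissions are written as a brace-enclosed set, one stays bare.
    perms = perm_list[0] if len(perm_list) == 1 else "{ " + " ".join(perm_list) + " }"
    return f"allow {stype} {ttype}:{m['tclass']} {perms};"
```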

Describe the results you expected

I expected that enabling the container_use_devices boolean would grant the necessary permission.

podman info --debug output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 95.96
    systemPercent: 0.99
    userPercent: 3.06
  cpus: 12
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: workstation
    version: "38"
  eventLogger: journald
  hostname: (redacted)
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.2.14-300.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3315601408
  memTotal: 16539652096
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 2h 23m 40.00s (Approximately 0.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/(user)/.config/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 3
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/(user)/.local/share/containers/storage
  graphRootAllocated: 510389125120
  graphRootUsed: 201725337600
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/(user)/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.0
  Built: 1681486942
  BuiltTime: Fri Apr 14 11:42:22 2023
  GitCommit: ""
  GoVersion: go1.20.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Running locally, Fedora Workstation

Additional Info

Specific to the Plex image:

rhatdan commented 1 year ago

Fixed in https://github.com/containers/container-selinux/releases/tag/v2.213.0
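To check whether a host already has this fix, the installed upstream version can be printed with rpm -q --qf '%{VERSION}\n' container-selinux and compared against 2.213.0. A minimal sketch of the comparison, assuming plain dotted numeric versions (has_fix is a hypothetical helper; it is not RPM's own rpmvercmp algorithm):

```python
def version_tuple(version):
    """Turn '2.213.0' or '2.213.0-1.fc38' into (2, 213, 0)."""
    upstream = version.split("-", 1)[0]  # drop an RPM release suffix, if any
    return tuple(int(part) for part in upstream.split("."))


def has_fix(installed, fixed="2.213.0"):
    """True if the installed container-selinux version is at least `fixed`."""
    a, b = version_tuple(installed), version_tuple(fixed)
    # Pad the shorter tuple with zeros so that '2.213' compares equal to '2.213.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing numerically matters here: as plain strings, "2.99.0" would sort after "2.213.0".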