containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

UDev rule based usb devices are not handled correctly after replugging #24093

Open HolgerHees opened 1 month ago

HolgerHees commented 1 month ago

Issue Description

I recently migrated all my docker containers to podman. There are 34 containers of different types. I was amazed at how easy and straightforward it was.

But one problem remains. I have 2 containers that access USB sticks whose stable device symlinks are created by a udev rule. These symlinks are in turn mounted into the containers.

So far this works under podman. The problem is the bind mount, which mounts the inode of the resolved device rather than the symlink.

That is, after a reboot or when the USB stick is plugged in again, the device is available under a different /dev/USBX. The udev rule ensures that it is available under the same symlink, but the container still points to the old USB device.
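
The behaviour described above can be reproduced without podman or real hardware. The following is a minimal Python sketch, not podman's actual code: a temporary directory stands in for /dev, plain files stand in for the device nodes, and the names USB0, USB1 and myCustomLink are only illustrative. It resolves the symlink once (as happens at container creation), then re-points the symlink (as the udev rule does after replugging), and shows that the stored target is stale.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dev:
    dev = os.path.realpath(dev)            # stand-in for /dev
    usb0 = os.path.join(dev, "USB0")       # node before replugging
    usb1 = os.path.join(dev, "USB1")       # node after replugging
    link = os.path.join(dev, "myCustomLink")

    open(usb0, "w").close()                # stick shows up as USB0
    os.symlink(usb0, link)                 # udev rule creates the stable symlink

    # Container creation: the symlink is resolved once and the result is kept.
    stored_target = os.path.realpath(link)

    # Replug: the stick shows up as USB1 and udev re-points the symlink.
    open(usb1, "w").close()
    os.remove(link)
    os.symlink(usb1, link)

    current_target = os.path.realpath(link)

print("stored at creation:", os.path.basename(stored_target))
print("symlink points to: ", os.path.basename(current_target))
```

The stored value still names USB0 while the symlink now leads to USB1, which is the mismatch the container ends up with.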

This was not a problem under Docker and worked for years.

What is the recommended way to deal with this problem?

Currently, the container has to be recreated, which is absolute overkill. At the moment, I only restart my 34 containers after a reboot. Recreating them all as a precaution would slow down the entire startup process considerably and also create new problems for me.

Steps to reproduce the issue

  1. Create a udev rule which creates a symlink /dev/myCustomLink pointing to e.g. /dev/USB0
  2. Mount /dev/myCustomLink into the container
  3. Unplug the USB device and plug it in again
  4. The mounted symlink inside the container should point to /dev/USB1, but it points to /dev/USB0

Describe the results you received

The mounted symlink points to the wrong device.

Describe the results you expected

The mounted symlink should point to the correct device.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.8
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-150500.9.9.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: unknown'
  cpuUtilization:
    idlePercent: 72.12
    systemPercent: 17.55
    userPercent: 10.33
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: opensuse-leap
    version: "15.6"
  eventLogger: journald
  freeLocks: 1857
  hostname: marvin
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.4.0-150600.23.22-default
  linkmode: dynamic
  logDriver: journald
  memFree: 7434743808
  memTotal: 33293668352
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.11.0-150500.3.6.1.x86_64
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.11.0
    package: netavark-1.11.0-150500.3.6.1.x86_64
    path: /usr/lib/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.8.6-bp156.1.13.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.6
      commit: 73f759f4a39769f60990e7d225f561b4f4f06bcf
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-150600.1.5.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.3
  swapFree: 34358947840
  swapTotal: 34359734272
  uptime: 34h 8m 20.00s (Approximately 1.42 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - registry.suse.com
  - mirror.gcr.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 34
    paused: 0
    running: 34
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /smartserver/var/lib/containers/storage
  graphRootAllocated: 357225041920
  graphRootUsed: 100487294976
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 73
  runRoot: /smartserver/tmp/containers/storage
  transientStore: false
  volumePath: /smartserver/var/lib/containers/storage/volumes
version:
  APIVersion: 4.9.5
  Built: 1719835200
  BuiltTime: Mon Jul  1 14:00:00 2024
  GitCommit: ""
  GoVersion: go1.21.11
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.5

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

No response

Additional information

No response

rhatdan commented 3 weeks ago

@giuseppe PTAL

giuseppe commented 3 weeks ago

Is your container creating a lot of files in its overlay upper directory? If not, recreating the container is not much more expensive than restarting it.

Once the symlink has been resolved, the original symlink no longer affects what the container sees. You could try --mount type=bind with the no-dereference option, which bind mounts the symlink itself. In that case, though, you'd also need to bind mount /dev, since the symlink is then resolved inside the container.

HolgerHees commented 3 weeks ago

@giuseppe I already "fixed" it by mounting /dev inside the container. No additional mount or mount type was needed. But this is a security hole in my opinion, because it also gives access to other devices.

The container itself creates a lot of files, because it takes several minutes to initialize and downloads a lot of additional "plugins" based on its configs. (https://hub.docker.com/r/openhab/openhab)

The reason I created this bug ticket is that docker was able to handle this and, I guess, your goal is to be as compatible as possible.

I ported all of my 31 different containers from docker to podman and this is the only thing which behaves completely differently. The rest had only minor differences where I was able to find a proper replacement.

With podman inspect the related part looks like

"Devices": [
    {
        "PathOnHost": "/dev/ttyUSB0",
        "PathInContainer": "/dev/ttyMyTestDevice",
        "CgroupPermissions": ""
    }
],

with docker inspect it looks like

"Mounts": [
    {
        "Type": "bind",
        "Source": "/dev/ttyMyTestDevice",
        "Destination": "/dev/ttyMyTestDevice",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
    }
],
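
The two fragments differ in what they record: the podman entry stores the already resolved node (/dev/ttyUSB0) as PathOnHost, while the docker entry stores the symlink path as the bind source. As a possible stopgap, a script could use the podman entry to detect which containers actually need recreating instead of recreating all of them. The following is a rough Python sketch under stated assumptions: stale_devices is a hypothetical helper, not part of podman, and it assumes the device was passed with the same path on the host and in the container, so that PathInContainer is also the host-side symlink. The resolver is injectable so the example runs without the real devices.

```python
import os


def stale_devices(devices, resolve=os.path.realpath):
    """Return (symlink, recorded_node, current_node) for outdated entries.

    Assumption: PathInContainer equals the host-side symlink path, as in the
    inspect fragment above. `resolve` maps a symlink to its current target.
    """
    stale = []
    for dev in devices:
        link = dev["PathInContainer"]
        current = resolve(link)
        if current != dev["PathOnHost"]:
            stale.append((link, dev["PathOnHost"], current))
    return stale


# The "Devices" entry from the podman inspect output above.
devices = [
    {
        "PathOnHost": "/dev/ttyUSB0",
        "PathInContainer": "/dev/ttyMyTestDevice",
        "CgroupPermissions": "",
    }
]

# Simulated host state: before and after the stick is replugged.
before_replug = {"/dev/ttyMyTestDevice": "/dev/ttyUSB0"}
after_replug = {"/dev/ttyMyTestDevice": "/dev/ttyUSB1"}

print(stale_devices(devices, resolve=before_replug.get))
print(stale_devices(devices, resolve=after_replug.get))
```

On a real host the default resolver (os.path.realpath) would be used and the list would come from the inspect output of each container. Only containers with a non-empty result would then need to be recreated.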

inside the docker container it looks like

/ # ls -al /dev/
total 0
drwxr-xr-x    5 root     root           380 Sep 28 15:15 .
drwxr-xr-x    1 root     root           134 Sep 28 15:15 ..
crw--w----    1 root     tty       136,   0 Sep 28 15:15 console
lrwxrwxrwx    1 root     root            11 Sep 28 15:15 core -> /proc/kcore
lrwxrwxrwx    1 root     root            13 Sep 28 15:15 fd -> /proc/self/fd
crw-rw-rw-    1 root     root        1,   7 Sep 28 15:15 full
drwxrwxrwt    2 root     root            40 Sep 28 15:15 mqueue
crw-rw-rw-    1 root     root        1,   3 Sep 28 15:15 null
lrwxrwxrwx    1 root     root             8 Sep 28 15:15 ptmx -> pts/ptmx
drwxr-xr-x    2 root     root             0 Sep 28 15:15 pts
crw-rw-rw-    1 root     root        1,   8 Sep 28 15:15 random
drwxrwxrwt    2 root     root            40 Sep 28 15:15 shm
lrwxrwxrwx    1 root     root            15 Sep 28 15:15 stderr -> /proc/self/fd/2
lrwxrwxrwx    1 root     root            15 Sep 28 15:15 stdin -> /proc/self/fd/0
lrwxrwxrwx    1 root     root            15 Sep 28 15:15 stdout -> /proc/self/fd/1
crw-rw-rw-    1 root     root        5,   0 Sep 28 15:15 tty
crw-rw----    1 root     490       188,   0 Sep 28 15:14 ttyMyTestDevice
crw-rw-rw-    1 root     root        1,   9 Sep 28 15:15 urandom
crw-rw-rw-    1 root     root        1,   5 Sep 28 15:15 zero

on my host it looks like

lrwxrwxrwx 1 root  root         12 28. Sep 17:14 /dev/ttyMyTestDevice -> /dev/ttyUSB0

HolgerHees commented 3 weeks ago

I read somewhere that since kernel v5.12, bind mounts support symlinks via the AT_SYMLINK_NOFOLLOW flag on files.