containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman buildx - basic test fails under rootless w/ CgroupsV1 #17520

cevich closed this issue 1 year ago

cevich commented 1 year ago

Issue Description

While running podman's integration tests on Debian SID, this test fails because the device numbers reported for the volume paths are (unexpectedly) all 0x801, the same as for the root filesystem. Running the test manually via hack/bats reproduces the same result.

Steps to reproduce the issue

  1. On a Debian SID VM (hack/get_ci_vm.sh sys podman debian-12 rootless host)
  2. As a rootless user
  3. Execute hack/bats 070-build

Describe the results you received

070-build.bats
 ✓ podman build - basic test
 ✗ podman buildx - basic test
   (from function `assert' in file test/system/helpers.bash, line 643,
    in test file test/system/070-build.bats, line 76)
     `assert "${lines[0]}" != "${lines[3]}" "devnum( / ) != devnum( volume0 )"' failed
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman rm -t 0 --all --force --ignore
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}}
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
   quay.io/libpod/testimage:20221018 f5a99120db64
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman info --format {{ .Host.BuildahVersion}}
   1.30.0-dev
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman buildx version
   buildah 1.30.0-dev
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman buildx build --load -t build_test --format=docker /tmp/podman_bats.iehrQX/build-test
   STEP 1/4: FROM quay.io/libpod/testimage:20221018
   STEP 2/4: RUN echo 9GJywjZwcLCa0cZkrq3aquRKm1YZ8xkbYZDIDNyti8LdoU6U79 > /OKmiWmUjnhU67J0ImJLD
   --> 438d369d3e5
   STEP 3/4: VOLUME /a/b/c
   --> 16e0156f7ce
   STEP 4/4: VOLUME ['/etc/foo', '/etc/bar']
   COMMIT build_test
   --> 47b8747c854
   Successfully tagged localhost/build_test:latest
   47b8747c85431df1b7e1e217156da56f3070d88c3d98ac126e4c0a81ebf9438a
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman run --rm build_test cat /OKmiWmUjnhU67J0ImJLD
   9GJywjZwcLCa0cZkrq3aquRKm1YZ8xkbYZDIDNyti8LdoU6U79
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman run --rm build_test find /[ /etc/bar] -print
   /[
   /[/etc
   /[/etc/foo,
   /etc/bar]
   $ /var/tmp/go/src/github.com/containers/podman/bin/podman run --rm build_test stat -c %D / /a /a/b /a/b/c /[ /[/etc /[/etc/foo, /etc /etc/bar]
   801
   801
   801
   801
   801
   801
   801
   801
   801
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #|     FAIL: devnum( / ) != devnum( volume0 )
   #| expected: != '801'
   #|   actual:    '801'
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
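For context, the failing assertion compares device numbers from `stat -c %D`: a path created by a VOLUME instruction should be a separate mount, and therefore report a different device than the container's root. The snippet below is a standalone sketch of that same check, using `/` and `/proc` as illustrative stand-ins for the container root and volume paths (they are not the paths the real test uses):

```shell
#!/bin/sh
# Standalone sketch of the failing check: two paths on different mounts
# should report different device numbers via stat -c %D. The real test
# compares "/" against the VOLUME paths inside the built image; here "/"
# and "/proc" stand in (on a typical Linux host /proc is a separate
# procfs mount, so the numbers differ).
dev_root=$(stat -c %D /)
dev_proc=$(stat -c %D /proc)
if [ "$dev_root" != "$dev_proc" ]; then
    echo "devnum(/)=$dev_root != devnum(/proc)=$dev_proc"
else
    # This is the situation the vfs driver produces for volume paths:
    # no separate mount is created, so every path shares the rootfs device.
    echo "devnums match: $dev_root"
fi
```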

Describe the results you expected

Test should pass

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0-dev
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon_2.1.6+ds1-1_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: unknown'
  cpuUtilization:
    idlePercent: 59.58
    systemPercent: 13.53
    userPercent: 26.89
  cpus: 2
  distribution:
    codename: bookworm
    distribution: debian
    version: "12.03"
  eventLogger: journald
  hostname: cirrus-task-4917102335754240
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-4-cloud-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 3132043264
  memTotal: 4116938752
  networkBackend: netavark
  ociRuntime:
    name: runc
    package: runc_1.1.4+ds1-1+b2_amd64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4+ds1
      commit: 1.1.4+ds1-1+b2
      spec: 1.0.2-dev
      go: go1.19.5
      libseccomp: 2.5.4
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 0h 27m 49.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: mirror.gcr.io
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io
    PullFromMirror: ""
  docker.io/library:
    Blocked: false
    Insecure: false
    Location: quay.io/libpod
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io/library
    PullFromMirror: ""
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:5000
    PullFromMirror: ""
  search:
  - docker.io
  - quay.io
  - registry.fedoraproject.org
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 211116445696
  graphRootUsed: 4849213440
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.5.0-dev
  Built: 1676489753
  BuiltTime: Wed Feb 15 19:35:53 2023
  GitCommit: 4aec13ff231c74ba2eff59c172dd6ebd341adaf5
  GoVersion: go1.19.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.0-dev

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Debian GNU/Linux bookworm/sid

Kernel: 6.1.0-4-cloud-amd64
Cgroups: tmpfs
dpkg-query: no packages found matching containers-common
dpkg-query: no packages found matching cri-o-runc
conmon-2.1.6+ds1-1-amd64
containernetworking-plugins-1.1.1+ds1-3+b2-amd64
criu-3.17.1-2-amd64
crun-1.8-1-amd64
golang-2:1.19~1-amd64
libseccomp2-2.5.4-1+b3-amd64
podman-4.3.1+ds1-5+b2-amd64
runc-1.1.4+ds1-1+b2-amd64
skopeo-1.9.3+ds1-1+b1-amd64
slirp4netns-1.2.0-1-amd64

Additional information

Logs from a run in CI showing the same failure & error.

cevich commented 1 year ago

@giuseppe PTAL, is this expected behavior for CGv1/runc with rootless podman?

giuseppe commented 1 year ago

I don't think this depends on cgroupv1; rather, it depends on the storage driver.

I'd expect the same output whenever the graph driver is vfs.

We should skip the test when $(podman info --format "{{.Store.GraphDriverName}}") is vfs.

Can you please check whether it is using vfs? It is not clear from the podman info output above, since that was collected as root. If it is vfs, could we install fuse-overlayfs?
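The suggested guard is easy to sketch. The helper below is hypothetical (the real system-test suite has its own skip helpers); the driver string would normally come from `podman info --format '{{.Store.GraphDriverName}}'`:

```shell
#!/bin/sh
# Hypothetical skip guard for the devnum assertion. Under vfs, image and
# volume contents are plain directory copies on the same filesystem, so
# the "volume is on a different device" check can never pass.
should_skip_devnum_check() {
    # $1: storage driver name, normally obtained via:
    #     podman info --format '{{.Store.GraphDriverName}}'
    case "$1" in
        vfs) echo "skip: vfs driver, volumes share the rootfs device"
             return 0 ;;
        *)   return 1 ;;
    esac
}

should_skip_devnum_check vfs                               # prints the skip message
should_skip_devnum_check overlay || echo "overlay: run the check"
```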

cevich commented 1 year ago

If it is vfs, could we install fuse-overlayfs?

Ahh, you might be onto something there. It's not explicitly called out in the image-build package install list; I'll get a VM and see whether it's there or not.

cevich commented 1 year ago

...well damn, fuse-overlayfs is there. The problem reproduces under hack/get_ci_vm.sh, so let me do that and see whether, as the test user, it's running VFS for some reason.

cevich commented 1 year ago

Ahh, @giuseppe called it. As the rootless user, podman info shows:

...cut...
store:
  configFile: /home/some13481dude/.config/containers/storage.conf
...cut...
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/some13481dude/.local/share/containers/storage
  graphRootAllocated: 211116445696
  graphRootUsed: 5158154240
...cut...

Oddly enough, there is no user or system storage.conf specifying VFS. So it must be set by some other means. I checked and /usr/bin/fuse-overlayfs is definitely there. Hmmmm.
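If memory serves, containers/storage consults the per-user config first and then the system-wide locations, so a quick loop over the usual paths (per the containers-storage.conf(5) conventions; adjust if your distro packages them elsewhere) helps confirm that none of them pins a driver:

```shell
#!/bin/sh
# Probe the usual storage.conf locations; for rootless podman the
# per-user file takes precedence over the system-wide ones. If none
# exists (as on these Debian SID images), rootless falls back to the
# built-in vfs default.
for f in "${XDG_CONFIG_HOME:-$HOME/.config}/containers/storage.conf" \
         /etc/containers/storage.conf \
         /usr/share/containers/storage.conf; do
    if [ -e "$f" ]; then
        printf 'found:  %s\n' "$f"
        grep -E '^[[:space:]]*driver[[:space:]]*=' "$f" \
            || echo '        (no driver= setting)'
    else
        printf 'absent: %s\n' "$f"
    fi
done
```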

cevich commented 1 year ago

@nalind or @mtrmac either of you have any idea why these new Debian VMs would be selecting VFS for storage by default?

cevich commented 1 year ago

Update: On my Debian VM as root, podman info shows graphDriverName: overlay. But I just made a brand-new user (one that's never run any tests or podman-anything). Its info output shows graphDriverName: vfs.

mtrmac commented 1 year ago

I know absolutely nothing about that problem space, so this is probably not very helpful:

I’d go looking for data about how the driver decision was made in podman --log-level=debug (maybe only on the initial Podman run in that environment, before it records state?), and if that debug log doesn’t contain the data, I’d suggest that it would be worth adding there.

At least code inspection suggests log entries like

"[graphdriver] trying provided driver %q"
"[graphdriver] using prior storage driver: %s"

(but there seem to be no log entries about the logic actually choosing a driver when none is specified, based on the user-provided or built-in priority list — and, worse, if a driver is on the priority list but its initialization fails, that error is AFAICS not logged. At least the latter part seems quite useful for debugging the behavior).

cevich commented 1 year ago

Sorry @mtrmac, I thought this was part of your wheelhouse. I appreciate the debug suggestion though; I'll give that a try and see where it takes me. It seems likely something in the environment is causing it, so maybe that'll show up in the output.

cevich commented 1 year ago

Update: debug level output from running podman info as the rootless user:

$ bin/podman --log-level=debug info
INFO[0000] bin/podman filtering at log level debug
DEBU[0000] Called info.PersistentPreRunE(bin/podman --log-level=debug info)
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /home/some11319dude/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver vfs
...cut...
cevich commented 1 year ago

Mystery solved:

According to @nalind, VFS is the hard-wired rootless default if nothing else is selected. On Fedora, the containers-common package supplies a /usr/share/containers/storage.conf which has driver=overlay set. On Debian SID there is no such file or package, so rootless users get the VFS default.

cevich commented 1 year ago

@siretart I'm not sure what the Debian policy / common practice is here. The podman "experience" with the VFS storage driver is sub-optimal: it's definitely the "safe" choice, but it will leave new rootless users with a "podman is slow" taste in their mouths. It also differs from the built-in default for root (overlay), which again is the safe choice.

In Fedora there's a containers-common package that places a default /usr/share/containers/storage.conf file. That file sets (condensed / among other options):

[storage]
driver = "overlay"

[storage.options.overlay]
mountopt = "nodev,metacopy=on"

I just verified manually that having this file on Debian results in new rootless users getting the overlay driver by "default". Is there a containers-common package or similar mechanism for Debian that would provide the best new (rootless) user experience with respect to the storage driver?

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.