containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

mount_program not autodetected when using --root #13459

Closed Vogtinator closed 2 years ago

Vogtinator commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When the --root option is used, podman no longer autodetects fuse-overlayfs as overlay.mount_program. It has to be specified explicitly on the command line to make podman start again.

Both this and https://github.com/containers/podman/issues/13458 are about mount_program not getting autodetected properly with certain setups. In fact, I hit #13458 while debugging this one.

Steps to reproduce the issue:

fabian@localhost:~> ./podman/bin/podman info -f json | jq '.["store"]'
{
  "configFile": "/home/fabian/.config/containers/storage.conf",
  "containerStore": {
    "number": 0,
    "paused": 0,
    "running": 0,
    "stopped": 0
  },
  "graphDriverName": "overlay",
  "graphOptions": {
    "overlay.mount_program": {
      "Executable": "/usr/bin/fuse-overlayfs",
      "Package": "fuse-overlayfs-1.1.2-3.9.1.x86_64",
      "Version": "fuse-overlayfs: version 1.1.0\nFUSE library version 3.6.1\nusing FUSE kernel interface version 7.29"
    }
  },
  "graphRoot": "/home/fabian/.local/share/containers/storage",
  "graphStatus": {
    "Backing Filesystem": "btrfs",
    "Native Overlay Diff": "false",
    "Supports d_type": "true",
    "Using metacopy": "false"
  },
  "imageCopyTmpDir": "/var/tmp",
  "imageStore": {
    "number": 0
  },
  "runRoot": "/run/user/1000/containers",
  "volumePath": "/home/fabian/.local/share/containers/storage/volumes"
}
fabian@localhost:~> ./podman/bin/podman --root .local/share/containers info -f json | jq '.["store"]'
Error: kernel does not support overlay fs: unable to create kernel-style whiteout: operation not permitted

Describe the results you received:

fabian@localhost:~> ./podman/bin/podman --root .local/share/containers --log-level debug info
INFO[0000] ./podman/bin/podman filtering at log level debug 
DEBU[0000] Called info.PersistentPreRunE(./podman/bin/podman --root .local/share/containers --log-level debug info) 
DEBU[0000] overlay: storage already configured with a mount-program 
DEBU[0000] Merged system config "/usr/share/containers/containers.conf" 
DEBU[0000] overlay: storage already configured with a mount-program 
DEBU[0000] overlay: storage already configured with a mount-program 
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at .local/share/containers/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root .local/share/containers     
DEBU[0000] Using run root /run/user/1000/containers     
DEBU[0000] Using static dir .local/share/containers/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path .local/share/containers/volumes 
DEBU[0000] overlay: storage already configured with a mount-program 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] Cached value indicated that overlay is not supported 
Error: kernel does not support overlay fs: unable to create kernel-style whiteout: operation not permitted

Describe the results you expected:

It should continue autodetecting fuse-overlayfs.

Additional information you deem important (e.g. issue happens only occasionally):

The man page explains that if the --root option is given, options from /etc/containers/storage.conf are ignored:

Overriding this option will cause the storage-opt settings in /etc/containers/storage.conf to be ignored.  The user must specify additional options via the --storage-opt flag.

overlay.mount_program is not specified in that file; it is set internally by https://github.com/containers/podman/blob/f33b64d8b7d7b2bd22560cfacc90e25d1f9e16b4/vendor/github.com/containers/storage/types/options.go#L217-L228.

It is then thrown out again by https://github.com/containers/podman/blob/f33b64d8b7d7b2bd22560cfacc90e25d1f9e16b4/pkg/domain/infra/runtime_libpod.go#L132

Reverting 55f00bac02fcde7fbe960a9a80131dbc72630b5b fixes this issue.
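
The sequence described above can be modeled in a few lines of Go. This is only a simplified sketch of the net effect, not Podman's actual code: the `storeOptions` type and the `defaultOptions` and `applyRootOverride` functions are hypothetical names, and the paths and the `key=value` option format are assumed for illustration.

```go
package main

import "fmt"

// storeOptions is a hypothetical stand-in for the storage options struct.
type storeOptions struct {
	GraphRoot          string
	GraphDriverOptions []string
}

// defaultOptions models the internal default: in rootless mode a
// mount_program entry is added even though no config file mentions it.
// The path below is an assumption for illustration.
func defaultOptions() storeOptions {
	return storeOptions{
		GraphRoot:          "/home/user/.local/share/containers/storage",
		GraphDriverOptions: []string{"overlay.mount_program=/usr/bin/fuse-overlayfs"},
	}
}

// applyRootOverride models the net effect of --root: the graph root is
// replaced and all driver options are dropped, including the
// autodetected default.
func applyRootOverride(opts storeOptions, root string) storeOptions {
	opts.GraphRoot = root
	opts.GraphDriverOptions = []string{}
	return opts
}

func main() {
	opts := defaultOptions()
	fmt.Println("options before --root:", opts.GraphDriverOptions)

	opts = applyRootOverride(opts, ".local/share/containers")
	fmt.Println("options after --root: ", opts.GraphDriverOptions)
}
```

In this model the mount program is lost not because the user asked for it to be removed, but because it travels in the same list as the user-configurable options.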

BTW: That commit was incorrectly documented in the release notes. They say:

https://github.com/containers/podman/blob/f33b64d8b7d7b2bd22560cfacc90e25d1f9e16b4/RELEASE_NOTES.md?plain=1#L432

Maybe that should've been s/not/now/?

Output of podman version:

Client:       Podman Engine
Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.17.7
Git Commit:   4a242b1327fb34e6cac6c1686afb3370901180d3
Built:        Tue Mar  8 16:21:46 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.2
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.30-150300.8.3.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: unknown'
  cpus: 2
  distribution:
    distribution: '"sles"'
    version: "15.3"
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 100
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.3.18-150300.59.49-default
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 995549184
  memTotal: 3837120512
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.0.3-27.1.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.16.10
      libseccomp: 2.5.3
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-0.4.7-3.12.1.x86_64
    version: |-
      slirp4netns version 0.4.7
      commit: unknown
      libslirp: 4.3.1-git
  swapFree: 4114313216
  swapTotal: 4117733376
  uptime: 8h 0m 26.89s (Approximately 0.33 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /home/fabian/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-3.9.1.x86_64
      Version: |-
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.6.1
        using FUSE kernel interface version 7.29
  graphRoot: /home/fabian/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  volumePath: /home/fabian/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.0-dev
  Built: 1646752906
  BuiltTime: Tue Mar  8 16:21:46 2022
  GitCommit: 4a242b1327fb34e6cac6c1686afb3370901180d3
  GoVersion: go1.17.7
  OsArch: linux/amd64
  Version: 4.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Built from git.

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

mheon commented 2 years ago

I don't think we intend on reverting that patch, as the logic behind its inclusion seems solid. Do you have a suggested alternative?

dcermak commented 2 years ago

I would suggest that podman still tries to autodetect the mount program even with the --root CLI flag present, as the current behavior completely breaks user expectations: you don't apply any special settings to podman, and just by using --root it breaks if your kernel is old enough.

Vogtinator commented 2 years ago

> I don't think we intend on reverting that patch, as the logic behind its inclusion seems solid.

Agreed, it's mainly that mount_program is a special case. Are there other storage options for which podman has an internal default?

> Do you have a suggested alternative?

I see several options:

  1. Move mount_program autodetection to the overlay storage driver. If it's called in rootless mode and mount_program isn't set, it could set a default itself. That would also fix #13458.
  2. Explicitly not reset mount_program if --root was given.
  3. Perform mount_program detection after argument parsing
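
As a rough illustration of option 2, the reset step could keep any mount_program entry instead of clearing the whole list. This is a sketch under assumptions, not a proposed patch: `keepMountProgram` is a hypothetical helper, and it assumes driver options are stored as `key=value` strings such as `overlay.mount_program=/usr/bin/fuse-overlayfs`.

```go
package main

import (
	"fmt"
	"strings"
)

// keepMountProgram is a hypothetical helper: it returns only the
// overlay.mount_program entries from opts and discards everything else.
func keepMountProgram(opts []string) []string {
	kept := []string{}
	for _, o := range opts {
		if strings.HasPrefix(o, "overlay.mount_program=") {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	// Example input; both entries are assumptions for illustration.
	opts := []string{
		"overlay.mountopt=nodev",
		"overlay.mount_program=/usr/bin/fuse-overlayfs",
	}
	fmt.Println(keepMountProgram(opts))
}
```

A filter like this would treat mount_program differently from every other option, which is the special-casing mentioned above.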

rhatdan commented 2 years ago

@giuseppe WDYT?

giuseppe commented 2 years ago

1 and 3 seem to me like a good compromise. Would anyone like to open a PR to implement it?

vrothberg commented 2 years ago

Interested in opening a PR, @Vogtinator ?

dcermak commented 2 years ago

EDIT: Turns out I've had a brain fart and missed that storageOpts.GraphDriverOptions is initialized as nil and not as []string{}, so we actually get into the branch which kills our config: https://github.com/containers/podman/blob/ae7997ab50c7dcfbf7f0a3ec19c9ce1126095f8e/libpod/options.go#L78-L86
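
The nil versus []string{} distinction is easy to miss because both slices have length zero. The snippet below is a generic Go illustration, not the Podman code linked above; `describe` is a hypothetical helper that only shows which side of a nil check each value lands on.

```go
package main

import "fmt"

// describe reports which branch a nil check would take for a slice.
func describe(opts []string) string {
	if opts != nil {
		return "non-nil"
	}
	return "nil"
}

func main() {
	var unset []string  // zero value of a slice: nil
	empty := []string{} // empty, but not nil

	// Both have length 0, yet a nil check treats them differently.
	fmt.Println(describe(unset), len(unset))
	fmt.Println(describe(empty), len(empty))
}
```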

invalid, but left for reference:

I have been digging a bit into this, and it turns out that our first hunch, that https://github.com/containers/podman/blob/f33b64d8b7d7b2bd22560cfacc90e25d1f9e16b4/pkg/domain/infra/runtime_libpod.go#L129-L133 is the culprit, is wrong. I actually believe that line 132 does absolutely nothing, because storageOpts is set to an empty struct just a few lines above and GraphDriverOptions is not touched at all before that: https://github.com/containers/podman/blob/ae7997ab50c7dcfbf7f0a3ec19c9ce1126095f8e/pkg/domain/infra/runtime_libpod.go#L103

What is actually important is line 130 storageSet = true as this results in this branch being taken: https://github.com/containers/podman/blob/ae7997ab50c7dcfbf7f0a3ec19c9ce1126095f8e/pkg/domain/infra/runtime_libpod.go#L176-L178

This in turn later calls https://github.com/containers/podman/blob/ae7997ab50c7dcfbf7f0a3ec19c9ce1126095f8e/libpod/options.go#L78-L86 where the else branch is taken and thereby our config is killed.

I have not yet figured out a way to properly address this, as the mount program appears to be actively used only by MountWithOptions: https://github.com/containers/podman/blob/ae7997ab50c7dcfbf7f0a3ec19c9ce1126095f8e/vendor/github.com/containers/buildah/pkg/overlay/overlay.go#L154

dcermak commented 2 years ago

With some pointers from @Vogtinator I've tried this simple patch:

diff --git a/vendor/github.com/containers/storage/drivers/overlay/overlay.go b/vendor/github.com/containers/storage/drivers/overlay/overlay.go
index 739828b35..c4d703924 100644
--- a/vendor/github.com/containers/storage/drivers/overlay/overlay.go
+++ b/vendor/github.com/containers/storage/drivers/overlay/overlay.go
@@ -330,6 +330,12 @@ func Init(home string, options graphdriver.Options) (graphdriver.Driver, error)
        return nil, err
    }

+   if opts.mountProgram == "" {
+       if path, err := exec.LookPath("fuse-overlayfs"); err == nil {
+           opts.mountProgram = path
+       }
+   }
+
    var usingMetacopy bool
    var supportsDType bool
    var supportsVolatile *bool

which is essentially option 1 from https://github.com/containers/podman/issues/13459#issuecomment-1062925677.

This fixes the problem that we were observing previously, but it has a pretty big catch: podman --root /something info still does not mention the mount_program although it is now set internally in the graph driver. This could be confusing for end users, although it should do no harm, as it will only try to autodetect if no mount program was found.

What are your thoughts on this? I have the feeling that mount_program will need special treatment in one way or the other.

giuseppe commented 2 years ago

wouldn't that detect the mountProgram also when native overlay is available?

dcermak commented 2 years ago

@giuseppe I am unfortunately not at all familiar with how the native overlay works. Could you maybe point me to some docs on how it works in detail, especially how it behaves differently from overlay with FUSE?

Vogtinator commented 2 years ago

> @giuseppe I am unfortunately not at all familiar with how the native overlay works.

Instead of using FUSE with fuse-overlayfs it uses the kernel's native overlay filesystem by mounting that in a namespace.

https://github.com/containers/podman/blob/68ce83fe919f2d37762b8b746a73495f45e550f3/vendor/github.com/containers/storage/types/options.go#L229 checks whether that works; only if it doesn't does it search for fuse-overlayfs.

dcermak commented 2 years ago

> wouldn't that detect the mountProgram also when native overlay is available?

It indeed does, which is not the desired outcome, right?

giuseppe commented 2 years ago

right, otherwise we will end up using fuse-overlayfs even if native overlay is supported by the kernel
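
The ordering discussed above (try native overlay first, and look for fuse-overlayfs only if that fails) can be sketched as follows. This is a minimal illustration and not the storage driver's code: `chooseMountProgram` is a hypothetical function, and the native-overlay check is passed in as a callback because how that check is performed is outside the scope of this sketch.

```go
package main

import (
	"fmt"
	"os/exec"
)

// chooseMountProgram returns the path of the mount program to use, or ""
// when the kernel's native overlay should be used (or nothing was found).
// nativeWorks and lookPath are injected so the decision logic can be
// exercised without touching the kernel or the filesystem.
func chooseMountProgram(nativeWorks func() bool, lookPath func(string) (string, error)) string {
	// Prefer native overlay whenever it is usable.
	if nativeWorks() {
		return ""
	}
	// Fall back to fuse-overlayfs only if native overlay is unavailable.
	if path, err := lookPath("fuse-overlayfs"); err == nil {
		return path
	}
	return ""
}

func main() {
	// Assumption for illustration: pretend the native overlay check failed.
	nativeFails := func() bool { return false }
	fmt.Printf("mount program: %q\n", chooseMountProgram(nativeFails, exec.LookPath))
}
```

Compared with the patch earlier in the thread, the only difference is the guard in front of the `exec.LookPath` call, which avoids selecting fuse-overlayfs on kernels where native overlay works.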