containers / buildah

A tool that facilitates building OCI images.
https://buildah.io
Apache License 2.0

Rootless mode with host root on overlayfs #1831

Closed villytiger closed 4 years ago

villytiger commented 5 years ago

Description

I'm trying to run buildah in rootless mode inside a container, so the container's rootfs is an overlay mount. When I run buildah in root mode, it reports that "'overlay' is not supported over overlayfs". However, when I run it in rootless mode, it silently uses some mix of the overlay and vfs drivers. As a result, an empty directory is mounted as the rootfs of the new container.

Steps to reproduce the issue:

  1. sudo podman run -it --rm --privileged fedora:30
  2. yum install -y buildah
  3. useradd -m -s /bin/bash user
  4. chmod u+s /usr/bin/newuidmap /usr/bin/newgidmap
  5. su user
  6. container=$(buildah from fedora:30)
  7. buildah run $container sh
  8. buildah unshare
  9. path=$(buildah mount $container)
  10. ls $path

Describe the results you received:

# buildah run $container sh
ERRO[0000] container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH" 
container_linux.go:346: starting container process caused "exec: \"sh\": executable file not found in $PATH"
error running container: error creating container for [sh]: : exit status 1
error while running runtime: exit status 1

# ls $path
dev  etc  proc  run  sys
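
A mount point that contains only dev, etc, proc, run and sys, as above, can be detected before calling buildah run. A minimal sketch, assuming a POSIX shell; `rootfs_has_shell` is a hypothetical helper and not part of buildah:

```shell
# Hypothetical helper: succeed if the mounted rootfs contains a shell.
# $1 is the directory printed by `buildah mount`.
rootfs_has_shell() {
    rootfs=$1
    for candidate in bin/sh usr/bin/sh; do
        # -e follows symlinks; -L also accepts a symlink named sh whose
        # target only resolves inside the container.
        if [ -e "$rootfs/$candidate" ] || [ -L "$rootfs/$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

Inside `buildah unshare`, it could be used as `rootfs_has_shell "$path" || echo "rootfs at $path looks empty" >&2`.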

# buildah run --debug $container sh                          
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU[0000] backingFs=overlayfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU[0000] using "/tmp/buildah498575647" to hold bundle data 
DEBU[0000] Forcing use of an IPC namespace.             
DEBU[0000] Forcing use of a PID namespace.              
DEBU[0000] Forcing use of a user namespace.             
DEBU[0000] Resources: &buildah.CommonBuildOptions{AddHost:[]string{}, CgroupParent:"", CPUPeriod:0x0, CPUQuota:0, CPUShares:0x0, CPUSetCPUs:"", CPUSetMems:"", HTTPProxy:true, Memory:0, DNSSearch:[]string{}, DNSServers:[]string{}, DNSOptions:[]string{}, MemorySwap:0, LabelOpts:[]string(nil), SeccompProfilePath:"/usr/share/containers/seccomp.json", ApparmorProfile:"", ShmSize:"65536k", Ulimit:[]string{"nofile=1048576:1048576", "nproc=1048576:1048576"}, Volumes:[]string{}}
DEBU[0000] stdio is a terminal, defaulting to using a terminal
DEBU[0000] ensuring working directory "/home/user/.local/share/containers/storage/overlay/d8a8ca0a52811a1f9b1afd23ccb85dd258508ce8d67b94af761b717988f373b7/merged" exists
DEBU[0000] adding slirp4netns 10.0.2.3 built-in DNS server
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0000] bind mounted "/home/user/.local/share/containers/storage/overlay/d8a8ca0a52811a1f9b1afd23ccb85dd258508ce8d67b94af761b717988f373b7/merged" to "/tmp/buildah498575647/mnt/rootfs"
DEBU[0000] bind mounted "/home/user/.local/share/containers/storage/overlay-containers/99913cd4e10fb8baf8ffb42002fba75996e903cf4923ae16d3bafe47473b78db/userdata/run/secrets" to "/tmp/buildah498575647/mnt/buildah-bind-target-6"

Describe the results you expected:

I don't understand why the overlay driver is not supported over overlayfs. I couldn't find any information about it. Does the restriction also apply to fuse-overlayfs?

If the overlay driver can be used in my scenario, I'd expect buildah to use it. If it can't be used, I'd expect buildah to report that and stop.

Output of rpm -q buildah or apt list buildah:

buildah-1.10.1-2.git8c1c2c5.fc30.x86_64

Output of buildah version:

Version:         1.10.1
Go Version:      go1.12.7
Image Spec:      1.0.1
Runtime Spec:    1.0.1-dev
CNI Spec:        0.4.0
libcni Version:  
Git Commit:      
Built:           Thu Jan  1 00:00:00 1970
OS/Arch:         linux/amd64

Output of cat /etc/*release:

Fedora release 30 (Thirty)
NAME=Fedora
VERSION="30 (Container Image)"
ID=fedora
VERSION_ID=30
VERSION_CODENAME=""
PLATFORM_ID="platform:f30"
PRETTY_NAME="Fedora 30 (Container Image)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:30"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f30/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=30
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=30
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Container Image"
VARIANT_ID=container
Fedora release 30 (Thirty)
Fedora release 30 (Thirty)

Output of uname -a:

Linux test 5.0.0-25-generic #26~18.04.1-Ubuntu SMP Thu Aug 1 13:51:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

# This file is is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver
driver = "overlay"

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Size is used to set a maximum size of the container image.  Only supported by
# certain container storage drivers.
size = ""

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# OverrideKernelCheck tells the driver to ignore kernel checks based on kernel version
override_kernel_check = "true"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to UIDs/GIDs as they should appear outside of the container, and
# the length of the range of UIDs/GIDs.  Additional mapped sets can be listed
# and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and the a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped container-level ID,
# until all of the entries have been used for maps.
#
# remap-user = "storage"
# remap-group = "storage"

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base.
# device.
# mkfsarg = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"

# If specified, use OSTree to deduplicate files with the overlay backend
ostree_repo = ""

# Set to skip a PRIVATE bind mount on the storage home directory.  Only supported by
# certain container storage drivers
skip_mount_home = "false"

Output of cat ~/.config/containers/storage.conf:

[storage]
  driver = "overlay"
  runroot = "/tmp/1994"
  graphroot = "/home/user/.local/share/containers/storage"
  [storage.options]
    size = ""
    remap-uids = ""
    remap-gids = ""
    ignore_chown_errors = ""
    remap-user = ""
    remap-group = ""
    ostree_repo = ""
    skip_mount_home = ""
    mount_program = "/usr/bin/fuse-overlayfs"
    mountopt = ""
    [storage.options.thinpool]
      autoextend_percent = ""
      autoextend_threshold = ""
      basesize = ""
      blocksize = ""
      directlvm_device = ""
      directlvm_device_force = ""
      fs = ""
      log_level = ""
      min_free_space = ""
      mkfsarg = ""
      mountopt = ""
      use_deferred_deletion = ""
      use_deferred_removal = ""
      xfs_nospace_max_retries = ""

villytiger commented 5 years ago

By the way, if I specify the --storage-driver command-line option, buildah reports an error:

# buildah run --storage-driver overlay $container sh
ERRO[0000] 'overlay' is not supported over overlayfs    
'overlay' is not supported over overlayfs: backing file system is unsupported for this graph driver

rhatdan commented 5 years ago

You have to mount a volume at /var/lib/containers if you want to use overlay inside a container that is itself running on overlay.

In rootless mode we are not using the kernel's overlayfs; we are using fuse-overlayfs, so it is allowed in this case.
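
The rule described here can be written down as a small decision function. This is only a sketch of the behaviour as stated in this thread: `overlay_check` is a hypothetical helper, not buildah's implementation, and it simply echoes the error text quoted earlier when native overlay is requested on top of overlayfs.

```shell
# Hypothetical helper, not buildah's implementation.
# $1: backing filesystem of the graphroot (e.g. overlayfs, xfs, tmpfs)
# $2: mount program; empty means the kernel's native overlay is used
overlay_check() {
    backing_fs=$1
    mount_program=$2
    if [ "$backing_fs" = "overlayfs" ] && [ -z "$mount_program" ]; then
        echo "'overlay' is not supported over overlayfs"
        return 1
    fi
    echo "ok"
}
```

For example, `overlay_check overlayfs ""` fails, while `overlay_check overlayfs /usr/bin/fuse-overlayfs` succeeds.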

villytiger commented 5 years ago

> In rootless mode we are not using the kernel's overlayfs; we are using fuse-overlayfs, so it is allowed in this case.

But it doesn't work in my case. If I run mkdir -p /home/user/.local/share/containers && chown -R user:user /home/user/.local && mount -t tmpfs tmpfs /home/user/.local/share/containers before step 5, it does work.
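
To confirm what the storage directory actually sits on before and after such a mount, the filesystem type can be queried directly. A sketch assuming GNU coreutils `stat`; `backing_fs` is a hypothetical name:

```shell
# Print the type of the filesystem that backs a directory.
# Assumes GNU coreutils stat: -f queries the filesystem, %T is its type name.
backing_fs() {
    stat -f -c %T -- "$1"
}
```

Running `backing_fs /home/user/.local/share/containers` would be expected to report overlayfs in the failing setup (matching backingFs=overlayfs in the debug log) and tmpfs once the tmpfs mount is in place.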

villytiger commented 5 years ago

As a side question, why is nested overlay not allowed?

rhatdan commented 5 years ago

You'll have to ask the kernel guys.

rhatdan commented 5 years ago

I am talking about the podman command:

mkdir ./myuser
chown myuser/
podman run -v ./myuser:/home/user ...
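
Spelled out, that suggestion might look like the following. This is a sketch, not a tested recipe: `print_podman_cmd` is a hypothetical helper that only prints the command, the image and flags are copied from the reproduction steps, and the ownership change is left out because the chown above does not say which owner to use.

```shell
# Hypothetical dry-run helper: create a host directory and print (not run)
# a podman command that bind-mounts it over /home/user in the container.
print_podman_cmd() {
    host_dir=$1
    mkdir -p "$host_dir" || return 1
    # Resolve to an absolute path so the -v argument is unambiguous.
    abs_dir=$(cd "$host_dir" && pwd) || return 1
    printf 'podman run -it --rm --privileged -v %s:/home/user fedora:30\n' "$abs_dir"
}
```

Calling `print_podman_cmd ./myuser` creates the directory and prints the command for review; with the bind mount in place, the storage under /home/user is no longer on the container's overlay rootfs.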

villytiger commented 5 years ago

Yes, that should work. But this bug report is about an overlay mount at /home/user. Buildah behaves badly in that case: it mounts an empty directory and then fails with a hard-to-debug message about a missing /bin/sh. I see two options here:

  1. In rootless mode, Buildah must report that it doesn't support fuse-overlayfs over overlayfs.
  2. If Buildah does support fuse-overlayfs over overlayfs, it must use it properly instead of mounting an empty directory.

rhatdan commented 5 years ago

I don't believe that happens with the latest fuse-overlayfs. Fuse-overlay will work on an Overlayfs file system, I believe. @giuseppe PTAL

giuseppe commented 5 years ago

native overlay cannot be nested as the whiteouts will confuse the different overlay layers.

fuse-overlayfs can be used on top of overlay. fuse-overlayfs will fall back to use .wh.FILE whiteouts when whiteouts using mknod cannot be used.

Please be aware, though, that this use case is not well tested, so I'd still suggest using a bind mount so that the storage inside the container is not on overlay.
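
The two whiteout styles mentioned above can be told apart on disk. A minimal sketch, assuming GNU coreutils `stat`; `whiteout_kind` is a hypothetical helper. It treats a character device with device number 0/0 as a mknod-style whiteout (the form the kernel's overlayfs uses) and a file named .wh.FILE as the fallback form:

```shell
# Hypothetical helper: classify a path in an overlay upper directory.
# Prints "wh-file" for a .wh.FILE whiteout, "chardev" for a 0/0 character
# device whiteout, and "none" for anything else.
whiteout_kind() {
    name=${1##*/}    # strip the directory part
    case "$name" in
        .wh.*) echo "wh-file"; return 0 ;;
    esac
    # %t and %T print the major and minor device numbers in hex.
    if [ -c "$1" ] && [ "$(stat -c '%t:%T' -- "$1")" = "0:0" ]; then
        echo "chardev"
    else
        echo "none"
    fi
}
```

Running it over the entries of a layer's upper directory shows which style a given storage setup ended up using.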

rhatdan commented 4 years ago

I don't believe we have this issue any longer; reopen if I am mistaken.