containers / buildah

Rootless builds slow on Ubuntu 20.04 LTS #2411

Closed skorhone closed 3 years ago

skorhone commented 4 years ago

Description:

Builds using both podman and buildah are slow when running as a non-root user. The system has been configured to use fuse-overlayfs. Both the podman and fuse-overlayfs processes show heavy CPU usage, while I/O activity (according to iostat) is low.

Steps to reproduce the issue:

  1. Build image based on renovate/renovate
  2. Issue ADD command

Dockerfile example: This is not the same Dockerfile that produced the results below, but with this example the COPY step is slow as well.

FROM renovate/renovate
USER root
COPY config.js /usr/src/app/config.js
USER ubuntu

Describe the results you received: With podman, some of the steps are extremely slow (over a minute):

[2020-06-17 10:06:07] podman build --no-cache -t renovate:some_organization .
[2020-06-17 10:06:07] STEP 1: FROM renovate/renovate
[2020-06-17 10:09:49] STEP 2: USER root
[2020-06-17 10:09:49] --> 3af07d99a94
[2020-06-17 10:09:49] STEP 3: ADD ca-certificates /usr/local/share/ca-certificates
[2020-06-17 10:11:34] --> a9c1775048d
[2020-06-17 10:11:34] STEP 4: COPY config.js /usr/src/app/config.js
[2020-06-17 10:13:17] --> b302c52878e
[2020-06-17 10:13:17] STEP 5: COPY scripts/find_projects.py /usr/src/app/find_projects.py
[2020-06-17 10:15:20] --> 10553f4ea3f
[2020-06-17 10:15:20] STEP 6: RUN update-ca-certificates
[2020-06-17 10:15:23] Updating certificates in /etc/ssl/certs...
[2020-06-17 10:15:26] 6 added, 0 removed; done.
[2020-06-17 10:15:26] Running hooks in /etc/ca-certificates/update.d...
[2020-06-17 10:15:26]
[2020-06-17 10:15:27] Adding debian:some_organization_ca.pem.pem
[2020-06-17 10:15:27] Adding debian:some_organization_ca_v2.pem.pem
[2020-06-17 10:15:27] Adding debian:some_organization_root_ca_v2.pem.pem
[2020-06-17 10:15:27] Adding debian:some_organization_server_ca_v2.pem.pem
[2020-06-17 10:15:27] Adding debian:some_organization_test_root_ca_v2.pem.pem
[2020-06-17 10:15:27] Adding debian:some_organization_test_server_ca_v2.pem.pem
[2020-06-17 10:15:27] done.
[2020-06-17 10:15:27] done.
[2020-06-17 10:17:06] --> 69a465e1d06
[2020-06-17 10:17:06] STEP 7: USER ubuntu
[2020-06-17 10:17:06] --> f0a5620c3c7
[2020-06-17 10:17:06] STEP 8: ENV GITLAB_URL=https://gitlab.someorg.fi/
[2020-06-17 10:17:06] --> 9ade877b993
[2020-06-17 10:17:06] STEP 9: ENV ARTIFACTORY_MAVEN_URL=https://dev.someorg.fi/artifactory/virtual-some_organization-master
[2020-06-17 10:17:07] --> e2f1a5f12eb
[2020-06-17 10:17:07] STEP 10: ENV NODE_OPTIONS=--use-openssl-ca
[2020-06-17 10:17:07] STEP 11: COMMIT renovate:some_organization
[2020-06-17 10:17:07] --> 0929c927fe7
[2020-06-17 10:17:08] 0929c927fe79dbc644eefa74c52ee5425a2a9aaa04c013e9d497be09839131e6

With buildah, the final commit is slow:

[2020-06-17 09:01:53] STEP 1: FROM renovate/renovate
[2020-06-17 09:05:42] STEP 2: USER root
[2020-06-17 09:05:42] STEP 3: ADD ca-certificates /usr/local/share/ca-certificates
[2020-06-17 09:05:47] STEP 4: COPY config.js /usr/src/app/config.js
[2020-06-17 09:05:56] STEP 5: COPY scripts/find_projects.py /usr/src/app/find_projects.py
[2020-06-17 09:06:00] STEP 6: RUN update-ca-certificates
[2020-06-17 09:06:01] Updating certificates in /etc/ssl/certs...
[2020-06-17 09:06:06] 6 added, 0 removed; done.
[2020-06-17 09:06:06] Running hooks in /etc/ca-certificates/update.d...
[2020-06-17 09:06:06]
[2020-06-17 09:06:06] Adding debian:some_organization_ca.pem.pem
[2020-06-17 09:06:06] Adding debian:some_organization_ca_v2.pem.pem
[2020-06-17 09:06:06] Adding debian:some_organization_root_ca_v2.pem.pem
[2020-06-17 09:06:06] Adding debian:some_organization_server_ca_v2.pem.pem
[2020-06-17 09:06:06] Adding debian:some_organization_test_root_ca_v2.pem.pem
[2020-06-17 09:06:06] Adding debian:some_organization_test_server_ca_v2.pem.pem
[2020-06-17 09:06:07] done.
[2020-06-17 09:06:07] done.
[2020-06-17 09:06:08] STEP 7: USER ubuntu
[2020-06-17 09:06:08] STEP 8: ENV GITLAB_URL=https://gitlab.someorg.fi/
[2020-06-17 09:06:08] STEP 9: ENV ARTIFACTORY_MAVEN_URL=https://dev.someorg.fi/artifactory/virtual-some_organization-master
[2020-06-17 09:06:08] STEP 10: ENV NODE_OPTIONS=--use-openssl-ca
[2020-06-17 09:06:08] STEP 11: COMMIT renovate:some_organization
[2020-06-17 09:07:46] --> 5af92aee9aa
[2020-06-17 09:07:46] 5af92aee9aa1f0fb3fb8e52306da4627d79be28afe08ba2c6a527fd501c10805

Describe the results you expected: Builds are nearly as fast as with root user:

[2020-06-17 09:12:15] STEP 1: FROM renovate/renovate
[2020-06-17 09:16:08] STEP 2: USER root
[2020-06-17 09:16:08] --> eab068a689f
[2020-06-17 09:16:09] STEP 3: ADD ca-certificates /usr/local/share/ca-certificates
[2020-06-17 09:16:18] --> 1f80b05c924
[2020-06-17 09:16:18] STEP 4: COPY config.js /usr/src/app/config.js
[2020-06-17 09:16:23] --> 46095d73ae2
[2020-06-17 09:16:23] STEP 5: COPY scripts/find_projects.py /usr/src/app/find_projects.py
[2020-06-17 09:16:30] --> df4938f0b4c
[2020-06-17 09:16:30] STEP 6: RUN update-ca-certificates
[2020-06-17 09:16:34] Updating certificates in /etc/ssl/certs...
[2020-06-17 09:16:35] 6 added, 0 removed; done.
[2020-06-17 09:16:35] Running hooks in /etc/ca-certificates/update.d...
[2020-06-17 09:16:35]
[2020-06-17 09:16:35] Adding debian:some_organization_ca.pem.pem
[2020-06-17 09:16:35] Adding debian:some_organization_ca_v2.pem.pem
[2020-06-17 09:16:35] Adding debian:some_organization_root_ca_v2.pem.pem
[2020-06-17 09:16:35] Adding debian:some_organization_server_ca_v2.pem.pem
[2020-06-17 09:16:35] Adding debian:some_organization_test_root_ca_v2.pem.pem
[2020-06-17 09:16:35] Adding debian:some_organization_test_server_ca_v2.pem.pem
[2020-06-17 09:16:35] done.
[2020-06-17 09:16:35] done.
[2020-06-17 09:16:36] --> 1aa98509e44
[2020-06-17 09:16:36] STEP 7: USER ubuntu
[2020-06-17 09:16:36] --> 2cc2416f985
[2020-06-17 09:16:36] STEP 8: ENV GITLAB_URL=https://gitlab.someorg.fi/
[2020-06-17 09:16:36] --> c6a8af2700b
[2020-06-17 09:16:36] STEP 9: ENV ARTIFACTORY_MAVEN_URL=https://dev.someorg.fi/artifactory/virtual-some_organization-master
[2020-06-17 09:16:36] --> cd82fc1b091
[2020-06-17 09:16:36] STEP 10: ENV NODE_OPTIONS=--use-openssl-ca
[2020-06-17 09:16:36] STEP 11: COMMIT renovate:some_organization
[2020-06-17 09:16:37] --> 359b0e43be6
[2020-06-17 09:16:37] 359b0e43be637faaf074e3660768c65b65056997f2ba44fbe985118585f5bcca
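
The per-step cost in the three logs above can be read off by subtracting consecutive timestamps. The following is a minimal sketch of that arithmetic, assuming the log has been saved with one timestamped entry per line; the function name `step_durations` and the embedded sample are illustrative only and are not part of podman or buildah. Note that STEP 1 includes the image pull, so it is not a useful comparison point.

```python
import re
from datetime import datetime

# Matches lines such as "[2020-06-17 10:09:49] STEP 3: ADD ca-certificates ..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] ?(.*)$")
STEP_RE = re.compile(r"^STEP (\d+): (.*)$")


def step_durations(log_text):
    """Return [(step_number, instruction, seconds)] for each STEP in the log.

    A step is measured from its own STEP line to the next STEP line, or to
    the last timestamp in the log for the final step.
    """
    steps = []  # (number, instruction, start_time)
    last_time = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything without a leading timestamp
        last_time = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        s = STEP_RE.match(m.group(2))
        if s:
            steps.append((int(s.group(1)), s.group(2), last_time))
    result = []
    for i, (num, instr, start) in enumerate(steps):
        end = steps[i + 1][2] if i + 1 < len(steps) else last_time
        result.append((num, instr, int((end - start).total_seconds())))
    return result


# Four lines taken from the rootless podman log in this report.
SAMPLE = """\
[2020-06-17 10:09:49] STEP 3: ADD ca-certificates /usr/local/share/ca-certificates
[2020-06-17 10:11:34] --> a9c1775048d
[2020-06-17 10:11:34] STEP 4: COPY config.js /usr/src/app/config.js
[2020-06-17 10:13:17] --> b302c52878e
"""

if __name__ == "__main__":
    for num, instr, secs in step_durations(SAMPLE):
        print(f"STEP {num}: {secs:4d}s  {instr}")
```

On the sample, the ADD step comes out at 105 seconds and the COPY step at 103 seconds; the same ADD step in the root log runs from 09:16:09 to 09:16:18, i.e. 9 seconds.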

Output of rpm -q buildah or apt list buildah:

Listing... Done
buildah/unknown,now 1.14.9~1 amd64 [installed]
buildah/unknown 1.14.9~1 arm64
buildah/unknown 1.14.9~1 armhf
buildah/unknown 1.14.9~1 s390x

Output of buildah version:

Version: 1.14.9
Go Version: go1.13.8
Image Spec: 1.0.1-dev
Runtime Spec: 1.0.1-dev
CNI Spec: 0.4.0
libcni Version:
image Version: 5.4.3
Git Commit:
Built: Fri Jun 12 18:50:07 2020
OS/Arch: linux/amd64

Output of podman version if reporting a podman build issue:

Version: 1.9.3
RemoteAPI Version: 1
Go Version: go1.13.8
OS/Arch: linux/amd64

Output of cat /etc/*release:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04 LTS"
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Output of uname -a:

Linux ubuntu 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Output of cat ~/.config/containers/storage.conf:

# Ansible managed
#
[storage]
driver = "overlay"

[storage.options]
# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

size = ""
override_kernel_check = "false"
mount_program = "/usr/bin/fuse-overlayfs"

Output of cat /etc/containers/storage.conf:

# This file is is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver
driver = ""

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = false

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev"

# Size is used to set a maximum size of the container image.
# size = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"

Podman info:

host:
  arch: amd64
  buildahVersion: 1.14.9
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.16, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: file
  hostname: ubuntu
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.0-37-generic
  memFree: 2680352768
  memTotal: 8348520448
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.0
      commit: unknown
      libslirp: 4.2.0
  swapFree: 1963859968
  swapTotal: 1964396544
  uptime: 13h 13m 19.62s (Approximately 0.54 days)
registries:
  search:
  - registry.access.redhat.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.6
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 10
  runRoot: /run/user/1000/containers
  volumePath: /home/user/.local/share/containers/storage/volumes

Example of output layers:

Podman produces the following layers:

ID            CREATED         CREATED BY                                     SIZE     COMMENT
69a465e1d065  10 minutes ago  /bin/sh -c #(nop) ENV NODE_OPTIONS=--use-o...  0B
              10 minutes ago  /bin/sh -c #(nop) ENV ARTIFACTORY_MAVEN_UR...  0B
              10 minutes ago  /bin/sh -c #(nop) ENV GITLAB_URL=https://g...  0B
              10 minutes ago  /bin/sh -c #(nop) USER ubuntu                  0B
              12 minutes ago  /bin/sh -c update-ca-certificates              394.8kB
10553f4ea3f5  14 minutes ago  /bin/sh -c #(nop) COPY file:0d3f6d9b493a1c...  4.608kB
b302c52878ee  15 minutes ago  /bin/sh -c #(nop) COPY file:6baac5b68422ff...  4.096kB
a9c1775048d9  17 minutes ago  /bin/sh -c #(nop) ADD dir:c05c83007de5c732...  19.46kB
ea0b56383d8f  17 minutes ago  /bin/sh -c #(nop) USER root                    0B

Buildah produces the following layers:

af92aee9aa1   4 minutes ago   /bin/sh -c #(nop) ENV NODE_OPTIONS=--use-o...  417.8kB
ea0b56383d8f  4 minutes ago   /bin/sh -c #(nop) ENV ARTIFACTORY_MAVEN_UR...  0B
              4 minutes ago   /bin/sh -c #(nop) ENV GITLAB_URL=https://g...  0B
              4 minutes ago   /bin/sh -c #(nop) USER ubuntu                  0B
              4 minutes ago   /bin/sh -c update-ca-certificates              0B
              4 minutes ago   /bin/sh -c #(nop) COPY file:0d3f6d9b493a1c...  0B
              4 minutes ago   /bin/sh -c #(nop) COPY file:6baac5b68422ff...  0B
              5 minutes ago   /bin/sh -c #(nop) ADD dir:c05c83007de5c732...  0B
              5 minutes ago   /bin/sh -c #(nop) USER root                    0B

TomSweeneyRedHat commented 4 years ago

@nalind, throwing this one your way too as you're working on the speed issues.

skorhone commented 4 years ago

Updating Podman to 2.0.0 did not improve the situation. Current podman info:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: file
  hostname: ubuntu
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.0-37-generic
  linkmode: dynamic
  memFree: 124592128
  memTotal: 8348520448
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.0
      commit: unknown
      libslirp: 4.2.0
  swapFree: 1949442048
  swapTotal: 1964396544
  uptime: 25h 55m 2.25s (Approximately 1.04 days)
registries:
  search:
  - registry.access.redhat.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  configFile: /home/k847259/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.6
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: /home/k847259/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 5
  runRoot: /run/user/1000/containers
  volumePath: /home/k847259/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan  1 02:00:00 1970
  GitCommit: ""
  GoVersion: go1.13.8
  OsArch: linux/amd64
  Version: 2.0.0
rhatdan commented 4 years ago

@giuseppe Any chance this is fuse-overlay related?

skorhone commented 4 years ago

I've also tried squashing the base image (which has a ridiculous number of layers), but it made no difference.

According to iostat, there's very little disk I/O during the build. However, both podman/buildah and fuse-overlayfs are consuming heaps of CPU. I'd expect to see results like this if buildah has to iterate over all files to find what has changed, and if fuse-overlayfs caches file metadata in memory using a search algorithm that runs in linear time (or worse).

iostat sample:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8,56    0,00    8,29    0,00    0,00   83,15

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
loop0             0,00         0,00         0,00         0,00          0          0          0
loop1             0,00         0,00         0,00         0,00          0          0          0
loop10            0,00         0,00         0,00         0,00          0          0          0
loop11            0,00         0,00         0,00         0,00          0          0          0
loop2             0,00         0,00         0,00         0,00          0          0          0
loop3             0,00         0,00         0,00         0,00          0          0          0
loop4             0,00         0,00         0,00         0,00          0          0          0
loop5             0,00         0,00         0,00         0,00          0          0          0
loop6             0,00         0,00         0,00         0,00          0          0          0
loop7             0,00         0,00         0,00         0,00          0          0          0
loop8             0,00         0,00         0,00         0,00          0          0          0
loop9             0,00         0,00         0,00         0,00          0          0          0
sda               0,00         0,00         0,00         0,00          0          0          0
giuseppe commented 4 years ago

System shows heavy CPU usage on both podman and fuse-overlayfs processes

This is interesting.

Could you please strace podman and fuse-overlayfs when they show heavy CPU usage? I'd not expect any CPU usage on podman.

@rhatdan it seems COPY is taking most of the time. This will probably improve with @nalind's work.

skorhone commented 4 years ago

Here's a random strace taken during the file copy phase. I think I started strace somewhere between the following steps:

STEP 3: COPY config.js /usr/src/app/config.js
STEP 4: COPY scripts/find_projects.py /usr/src/app/find_projects.py

podman.strace.gz

rhatdan commented 4 years ago

Do you have a .dockerignore file?

skorhone commented 4 years ago

Our repository doesn't have one (there is none in the build context), but it seems that the FROM image (Renovate) is using .dockerignore.

I don't think that the use of .dockerignore in the base image could cause such side effects, though.

giuseppe commented 4 years ago

I had a look at it.

The base image has a lot of files:

$ find /usr -type f | wc -l
89325

and with fuse-overlayfs we use the NaiveDiff backend for each layer. This means we need to stat, and read xattrs from, each of these files.

I guess you'll see better results if you run as root, where we can use the optimized diff backend.

We could teach the diff driver how to deal with the upper layer created by fuse-overlayfs, but I'd prefer not to go this way: the storage driver currently has no knowledge of fuse-overlayfs at all, except how to create the mount when the mount program is specified.
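
To illustrate why a naive diff scales with the size of the image rather than with the size of the change, here is a minimal sketch of the access pattern described above: every entry under a root is visited and gets an lstat plus an xattr listing. This only imitates the pattern; it is not the containers/storage implementation, and `naive_scan` is a hypothetical name. `os.listxattr` exists only on Linux builds of Python, so the sketch skips it elsewhere.

```python
import os
import tempfile


def naive_scan(root):
    """Visit every entry under `root`, touching its metadata.

    Returns (entries_visited, metadata_calls_attempted).
    """
    listxattr = getattr(os, "listxattr", None)  # Linux-only in CPython
    entries = 0
    calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.lstat(path)
            calls += 1
            if listxattr is not None:
                calls += 1
                try:
                    listxattr(path, follow_symlinks=False)
                except OSError:
                    pass  # e.g. a filesystem without xattr support
            entries += 1
    return entries, calls


if __name__ == "__main__":
    # Tiny stand-in tree: one directory and five files.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a", "b", "c", os.path.join("sub", "d"), os.path.join("sub", "e")):
            open(os.path.join(root, rel), "w").close()
        print(naive_scan(root))
```

With roughly 89,000 files under /usr, a scan of this shape does that many metadata round trips per layer even when a COPY step added a single file, and under fuse-overlayfs each of those calls also passes through the FUSE process.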

skorhone commented 4 years ago

In a way, I agree that tight coupling to fuse is undesirable.

On the other hand, I think the performance gains for rootless builds would be significant if you read the names of changed files from the upper layer rather than iterating over the whole filesystem. Using inotify to monitor changes would probably be even faster (and compatible with most filesystems), if it works with fuse.

I do understand that this might not be a high-priority item at this time. If it is not fixed, then from the user's perspective this kind of behavioral difference should be well documented. A warning during the build that points to the documentation might not be a bad idea.

Should I change the title of this issue to better describe the underlying problem? Or should we close this one and create a new one?
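
The suggestion above, deriving the change list from the upper layer alone, can be sketched as follows. This is a simplified illustration and not how buildah or containers/storage work: it treats both a character device with device number 0 (the kernel overlayfs whiteout) and a `.wh.`-prefixed file (assumed here to be the form an unprivileged fuse-overlayfs uses) as deletions, and it ignores opaque directories and metadata-only changes. The name `changes_from_upper` is hypothetical.

```python
import os
import stat
import tempfile

WHITEOUT_PREFIX = ".wh."  # assumption: AUFS-style whiteout file naming


def changes_from_upper(upper):
    """Return sorted [(kind, relative_path)] using only the upper directory."""
    changes = []
    for dirpath, _dirnames, filenames in os.walk(upper):
        rel_dir = os.path.relpath(dirpath, upper)
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if name.startswith(WHITEOUT_PREFIX):
                # ".wh.foo" marks "foo" as deleted in the lower layers.
                target = name[len(WHITEOUT_PREFIX):]
                rel = os.path.normpath(os.path.join(rel_dir, target))
                changes.append(("deleted", rel))
            elif stat.S_ISCHR(st.st_mode) and st.st_rdev == 0:
                # Kernel overlayfs whiteout: character device 0/0.
                rel = os.path.normpath(os.path.join(rel_dir, name))
                changes.append(("deleted", rel))
            else:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                changes.append(("changed", rel))
    return sorted(changes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as upper:
        os.makedirs(os.path.join(upper, "usr", "src", "app"))
        os.makedirs(os.path.join(upper, "etc"))
        open(os.path.join(upper, "usr", "src", "app", "config.js"), "w").close()
        open(os.path.join(upper, "etc", ".wh.oldfile"), "w").close()
        for kind, path in changes_from_upper(upper):
            print(kind, path)
```

The work here is proportional to the number of entries in the upper directory, which for a single COPY is a handful of files, rather than to the total file count of the image.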

Romain-Geissler-1A commented 3 years ago

Hi,

FYI, while migrating some of my company's builds to RHEL 8, we are also naturally moving from the historical Docker container engine to podman. For us too, some workloads are quite impacted by the use of fuse-overlay (in my case it's when using podman, not buildah, but I guess the problems are shared by both). Typically these are actions involving lots of files: for example, compiling the C++ Boost library has a step where all include header files are copied to the install directory, and this step takes minutes (I also think the Boost build system is intrinsically slow). When lots of small files are copied, I often see fuse-overlay consuming between 60 and 100% of CPU in top.

I also noticed that many of my processes are randomly stuck in the D state when running inside podman. Sometimes, after a bit of time, they automagically start running again. Sometimes, if I start doing some file-related operations in the same container, it seems to "unlock" the other stalled D-state processes. On other occasions, I just stopped the container that was "stuck", restarted it, and then it worked without problems.

Finally, I compile a lot of things with g++ in my containers. I noticed that when compiling JsonSpirit, some .cpp files generate huge assembly files (bigger than usual). Until now, I did not compile them with the "-pipe" gcc flag, which pipes the assembly between gcc and gas. The gcc processes were again very slow, taking so long that in the end I decided to kill them (after several minutes). They consumed no CPU, but were always stuck in the D state. Stracing them showed that they issued only write(8196) syscalls. When I added the "-pipe" gcc option, compiling these files came down to a reasonable time.

It looks like I/O performance in podman/buildah on RHEL 8 doesn't always match that of Docker on other OSes. I also suspect that fuse-overlay is the problem here. As I could read it here, as soon as you use fuse, you can't expect all file operations to be quick. So, are there long-term plans to stop using fuse-overlay by default for rootless containers? It looks like the default used to be vfs and was switched to fuse (apparently for performance reasons too?). Is there any other storage driver that would allow rootless containers with native filesystem performance?

Cheers, Romain

rhatdan commented 3 years ago

@giuseppe PTAL. One potential speed-up would be to use a volume as opposed to a COPY or ADD operation. BTW, there has been a big improvement in COPY/ADD in the upstream podman/buildah packages. It should be arriving in Podman 2.0.6 and later versions.

giuseppe commented 3 years ago

fuse-overlayfs is slow when there are multiple writes at the same time. One possible speed-up is to disable sync calls with mountopt = "fsync=0" in your storage.conf file.

Could it be possible to use a volume for the build and then only copy the files to destination?

Something like: podman run -v ./tmp:/workspace ... and do the whole build in the /workspace directory, then just install the results into /. This trick also speeds up native overlay.

alxchk commented 3 years ago

I'm facing the same problems, and the only way to use podman/buildah with fuse-overlayfs is with --squash. When I traced fuse-overlayfs, it was busy propagating stat() to each layer. When there are several layers (in my case about 30), it's almost impossible to use.

rhatdan commented 3 years ago

@giuseppe WDYT?

giuseppe commented 3 years ago

I'm facing the same problems, and the only way to use podman/buildah with fuse-overlayfs is with --squash. When I traced fuse-overlayfs, it was busy propagating stat() to each layer. When there are several layers (in my case about 30), it's almost impossible to use.

This looks like a different issue. fuse-overlayfs will always be slower at lookups than native overlay, but I can debug whether there is an issue and we are doing more lookups than needed. Do you have a reproducer I can use?

alxchk commented 3 years ago

I may try to prepare one, but like the previous time, this is a wine setup. I have an old trace record. It's only a part, but it looks obvious what is going on: the application frequently checks for the existence of a file that does not exist, and the check is propagated to each layer.

timeout-60s-strace.txt
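
The cost of the pattern described above (a repeated existence check for a file that is not there, propagated to every layer) can be modelled with a few lines. This is only a toy model of a top-down search across layer directories with no caching; it does not reproduce fuse-overlayfs internals, and `layered_lookup` is a hypothetical name.

```python
import os
import tempfile


def layered_lookup(layers, rel_path):
    """Search layer directories top-down.

    Returns (found_path_or_None, number_of_existence_checks).
    """
    checks = 0
    for layer in layers:
        candidate = os.path.join(layer, rel_path)
        checks += 1
        if os.path.lexists(candidate):
            return candidate, checks
    return None, checks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # 30 layers, matching the layer count mentioned in the comment above.
        layers = []
        for i in range(30):
            layer = os.path.join(base, f"layer{i:02d}")
            os.mkdir(layer)
            layers.append(layer)
        # One file in the top layer only.
        open(os.path.join(layers[0], "present"), "w").close()

        print(layered_lookup(layers, "present")[1])  # hit in the top layer
        print(layered_lookup(layers, "absent")[1])   # miss: every layer is checked
```

A hit in the top layer costs one check, while a miss costs one check per layer, so an application that polls for a missing file pays the full layer count on every poll in this model.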

jontrossbach commented 3 years ago

I can start a new issue if this isn't pertinent enough here, but on a personal project I've been working on, I tried Podman/Buildah as a drop-in replacement for a docker build on Ubuntu 20.10. Even with fuse-overlayfs, the build takes up all my remaining disk space until the system cannot handle it anymore and Podman tells me that there is "no space left on device". Perhaps the build would succeed if I had more disk space left (~20 GB), but I thought it might be instructive to leave the issue here, given that Docker did the build fine with fewer resources.

It appears this is happening within a single layer with a rather large build taking place within it. I uploaded the set up I am using here.

rhatdan commented 3 years ago

The slowness of fuse-overlay under load is not something we can fix, but upstream kernel 5.11 now supports native overlay mounts, and we are going to move to this as the default once it becomes widely available. fuse-overlay will then become more of a niche player, used where it is needed: kernels where native overlay is not supported, perhaps NFS home directories, and a few other cases where we want to customize the way the file system works.

The other issues being brought up here about buildah use of space should be in a separate issue, although there already is one complaining about it wasting space.

kalvdans commented 3 years ago

the upstream kernel-5.11 now supports native overlay mounts, we are going to move to this as default once it becomes widely available.

Will the kernel allow regular users to do native overlay mounts?

rhatdan commented 3 years ago

Yes, within a user namespace/mount namespace. Same as they can with tmpfs, binds, sysfs ...
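
As a rough first check of whether a host is new enough for this, the kernel release string can be compared against 5.11. The sketch below is only a heuristic under that assumption: distributions may backport or disable features, so the version number alone does not prove that rootless native overlay works. The function names are illustrative.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a string such as '5.4.0-37-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def may_support_rootless_overlay(release):
    """Heuristic only: True when the release is 5.11 or newer."""
    return parse_kernel_release(release) >= (5, 11)


if __name__ == "__main__":
    current = platform.release()
    print(current, may_support_rootless_overlay(current))
```

The kernel in the original report, 5.4.0-37-generic, falls below this threshold.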

We have an issue with the kernel and SELinux support that may delay it for a couple of months, but we are trying to work through it.

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

mikepurvis commented 3 years ago

It might be helpful if the Buildah team were able to provide guidance on how to best run it under various scenarios/environments, and what the tradeoffs are. Based on Best Practices For Running Buildah In a Container (2019), I was looking at just mounting /dev/fuse into my microk8s builders and going that route, but following on the commentary above, it seems like it may be more sensible to just use privileged containers in the short term with an expectation of dropping the privs later this year when 5.11 (not yet in Ubuntu 21.04, but it is expected and the freeze is still a few weeks away) lands and Buildah is able to take advantage of it.

rhatdan commented 3 years ago

Sure.

rhatdan commented 3 years ago

@mikepurvis What problem are you worried about specifically above?

rhatdan commented 3 years ago

Since I received no input, I am closing.

mikepurvis commented 3 years ago

@rhatdan Ah boo, sorry I missed that this ticket was blocked on a question for me. Basically my situation is that I've had bad experiences in the past (eg, vanilla Docker) with stuff crashing in privileged containers and leaving loops and so-on hanging around and eventually hosing the node. I'm not like super worried about buildah in this regard as I expect it will be better behaved than my random test harnesses were that caused those problems in the past.

At the same time, part of the promise of buildah is daemonless, rootless, privilegeless container building. It seems like this isn't quite the reality today, so the guidance I'm looking for specifically is:

a) What is the roadmap to building images with buildah in a non-fancy Kubernetes container, and b) What do I do today (in terms of workarounds, tricks, whatever) to best prepare me for whatever that solution eventually looks like?

rhatdan commented 3 years ago

We are preparing a blog on running rootless Podman in different environments that should also work for Buildah. Expect to see it within the next couple of weeks.

We are exploring Podman in Podman, in Docker, in Kubernetes and the different ways users might want to run it.

We are making improvements to Podman in the 3.2 release to make it easier to consume inside of a container, and this release is what is holding up the release of the blog.

Once the Blog is released, we plan on making it a living document in podman.io that can be updated with new ways people are running containers within containers.

mikepurvis commented 3 years ago

Awesome, I look forward to it!

kalvdans commented 2 years ago

Blog post is https://www.redhat.com/sysadmin/podman-inside-container in case someone reads this issue :)