containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman is unable to write to AWS EFS (NFS) when used as additional file storage #22554

Open ankurmalhotra07 opened 5 months ago

ankurmalhotra07 commented 5 months ago

Issue Description

I want to use additional image stores as explained in this guide. However, I am running into issues with both reading from and writing to the EFS share. Note that the issue only occurs with certain images. Also, for images that do work, write performance is very slow.

Steps to reproduce the issue

  1. Provision an EFS share in AWS.
  2. Mount the EFS share: mount -t efs -o tls fs-123...:/ /var/lib/mycontainers
  3. Pull an image using podman: time podman --root /var/lib/mycontainers pull docker.io/amazoncorretto:latest

Describe the results you received

+ podman --root /var/lib/mycontainers --log-level=debug pull docker.io/amazoncorretto:latest

time="2024-04-30T20:28:04Z" level=info msg="podman filtering at log level debug"

time="2024-04-30T20:28:04Z" level=debug msg="Called pull.PersistentPreRunE(podman --root /var/lib/mycontainers --log-level=debug pull docker.io/amazoncorretto:latest)"

time="2024-04-30T20:28:04Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""

time="2024-04-30T20:28:04Z" level=info msg="Using sqlite as database backend"

time="2024-04-30T20:28:05Z" level=debug msg="Using graph driver overlay"

time="2024-04-30T20:28:05Z" level=debug msg="Using graph root /var/lib/mycontainers"

time="2024-04-30T20:28:05Z" level=debug msg="Using run root /var/run/containers/storage"

time="2024-04-30T20:28:05Z" level=debug msg="Using static dir /var/lib/mycontainers/libpod"

time="2024-04-30T20:28:05Z" level=debug msg="Using tmp dir /run/libpod"

time="2024-04-30T20:28:05Z" level=debug msg="Using volume path /var/lib/mycontainers/volumes"

time="2024-04-30T20:28:05Z" level=debug msg="Using transient store: false"

time="2024-04-30T20:28:05Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""

time="2024-04-30T20:28:05Z" level=debug msg="Unable to create kernel-style whiteout: errno 524"

time="2024-04-30T20:28:05Z" level=debug msg="backingFs=nfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false"

time="2024-04-30T20:28:05Z" level=debug msg="Initializing event backend file"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument"

time="2024-04-30T20:28:05Z" level=debug msg="Using OCI runtime \"/usr/bin/crun\""

time="2024-04-30T20:28:05Z" level=debug msg="Successfully loaded 1 networks"

time="2024-04-30T20:28:05Z" level=debug msg="Initialized SHM lock manager at path /libpod_lock"

time="2024-04-30T20:28:05Z" level=debug msg="Podman detected system restart - performing state refresh"

time="2024-04-30T20:28:05Z" level=info msg="Setting parallel job count to 25"

time="2024-04-30T20:28:05Z" level=debug msg="Pulling image docker.io/amazoncorretto:latest (policy: always)"

time="2024-04-30T20:28:05Z" level=debug msg="Looking up image \"docker.io/amazoncorretto:latest\" in local containers storage"

time="2024-04-30T20:28:05Z" level=debug msg="Normalized platform linux/amd64 to {amd64 linux  [] }"

time="2024-04-30T20:28:05Z" level=debug msg="Trying \"docker.io/library/amazoncorretto:latest\" ..."

time="2024-04-30T20:28:05Z" level=debug msg="reference \"[overlay@/var/lib/mycontainers+/var/run/containers/storage]docker.io/library/amazoncorretto:latest\" does not resolve to an image ID"

time="2024-04-30T20:28:05Z" level=debug msg="Trying \"docker.io/library/amazoncorretto:latest\" ..."

time="2024-04-30T20:28:05Z" level=debug msg="reference \"[overlay@/var/lib/mycontainers+/var/run/containers/storage]docker.io/library/amazoncorretto:latest\" does not resolve to an image ID"

time="2024-04-30T20:28:05Z" level=debug msg="Trying \"docker.io/amazoncorretto:latest\" ..."

time="2024-04-30T20:28:05Z" level=debug msg="Loading registries configuration \"/etc/containers/registries.conf\""

time="2024-04-30T20:28:05Z" level=debug msg="Loading registries configuration \"/etc/containers/registries.conf.d/000-shortnames.conf\""

time="2024-04-30T20:28:05Z" level=debug msg="Normalized platform linux/amd64 to {amd64 linux  [] }"

time="2024-04-30T20:28:05Z" level=debug msg="Attempting to pull candidate docker.io/library/amazoncorretto:latest for docker.io/amazoncorretto:latest"

time="2024-04-30T20:28:05Z" level=debug msg="parsed reference into \"[overlay@/var/lib/mycontainers+/var/run/containers/storage]docker.io/library/amazoncorretto:latest\""

Trying to pull docker.io/library/amazoncorretto:latest...

time="2024-04-30T20:28:05Z" level=debug msg="Copying source image //amazoncorretto:latest to destination image [overlay@/var/lib/mycontainers+/var/run/containers/storage]docker.io/library/amazoncorretto:latest"

time="2024-04-30T20:28:05Z" level=debug msg="Using registries.d directory /etc/containers/registries.d"

time="2024-04-30T20:28:05Z" level=debug msg="Trying to access \"docker.io/library/amazoncorretto:latest\""

time="2024-04-30T20:28:05Z" level=debug msg="No credentials matching docker.io/library/amazoncorretto found in /run/containers/0/auth.json"

time="2024-04-30T20:28:05Z" level=debug msg="No credentials matching docker.io/library/amazoncorretto found in /root/.config/containers/auth.json"

time="2024-04-30T20:28:05Z" level=debug msg="No credentials matching docker.io/library/amazoncorretto found in /root/.docker/config.json"

time="2024-04-30T20:28:05Z" level=debug msg="No credentials matching docker.io/library/amazoncorretto found in /root/.dockercfg"

time="2024-04-30T20:28:05Z" level=debug msg="No credentials for docker.io/library/amazoncorretto found"

time="2024-04-30T20:28:05Z" level=debug msg=" No signature storage configuration found for docker.io/library/amazoncorretto:latest, using built-in default file:///var/lib/containers/sigstore"

time="2024-04-30T20:28:05Z" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/docker.io"

time="2024-04-30T20:28:05Z" level=debug msg="GET https://registry-1.docker.io/v2/"

time="2024-04-30T20:28:06Z" level=debug msg="Ping https://registry-1.docker.io/v2/ status 401"

time="2024-04-30T20:28:06Z" level=debug msg="GET https://auth.docker.io/token?scope=repository%3Alibrary%2Famazoncorretto%3Apull&service=registry.docker.io"

time="2024-04-30T20:28:06Z" level=debug msg="GET https://registry-1.docker.io/v2/library/amazoncorretto/manifests/latest"

time="2024-04-30T20:28:06Z" level=debug msg="Content-Type from manifest GET is \"application/vnd.docker.distribution.manifest.list.v2+json\""

time="2024-04-30T20:28:06Z" level=debug msg="Using SQLite blob info cache at /var/lib/containers/cache/blob-info-cache-v1.sqlite"

time="2024-04-30T20:28:06Z" level=debug msg="Source is a manifest list; copying (only) instance sha256:1bc77946efe2f076c03f03ea8485a97209162c28dbbbb09801476aaac9a814bf for current system"

time="2024-04-30T20:28:06Z" level=debug msg="GET https://registry-1.docker.io/v2/library/amazoncorretto/manifests/sha256:1bc77946efe2f076c03f03ea8485a97209162c28dbbbb09801476aaac9a814bf"

time="2024-04-30T20:28:07Z" level=debug msg="Content-Type from manifest GET is \"application/vnd.docker.distribution.manifest.v2+json\""

time="2024-04-30T20:28:07Z" level=debug msg="IsRunningImageAllowed for image docker:docker.io/library/amazoncorretto:latest"

time="2024-04-30T20:28:07Z" level=debug msg=" Using default policy section"

time="2024-04-30T20:28:07Z" level=debug msg=" Requirement 0: allowed"

time="2024-04-30T20:28:07Z" level=debug msg="Overall: allowed"

time="2024-04-30T20:28:07Z" level=debug msg="Downloading /v2/library/amazoncorretto/blobs/sha256:d55c22491690f5697cf4f6e4669812b59124f5eae1af98fac63da9d1163a8a9c"

time="2024-04-30T20:28:07Z" level=debug msg="GET https://registry-1.docker.io/v2/library/amazoncorretto/blobs/sha256:d55c22491690f5697cf4f6e4669812b59124f5eae1af98fac63da9d1163a8a9c"

Getting image source signatures

time="2024-04-30T20:28:07Z" level=debug msg="Reading /var/lib/containers/sigstore/library/amazoncorretto@sha256=1bc77946efe2f076c03f03ea8485a97209162c28dbbbb09801476aaac9a814bf/signature-1"

time="2024-04-30T20:28:07Z" level=debug msg="Not looking for sigstore attachments: disabled by configuration"

time="2024-04-30T20:28:07Z" level=debug msg="Manifest has MIME type application/vnd.docker.distribution.manifest.v2+json, ordered candidate list [application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]"

time="2024-04-30T20:28:07Z" level=debug msg="... will first try using the original manifest unmodified"

Copying blob sha256:43078548e45db87fd06873ff77e9e5767914375989d5a3bd76e8214835a3d7d0

Copying blob sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9

time="2024-04-30T20:28:07Z" level=debug msg="Checking if we can reuse blob sha256:43078548e45db87fd06873ff77e9e5767914375989d5a3bd76e8214835a3d7d0: general substitution = true, compression for MIME type \"application/vnd.docker.image.rootfs.diff.tar.gzip\" = true"

time="2024-04-30T20:28:07Z" level=debug msg="Checking if we can reuse blob sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9: general substitution = true, compression for MIME type \"application/vnd.docker.image.rootfs.diff.tar.gzip\" = true"

time="2024-04-30T20:28:07Z" level=debug msg="Failed to retrieve partial blob: convert_images not configured"

time="2024-04-30T20:28:07Z" level=debug msg="Downloading /v2/library/amazoncorretto/blobs/sha256:43078548e45db87fd06873ff77e9e5767914375989d5a3bd76e8214835a3d7d0"

time="2024-04-30T20:28:07Z" level=debug msg="GET https://registry-1.docker.io/v2/library/amazoncorretto/blobs/sha256:43078548e45db87fd06873ff77e9e5767914375989d5a3bd76e8214835a3d7d0"

time="2024-04-30T20:28:07Z" level=debug msg="Failed to retrieve partial blob: convert_images not configured"

time="2024-04-30T20:28:07Z" level=debug msg="Downloading /v2/library/amazoncorretto/blobs/sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9"

time="2024-04-30T20:28:07Z" level=debug msg="GET https://registry-1.docker.io/v2/library/amazoncorretto/blobs/sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9"

time="2024-04-30T20:28:07Z" level=debug msg="Detected compression format gzip"

time="2024-04-30T20:28:07Z" level=debug msg="Using original blob without modification"

time="2024-04-30T20:28:07Z" level=debug msg="Detected compression format gzip"

time="2024-04-30T20:28:07Z" level=debug msg="Using original blob without modification"

time="2024-04-30T20:28:08Z" level=debug msg="Applying tar in /var/lib/mycontainers/overlay/50398924c43a1129031d251a84eea6a278d1f682e8e9c27ab4e73e34e9f42acd/diff"

time="2024-04-30T20:28:09Z" level=debug msg="Error pulling candidate docker.io/library/amazoncorretto:latest: copying system image from manifest list: writing blob: adding layer with blob \"sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9\": processing tar file(errno 524): exit status 1"

Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:0b2952a75473f303233bc1034d63689122b90aa8b8fd5ebd0dced887e1c294e9": processing tar file(errno 524): exit status 1

time="2024-04-30T20:28:09Z" level=debug msg="Shutting down engines"
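For readers skimming the debug output above: errno 524 is the Linux kernel's internal ENOTSUPP ("Operation is not supported"). It appears both in the kernel-style whiteout probe and in the final tar failure, next to the `backingFs=nfs` line. Below is a minimal sketch of a filter that pulls those lines out of a debug log. The function name `filter_storage_errors` and the chosen patterns are assumptions for illustration, not part of podman.

```shell
#!/bin/sh
# Hypothetical helper (not a podman feature): reads a podman debug log on
# stdin and keeps only the lines that mention an errno value, the detected
# backing file system, or the final "Error:" line.
filter_storage_errors() {
    grep -E 'errno [0-9]+|backingFs=|^Error:'
}
```

It could be used by piping the pull command from this report into it, for example `podman --root /var/lib/mycontainers --log-level=debug pull docker.io/amazoncorretto:latest 2>&1 | filter_storage_errors`.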

Describe the results you expected

podman should successfully pull the image to /var/lib/mycontainers

podman info output

+ podman info

host:

  arch: amd64

  buildahVersion: 1.33.3

  cgroupControllers:

  - cpuset

  - cpu

  - cpuacct

  - blkio

  - memory

  - devices

  - freezer

  - net_cls

  - perf_event

  - net_prio

  - hugetlb

  - pids

  cgroupManager: cgroupfs

  cgroupVersion: v1

  conmon:

    package: conmon-2.1.10-1.fc39.x86_64

    path: /usr/bin/conmon

    version: 'conmon version 2.1.10, commit: '

  cpuUtilization:

    idlePercent: 91.67

    systemPercent: 1.63

    userPercent: 6.7

  cpus: 8

  databaseBackend: sqlite

  distribution:

    distribution: fedora

    variant: container

    version: "39"

  eventLogger: file

  freeLocks: 2048

  hostname: 

  idMappings:

    gidmap: null

    uidmap: null

  kernel: 5.10.209-198.858.amzn2.x86_64

  linkmode: dynamic

  logDriver: k8s-file

  memFree: 18091565056

  memTotal: 65994149888

  networkBackend: netavark

  networkBackendInfo:

    backend: netavark

    dns:

      package: aardvark-dns-1.10.0-1.fc39.x86_64

      path: /usr/libexec/podman/aardvark-dns

      version: aardvark-dns 1.10.0

    package: netavark-1.10.3-1.fc39.x86_64

    path: /usr/libexec/podman/netavark

    version: netavark 1.10.3

  ociRuntime:

    name: crun

    package: crun-1.14.3-1.fc39.x86_64

    path: /usr/bin/crun

    version: |-

      crun version 1.14.3

      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98

      rundir: /run/crun

      spec: 1.0.0

      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL

  os: linux

  pasta:

    executable: /usr/bin/pasta

    package: passt-0^20231230.gf091893-1.fc39.x86_64

    version: |

      pasta 0^20231230.gf091893-1.fc39.x86_64

      Copyright Red Hat

      GNU General Public License, version 2 or later

        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>

      This is free software: you are free to change and redistribute it.

      There is NO WARRANTY, to the extent permitted by law.

  remoteSocket:

    exists: false

    path: /run/podman/podman.sock

  security:

    apparmorEnabled: false

    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT

    rootless: false

    seccompEnabled: true

    seccompProfilePath: /usr/share/containers/seccomp.json

    selinuxEnabled: false

  serviceIsRemote: false

  slirp4netns:

    executable: /usr/bin/slirp4netns

    package: slirp4netns-1.2.2-1.fc39.x86_64

    version: |-

      slirp4netns version 1.2.2

      commit: 0ee2d87523e906518d34a6b423271e4826f71faf

      libslirp: 4.7.0

      SLIRP_CONFIG_VERSION_MAX: 4

      libseccomp: 2.5.3

  swapFree: 0

  swapTotal: 0

  uptime: 27h 29m 35.00s (Approximately 1.12 days)

  variant: ""

plugins:

  authorization: null

  log:

  - k8s-file

  - none

  - passthrough

  - journald

  network:

  - bridge

  - macvlan

  - ipvlan

  volume:

  - local

registries:

  search:

  - registry.fedoraproject.org

  - registry.access.redhat.com

  - docker.io

  - quay.io

store:

  configFile: /etc/containers/storage.conf

  containerStore:

    number: 0

    paused: 0

    running: 0

    stopped: 0

  graphDriverName: overlay

  graphOptions:

    overlay.ignore_chown_errors: "true"

    overlay.imagestore: /var/lib/mycontainers

    overlay.mount_program:

      Executable: /usr/bin/fuse-overlayfs

      Package: fuse-overlayfs-1.12-2.fc39.x86_64

      Version: |-

        fusermount3 version: 3.16.1

        fuse-overlayfs: version 1.12

        FUSE library version 3.16.1

        using FUSE kernel interface version 7.38

    overlay.mountopt: nodev,fsync=0

  graphRoot: /var/lib/containers/storage

  graphRootAllocated: 549743210496

  graphRootUsed: 60121776128

  graphStatus:

    Backing Filesystem: xfs

    Native Overlay Diff: "false"

    Supports d_type: "true"

    Supports shifting: "true"

    Supports volatile: "true"

    Using metacopy: "false"

  imageCopyTmpDir: /var/tmp

  imageStore:

    number: 0

  runRoot: /var/run/containers/storage

  transientStore: false

  volumePath: /var/lib/containers/storage/volumes

version:

  APIVersion: 4.9.0

  Built: 1706090847

  BuiltTime: Wed Jan 24 10:07:27 2024

  GitCommit: ""

  GoVersion: go1.21.6

  Os: linux

  OsArch: linux/amd64

  Version: 4.9.0

Podman in a container

Yes

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

[storage.options]
additionalimagestores = ["/var/lib/mycontainers"]

[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,fsync=0"



ankurmalhotra07 commented 5 months ago

@giuseppe do you want the strace for this as well?

giuseppe commented 5 months ago

I think the root cause is the same, and the network file system doesn't work well with local capabilities.

Please do a test like:

# mount -t efs -o tls fs-123...:/ /var/lib/mycontainers
# mkdir /var/lib/mycontainers/foo
# chmod 000 /var/lib/mycontainers/foo
# touch  /var/lib/mycontainers/foo/bar

What do you get?
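The test above can be wrapped in a small function so it is easy to repeat on both a local directory and the EFS mount. This is a minimal sketch under stated assumptions: the name `probe_dac_override` and the scratch directory `dac-probe.<pid>` are hypothetical, and unlike the commands above it works in a scratch subdirectory and cleans up afterwards. On a local file system, root with CAP_DAC_OVERRIDE is normally allowed to create the file despite mode 000, so a failure on the network mount would suggest the server is applying its own permission checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): repeats the mkdir/chmod/touch
# test inside a scratch subdirectory of the given path and prints whether
# the touch succeeded.
probe_dac_override() {
    base="$1"                       # directory to test, e.g. the EFS mount point
    probe="$base/dac-probe.$$"      # scratch directory; the name is an assumption
    mkdir -p "$probe/foo" || return 2
    chmod 000 "$probe/foo" || { rm -rf "$probe"; return 2; }
    if touch "$probe/foo/bar" 2>/dev/null; then
        result="touch succeeded"
    else
        result="touch failed"
    fi
    chmod 700 "$probe/foo"          # restore access so cleanup also works without root
    rm -rf "$probe"
    echo "$result"
}
```

After mounting the share, `probe_dac_override /var/lib/mycontainers` and `probe_dac_override /var/tmp` can be compared side by side.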

github-actions[bot] commented 4 months ago

A friendly reminder that this issue had no activity for 30 days.