containers / storage

Container Storage Library
Apache License 2.0

zfs driver: dataset is busy causes orphaned layers #2005

Open JakeCooper opened 2 weeks ago

JakeCooper commented 2 weeks ago

Issue Description

When running with the ZFS driver, roughly one out of every 50 containers fails during cleanup with the following error:

cleaning up storage: removing container 8941b7366dfbe5deb66554f86be2f1c931e510e1d770ed00bdd49c56aaa46c18 root filesystem: 1 error occurred:
    * deleting layer "7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb": exit status 1: "/usr/sbin/zfs destroy -r podman/7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb" => cannot destroy 'podman/7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb': dataset is busy
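
To gauge how often this happens, the affected dataset names can be pulled out of collected logs. The following is a minimal sketch, assuming the error text keeps the format shown above; `busy_datasets` is a hypothetical helper name and not part of Podman or containers/storage.

```python
import re

# Matches the tail of the zfs error shown above.
# Assumption: the message format stays as it appears in this report.
BUSY_RE = re.compile(r"cannot destroy '([^']+)': dataset is busy")


def busy_datasets(log_text: str) -> list[str]:
    """Return unique dataset names zfs refused to destroy, in first-seen order."""
    seen: list[str] = []
    for name in BUSY_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Feeding it the journal output from a host would give the list of layers that may have been left behind.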

Steps to reproduce the issue

  1. Use the native ZFS driver
  2. Deploy roughly 50-100 different containers
  3. Some of them fail with the error above
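
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a verified reproducer: it assumes the cleanup error surfaces on the stderr of `podman run --rm`, and `run_container` and `count_busy_failures` are hypothetical helper names. The list of images is left to the caller.

```python
import subprocess


def run_container(image: str) -> subprocess.CompletedProcess:
    # `--rm` removes the container on exit, which triggers storage cleanup.
    # Assumption: the "dataset is busy" error shows up on stderr of this command.
    return subprocess.run(
        ["podman", "run", "--rm", image, "true"],
        capture_output=True,
        text=True,
    )


def count_busy_failures(images, runner=run_container) -> int:
    """Run each image once and count runs whose stderr mentions 'dataset is busy'."""
    failures = 0
    for image in images:
        result = runner(image)
        if "dataset is busy" in result.stderr:
            failures += 1
    return failures
```

The `runner` parameter exists only so the counting logic can be exercised without Podman or ZFS installed.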

Describe the results you received

Error above

Describe the results you expected

No error

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264-dirty'
  cpuUtilization:
    idlePercent: 66.22
    systemPercent: 11.5
    userPercent: 22.28
  cpus: 32
  databaseBackend: sqlite
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  freeLocks: 65277
  hostname: production-stacker-178
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-13-cloud-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 24015925248
  memTotal: 270471868416
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns: {}
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/local/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 5217h 56m 41.00s (Approximately 217.38 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 249
    paused: 0
    running: 209
    stopped: 40
  graphDriverName: zfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 354307145728
  graphRootUsed: 7023755264
  graphStatus:
    Compression: "off"
    Parent Dataset: podman
    Parent Quota: "no"
    Space Available: "347290451968"
    Space Used By Parent: "688870408192"
    Zpool: podman
    Zpool Health: ONLINE
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 603
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.1
  Built: 1717640166
  BuiltTime: Thu Jun  6 02:16:06 2024
  GitCommit: ""
  GoVersion: go1.21.11
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Luap99 commented 2 weeks ago

We recommend using overlayfs in general. I am not sure if there is anyone actively working on the zfs driver currently.

JakeCooper commented 1 week ago

Does that mean there's no way to limit ephemeral storage with Podman in production?

cgwalters commented 1 week ago

> Does that mean there's no way to limit ephemeral storage with Podman in production?

One thing you can do is run your containers with --read-only and bind mount only size-limited host volumes for anything that needs persistence.
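
As a rough illustration of the `--read-only` suggestion, the command line could be assembled like this. It is a sketch only: `read_only_run_args` is a hypothetical helper, and it assumes `host_dir` sits on storage whose size the host already limits by some other means.

```python
def read_only_run_args(image: str, host_dir: str, container_dir: str) -> list[str]:
    """Build a `podman run` command with a read-only rootfs and one bind mount."""
    return [
        "podman", "run", "--rm",
        "--read-only",                        # root filesystem is mounted read-only
        "-v", f"{host_dir}:{container_dir}",  # bind mount from the host for persistent data
        image,
    ]
```

The resulting list can be passed directly to `subprocess.run`; the size limit itself has to come from how the host directory is provisioned.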

Another path, AFAIK, is XFS+quotas; see e.g. https://github.com/containers/podman/discussions/21193.

JakeCooper commented 1 week ago

It seems only overlayfs is maintained, as mentioned by a chief contributor: https://github.com/containers/storage/issues/2004#issuecomment-2214362660

So I doubt XFS+quotas will be any better (not to mention the inherent downsides of XFS).