containers / storage

Container Storage Library
Apache License 2.0

zfs driver: extremely slow podman ps #2004

Open JakeCooper opened 3 months ago

JakeCooper commented 3 months ago

Issue Description

With the zfs driver and about 200 containers on the host, Podman becomes extremely slow: podman ps takes around 30s to return.

Steps to reproduce the issue

  1. Run podman with about 200 containers and the zfs driver
  2. Do podman ps
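
To put a number on the slowdown, the two steps above can be wrapped in a small timing script. This is only a sketch: the helper names `time_command` and `summarize` are made up for illustration, and it assumes podman is on the PATH and that the user running it sees the same containers as the affected host.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Defaults to `podman ps`; any other command can be passed instead,
    # for example `podman ps --all`.
    argv = sys.argv[1:] or ["podman", "ps"]
    stats = summarize(time_command(argv))
    print(" ".join(argv), {k: round(v, 2) for k, v in stats.items()})
```

Running the same script on a host that uses the overlay driver with a similar container count would give a like-for-like comparison.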

Describe the results you received

podman ps takes about 30s

Describe the results you expected

Near-instant, as with the overlay driver on ext4

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264-dirty'
  cpuUtilization:
    idlePercent: 66.22
    systemPercent: 11.5
    userPercent: 22.28
  cpus: 32
  databaseBackend: sqlite
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  freeLocks: 65277
  hostname: production-stacker-178
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-13-cloud-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 24015925248
  memTotal: 270471868416
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns: {}
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/local/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 5217h 56m 41.00s (Approximately 217.38 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 249
    paused: 0
    running: 209
    stopped: 40
  graphDriverName: zfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 354307145728
  graphRootUsed: 7023755264
  graphStatus:
    Compression: "off"
    Parent Dataset: podman
    Parent Quota: "no"
    Space Available: "347290451968"
    Space Used By Parent: "688870408192"
    Zpool: podman
    Zpool Health: ONLINE
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 603
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.1
  Built: 1717640166
  BuiltTime: Thu Jun  6 02:16:06 2024
  GitCommit: ""
  GoVersion: go1.21.11
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

rhatdan commented 3 months ago

The zfs driver is supported only by upstream contributors. The core team works only on Overlay and VFS.

JakeCooper commented 2 months ago

Then you should remove it from the project, or add a big fat "We don't maintain this" sign.

The presence of a storage driver (pretty critical, no?) would imply that it actually works. This doesn't work for production use cases.

rhatdan commented 2 months ago

Well, it works for some, even if the performance is not up to your requirements. If you would like to improve it, PRs are welcome.

JakeCooper commented 2 months ago

It also seems to leak layers like a sieve, which is arguably a bigger issue:

https://github.com/containers/storage/issues/2005#issuecomment-2221814300