
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

[Bug]: Programs stored on bind mounted volumes are unusably slow on MacOS due to disk stat #16994

Closed: nettybun closed this issue 3 months ago

nettybun commented 1 year ago

Issue Description

For context, I'm developing a Node.js app on macOS 12.6 Monterey and running Podman 4.3.1 (with its QEMU VM set to security_model=none so that symlinks work, since those are used everywhere in Node.js' node_modules dependency folders). It mostly works.
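
Roughly, that 9p option lives in the machine's QEMU config; the path and the current value below are assumptions for a Podman 4.x QEMU machine, so check your own file before editing (and back it up first):

CFG=~/.config/containers/podman/machine/qemu/podman-machine-default.json
grep -o 'security_model=[a-z-]*' "$CFG"                        # show the current 9p setting
sed -i.bak 's/security_model=[a-z-]*/security_model=none/' "$CFG"
podman machine stop && podman machine start                    # restart so QEMU picks it up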

Unfortunately, the time to stat items in the volume mount is show-stoppingly slow. For JS dependencies this means it takes several seconds just to print a version number with npx jest -v, because the Node ecosystem is built from many tiny interconnected modules on disk, so there is A LOT of scanning around.

I think it's specific to stat, because read/write throughput seems fine: I tested with dd if=/dev/zero of=tempfile bs=1M count=1024 and dd if=tempfile of=/dev/null bs=1M count=1024 with no issue 🤷
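
To isolate metadata cost from throughput, a stat-heavy walk of node_modules is a more direct probe - a rough sketch, assuming the GNU find/stat shipped in the node:16 image; run it once in /tmp and once in /app:

cd /app                                     # or /tmp for the baseline
time find node_modules > /dev/null          # readdir + lstat on every entry
time find node_modules -type f -exec stat --format='%n' {} + > /dev/null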

Here are some perf tests showing the speed difference inside and outside of the mounted folder. Note that pwd is /Users/Hames/Repos/SomeProject.

Test 1: Real usecase with JS/Node/etc.

Setup

$ podman run -v $(pwd):/app -it node:16 /bin/bash
root@1e9b16250a4f:/# cd /tmp; npm i jest@26.6.3

Normal folder /tmp

root@1e9b16250a4f:/tmp# time npx jest -v
26.6.3
real    0m1.106s
user    0m0.465s
sys 0m0.721s
root@1e9b16250a4f:/tmp# time npx jest -v
26.6.3
real    0m1.107s
user    0m0.490s
sys 0m0.703s
root@1e9b16250a4f:/tmp# time npx jest -v
26.6.3
real    0m1.173s
user    0m0.498s
sys 0m0.767s

Volume mounted folder /app

root@1e9b16250a4f:/app# time npx jest -v
26.6.3
real    0m7.693s
user    0m0.552s
sys 0m1.032s
root@1e9b16250a4f:/app# time npx jest -v
26.6.3
real    0m8.034s
user    0m0.574s
sys 0m1.187s
root@1e9b16250a4f:/app# time npx jest -v
26.6.3
real    0m7.992s
user    0m0.564s
sys 0m1.058s

Test 2: More minimal reproduction case

Setup

$ podman run -v $(pwd):/app -it node:16 /bin/bash
root@59dd194a0df8:/# cd /tmp; npm i jest@26.6.3
root@59dd194a0df8:/tmp# ls -l node_modules/
Display all 350 possibilities? (y or n)
root@59dd194a0df8:/tmp# ls -l node_modules/jest
jest/                      jest-docblock/             jest-haste-map/            jest-mock/                 jest-runner/               jest-validate/
jest-changed-files/        jest-each/                 jest-jasmine2/             jest-pnp-resolver/         jest-runtime/              jest-watcher/
jest-cli/                  jest-environment-jsdom/    jest-leak-detector/        jest-regex-util/           jest-serializer/           jest-worker/
jest-config/               jest-environment-node/     jest-matcher-utils/        jest-resolve/              jest-snapshot/
jest-diff/                 jest-get-type/             jest-message-util/         jest-resolve-dependencies/ jest-util/

Normal folder /tmp

root@59dd194a0df8:/tmp# time ls -l node_modules/jest*/** | wc -l
617
real    0m0.014s
user    0m0.004s
sys 0m0.012s

Volume mounted folder /app

root@59dd194a0df8:/app# time ls -l node_modules/jest*/** | wc -l
617
real    0m7.306s
user    0m0.001s
sys 0m0.413s
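
To confirm the time really goes into metadata syscalls (newfstatat/openat and friends) rather than reads, strace's per-syscall summary is one option - a sketch only, since strace isn't in the node:16 image and the container may need ptrace allowed:

$ podman run --cap-add SYS_PTRACE -v $(pwd):/app -it node:16 /bin/bash
root@container:/# apt-get update && apt-get install -y strace
root@container:/# cd /app
root@container:/app# strace -c -f npx jest -v     # prints a syscall count/time table on exit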

Test 3: Copy from volume to normal

My laptop battery dropped 3% running this command 😶...maybe a coincidence?

root@59dd194a0df8:/tmp# time cp -R /app/node_modules/jest* node_modules/
real    1m23.538s
user    0m0.044s
sys 0m4.531s
root@59dd194a0df8:/tmp# du -sh node_modules/
45M node_modules/
root@59dd194a0df8:/tmp# du -sh node_modules/*
24K node_modules/jest
140K    node_modules/jest-canvas-mock
240K    node_modules/jest-changed-files
692K    node_modules/jest-cli
292K    node_modules/jest-config
216K    node_modules/jest-diff
20K node_modules/jest-docblock
64K node_modules/jest-each
3.7M    node_modules/jest-environment-jsdom
3.7M    node_modules/jest-environment-node
160K    node_modules/jest-fetch-mock
16K node_modules/jest-get-type
3.9M    node_modules/jest-haste-map
3.9M    node_modules/jest-image-snapshot
3.9M    node_modules/jest-jasmine2
20K node_modules/jest-leak-detector
48K node_modules/jest-matcher-utils
36K node_modules/jest-message-util
3.7M    node_modules/jest-mock
24K node_modules/jest-pnp-resolver
16K node_modules/jest-regex-util
84K node_modules/jest-resolve
20K node_modules/jest-resolve-dependencies
3.7M    node_modules/jest-runner
748K    node_modules/jest-runtime
3.7M    node_modules/jest-serializer
412K    node_modules/jest-snapshot
40K node_modules/jest-specific-snapshot
3.8M    node_modules/jest-util
128K    node_modules/jest-validate
3.8M    node_modules/jest-watcher
3.8M    node_modules/jest-worker

45MB in 1m23s?... ???

Any pointers on how to improve disk I/O are appreciated.
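
One mitigation that is sometimes used (an illustration, not a confirmed fix for this bug; the volume name is made up) is to keep node_modules on a named volume, so the dependency tree stays on the VM's local disk and only source files cross the 9p mount:

$ podman volume create someproject_node_modules
$ podman run -it -v $(pwd):/app -v someproject_node_modules:/app/node_modules node:16 /bin/bash
root@container:/# cd /app && npm install     # deps land on the named volume, not the bind mount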

podman info output

❯ podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.18.8
Built:        Wed Nov  9 12:43:58 2022
OS/Arch:      darwin/amd64

Server:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.2
Built:        Fri Nov 11 07:01:27 2022
OS/Arch:      linux/amd64

❯ podman info
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 99.71
    systemPercent: 0.16
    userPercent: 0.14
  cpus: 4
  distribution:
    distribution: fedora
    variant: coreos
    version: "37"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.0.15-300.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 999780352
  memTotal: 4107599872
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7.2-2.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 3h 50m 5.00s (Approximately 0.12 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 8393973760
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668178887
  BuiltTime: Fri Nov 11 07:01:27 2022
  GitCommit: ""
  GoVersion: go1.19.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional information

This is a QEMU issue, because the same behaviour happens in podman machine ssh, i.e. outside of a "container".
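
The same check can be run straight from the VM, with no container involved, assuming the default /Users automount inside the machine:

$ podman machine ssh
[root@localhost ~]# time ls -lR /Users/Hames/Repos/SomeProject/node_modules > /dev/null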

nettybun commented 1 year ago

This is also seen in tab-complete speed:

https://user-images.githubusercontent.com/44614862/210662911-427cade2-6e60-4786-8be8-77225f27151c.mp4

nettybun commented 1 year ago

Ok, I'm not convinced this is something the Podman devs can control - it may simply be the state of the art of QEMU on macOS right now. I tested this in Macpine too, which also uses 9p virtfs mounting:

https://github.com/beringresearch/macpine/blob/f4333b723d6b995be4d0246b5633962d3c22268e/qemu/ops.go#L285-L286

The performance is just as slow in alpine ssh as in podman machine.


In the Macpine repo the author said SSHFS might be more efficient: https://github.com/beringresearch/macpine/issues/40#issuecomment-1328348718. Not sure. I've seen really, really fast internet downloads to QEMU's disk (apk add, curl, wget, etc.) so that's promising. I have a feeling rsync'ing the changes (or Syncthing?) as if I were talking to a server (yes, it's 2023, haha) might be best, since it'd prevent devs from mistaking the disk as "cheap"...
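
As a sketch of that idea (paths are illustrative, and rsync isn't in the node:16 image): mount the project read-only as the source of truth, work from a copy on the container's local filesystem, and re-sync whenever host files change:

$ podman run -v $(pwd):/src:ro -it node:16 /bin/bash
root@container:/# apt-get update && apt-get install -y rsync
root@container:/# rsync -a --delete --exclude node_modules /src/ /work/
root@container:/# cd /work && npm ci          # deps and dependency scans stay on fast local storage
root@container:/work# rsync -a --delete --exclude node_modules /src/ /work/   # repeat (or drive from a file watcher) after edits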

Feel free to close if it's unactionable. Still, this is very much an issue with using Podman on macOS today. I only use it for work - on my personal Linux machine there are no perf issues :)

rhatdan commented 1 year ago

I think the best solution for this would be to move to virtiofsd and perhaps native Mac virtualization.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] commented 1 year ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 1 year ago

This effort is still happening. @baude Update?

brendon commented 1 year ago

Hilariously I get much better performance by using Parallels and spinning up a linux VM, then hosting Podman in there while mounting my development directory into the VM. I guess Parallels does something different to get the Mac filesystem into the VM that doesn't have the same slowness?

nettybun commented 1 year ago

That's hilarious thanks for sharing! I've since given up on macOS for my work computer and moved to Linux, where the problem doesn't exist.

Maybe someday I'll convince my coworkers to try Asahi on their Macs 🤷 Thanks for the effort and good luck! Feel free to close if needed.

brendon commented 1 year ago

Haha! Yeah, it looks like Parallels has its own driver that gets installed when you spin up their flavour of Ubuntu etc. The filesystem type is pfs. So far it's been working really well, and it's really no different from the VM Podman uses on the Mac, I guess (apart from the slowness). At some point I'll configure Podman Desktop to connect to the Parallels VM, but to be fair, it's easy enough to use within the VM window anyway.
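
For reference, pointing the Mac's podman client at Podman inside such a VM looks roughly like the following; the user, hostname, and socket path are assumptions that depend entirely on how the guest is set up:

# inside the VM: expose the user-level Podman API socket
$ systemctl --user enable --now podman.socket
# on the Mac: register the VM as a remote connection and make it the default
$ podman system connection add parallels-vm --identity ~/.ssh/id_ed25519 ssh://brendon@parallels-vm.local/run/user/1000/podman/podman.sock
$ podman system connection default parallels-vm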

jfrantzius commented 10 months ago

Hi, we have the same observation here for Podman on Mac (both amd64 and aarch64): bind mounts are very slow for us. Our Java process in the Docker container reads a lot of YAML configuration files, and that takes 3-10 times longer when reading from a bind mount than when the files are baked into the Docker image. This problem doesn't exist with Docker on Mac, so I guess there is hope that this can be improved :)

I compared file access times by simply reading all the YAML files with cat, issuing the following on the same folder, first in zsh on the Mac host, and then via podman machine ssh in the Podman machine (where the same host folder can be found below /Users/).

Directly via zsh:

time find light-modules/ -type f -name "*.yaml" -exec cat {} >/dev/null \; 2>/dev/null
find light-modules/ -type f -name "*.yaml" -exec cat {} \; > /dev/null 2>   0.10s user 0.40s system 42% cpu 1.170 total

From within Podman machine:

[root@localhost magnolia]# time find light-modules/ -type f -name "*.yaml" -exec cat {} >/dev/null \; 2>/dev/null

real    0m1.154s
user    0m0.058s
sys 0m0.119s

I'm not entirely sure how to compare the different outputs of time on Mac vs. Linux, but in sum that looks like 0.5s vs. 1.273s, which would be a factor of 2.546 slower. So just reading the same files from the Podman machine (not even from a container) already seems to be much slower than local host file access.

Could this be due to QEMU vs. HyperKit with regard to how folders are shared?

jfrantzius commented 10 months ago

Hi @rhatdan, are there by any chance efforts underway by the Podman team to solve this problem? Tim deBoer told me in https://kubernetes.slack.com/archives/C04A0L7LUFM/p1699377715926089 that the Podman team would have a look if I commented in this issue :)

rhatdan commented 10 months ago

There is an effort to move to the native Apple hypervisor, which should switch from the Plan 9 file system to virtiofsd and should improve remote file system support.

@baude WDYT?

jfrantzius commented 10 months ago

Linking related issues: RFE: Native Mac/Windows support #8452 is closed, even though it seemingly could rely on the Fedora issues Platform Request - Apple native hypervisor #1533 and Add Platform: Native Apple Hypervisor #1548, which are in progress?

@baude the article https://www.redhat.com/sysadmin/podman-mac-machine-architecture states that Podman is built on "QEMU plus HVF acceleration: Runs a virtualized Linux distribution using native macOS virtualization", but that seems to still be in progress with Platform Request - Apple native hypervisor #1533, and with 13 out of 44 tasks done in Add Platform: Native Apple Hypervisor #1548?

Luap99 commented 3 months ago

virtiofs is now used with Podman 5.0 and applehv, so it is much faster.
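
For anyone switching an existing setup over, it looks roughly like this; the config key and whether it is even needed depend on the Podman release (applehv is reported to be the default on macOS in 5.x), so treat it as a sketch:

# ~/.config/containers/containers.conf
# [machine]
# provider = "applehv"
$ podman machine rm podman-machine-default
$ podman machine init
$ podman machine start
# or per invocation:
$ CONTAINERS_MACHINE_PROVIDER=applehv podman machine init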