canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Bad native mount performance, "Too many open files" errors #2870

Open jrajahalme opened 1 year ago

jrajahalme commented 1 year ago

Describe the bug The 1.11 RC introduces support for native mounts. Testing this by compiling a large Go project shows that native mounts are very slow and seem to leak open files, as "Too many open files" errors show up towards the end of the build.

To Reproduce I did not try to simplify the reproduction steps, because the errors appear to be caused by leaked open files, which would likely not show up on a small project. To reproduce:

  1. Clone cilium/packer-ci-build into a directory named 'cilium', where you also have the main 'cilium' repo. After cloning you'll have: cilium/cilium (cloned from https://github.com/cilium/cilium.git) and cilium/packer-ci-build (cloned from https://github.com/cilium/packer-ci-build.git)
  2. Change to cilium/packer-ci-build and check out the multipass 1.11 branch:
    $ git checkout pr/jrajahalme/arm64-multipass-1.11.0-rebase
  3. Create VM dev using a native mount (NFS is the default mount type):
    $ MOUNT=native VM_NAME=dev make multipass
  4. Shell into the new VM, check the mount type, and compile cilium:
    $ multipass shell dev
    $ mount -l | grep github
    $ cd go/src/github.com/cilium/cilium
    $ time make

This takes ~50 minutes and produces errors like this:

...
Documentation/cmdref/cilium_fqdn_cache_list.md: Too many open files
Documentation/cmdref/cilium_fqdn_names.md: Too many open files
Documentation/cmdref/cilium_identity.md: Too many open files
...
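
One way to check whether open files are actually accumulating during the build (a hypothetical diagnostic, not part of the original report) is to watch the file-descriptor counters inside the guest while make runs and, assuming the instance uses the qemu driver to serve the native mount, the descriptors held by the QEMU process on the host:

$ ulimit -n                 # inside the guest: per-process descriptor limit for the build shell
$ cat /proc/sys/fs/file-nr  # inside the guest: allocated, free, and maximum system-wide file handles
$ lsof -p "$(pgrep -f qemu-system)" | wc -l   # on the host: descriptors held by the VM process (assumes a single qemu-backed instance)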

Expected behavior Native mounts should be faster than fuse.sshfs or NFS mounts and should not produce errors.

For reference:

1.11 RC with NFS mount (MOUNT=NFS in front of make multipass above):

real    6m17.428s
user    10m0.771s
sys 4m42.543s

1.11 RC with fuse.sshfs mount (MOUNT=default in front of make multipass above):

real    19m56.481s
user    10m13.381s
sys 13m52.795s

1.11 RC with native mount (MOUNT=native in front of make multipass above):

real    48m5.397s
user    11m19.672s
sys 84m51.817s

Logs multipassd.log

Additional info

Additional context A shorter test that does not trigger the "Too many open files" error, but is still indicative of the performance difference, is to run the make build command in the Cilium builder Docker image.

Docker Desktop 4.15.0 with VirtioFS file sharing:

$ docker run -it --name cilium-builder --volume $PWD:/home/root/go/src/github.com/cilium/cilium quay.io/cilium/cilium-builder:1b8f97cf70e2a4e7b0609f259d9540523c50cc9f@sha256:0b18c799b88ec039cb0d6d6338ddf539f530319960f9dbc22b69ec6167d3c2ec bash -c "time make -C /home/root/go/src/github.com/cilium/cilium build"
...
real    2m54.322s
user    5m11.642s
sys 2m9.590s

Multipass 1.11 RC with NFS mount:

$ docker run --rm -it --name cilium-builder --volume $PWD:/home/root/go/src/github.com/cilium/cilium quay.io/cilium/cilium-builder:1b8f97cf70e2a4e7b0609f259d9540523c50cc9f@sha256:0b18c799b88ec039cb0d6d6338ddf539f530319960f9dbc22b69ec6167d3c2ec bash -c "git config --global --add safe.directory /home/root/go/src/github.com/cilium/cilium && time make -C /home/root/go/src/github.com/cilium/cilium build"
...
real    2m38.815s
user    5m21.731s
sys 2m13.527s

Multipass 1.11 RC with native mount:

$ docker run --rm -it --name cilium-builder --volume $PWD:/home/root/go/src/github.com/cilium/cilium quay.io/cilium/cilium-builder:1b8f97cf70e2a4e7b0609f259d9540523c50cc9f@sha256:0b18c799b88ec039cb0d6d6338ddf539f530319960f9dbc22b69ec6167d3c2ec bash -c "git config --global --add safe.directory /home/root/go/src/github.com/cilium/cilium && time make -C /home/root/go/src/github.com/cilium/cilium build"
...
real    24m40.300s
user    6m10.201s
sys 44m54.779s
townsend2010 commented 1 year ago

Hi @jrajahalme!

Thanks for the report on this! We won't be addressing this issue for the 1.11 release as this feature is still experimental. That said, we will investigate this further and hopefully have it fixed for the 1.12 release.

jrajahalme commented 1 year ago

Not sure, but this may be relevant: https://github.com/qemu/qemu/commit/f5265c8f917ea8c71a30e549b7e3017c1038db63

m-emelchenkov commented 1 year ago

Using Docker bind mounts on top of Multipass native mount directories is definitely about 10x slower than it should be in the worst case. Disabling native mounts gives roughly a 10x speed improvement.

% multipass --version
multipass   1.12.0+mac
multipassd  1.12.0+mac
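
For context, the general shape of such a setup (a hypothetical sketch, not the reporter's actual configuration; it assumes Docker is installed inside the instance and that a host directory ~/project exists) would be roughly:

$ multipass launch --name docker-test --cpus 2 --mem 4G
$ multipass mount -t native ~/project docker-test:/home/ubuntu/project
$ multipass shell docker-test
# inside the instance, bind-mount the native-mounted directory into a container:
$ docker run --rm -v /home/ubuntu/project:/work alpine sh -c "time find /work -type f | wc -l"

Any filesystem-heavy workload inside the container then exercises both the Docker bind mount and the Multipass native mount.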

townsend2010 commented 1 year ago

Hi @m-emelchenkov!

Would you mind sharing a basic setup that shows this poor behavior so we can have another reproduction case in order to chase down this issue? Thanks!

m-emelchenkov commented 1 year ago

> Would you mind sharing a basic setup that shows this poor behavior so we can have another reproduction case in order to chase down this issue? Thanks!

Thank you! Sure! I would like to show you my setup (a straightforward setup script and a few manual commands), but I don't want to share it in public. Could you please give me your email so I can send it to you? Or, if you don't want to share your email, please mail me at m [at] emelchenkov [dot] pro and I'll reply.

petitj commented 1 month ago

Hi,

We are also experiencing the same performance issue between classic and native mounts. We've managed to simplify the test; although the difference is not as huge as with our app, it still shows, in an easily reproducible way, that performance is worse with native mounts.

To Reproduce

  1. multipass launch 20.04 --name perf-classic --mem 1G --disk 5G --cpus 1
  2. multipass launch 20.04 --name perf-native --mem 1G --disk 5G --cpus 1
  3. multipass mount -u 501:1000 -g 20:1000 ~/classic perf-classic:/home/ubuntu/mount
  4. multipass mount -u 501:1000 -g 20:1000 -t native ~/native perf-native:/home/ubuntu/mount
  5. In both VMs (the iozone flags are explained after this list):
    1. sudo apt-get install iozone3
    2. iozone -t1 -i0 -i2 -r1k -s1g /home/ubuntu/mount/
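
For reference, the iozone invocation in step 5 breaks down as follows (standard iozone options, restated here for convenience):

$ iozone -t1 -i0 -i2 -r1k -s1g /home/ubuntu/mount/
#   -t1   throughput mode with one thread/process
#   -i0   test 0: write / rewrite
#   -i2   test 2: random read / random write
#   -r1k  1 KiB record size
#   -s1g  1 GiB file size per thread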

Expected behavior Performance should be better in native mode than in classic mode.

Additional info

Additional context We tried different ways to show the performance problem (fio, a PHP script dedicated to I/O testing, ...), but only iozone revealed the problem in the way we see it with our app.

ricab commented 1 month ago

Thanks for reporting @petitj, we need to bump this.