canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Very poor Disk I/O on Apple Silicon after some time. #2440

Open dmarkey opened 2 years ago

dmarkey commented 2 years ago

Describe the bug

dd if=/dev/zero of=/foo bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 63.8553 s, 16.8 MB/s

To Reproduce I'm running a Kubernetes cluster on Multipass using k3s. Bring up a few apps and wait some time for the disk performance to decrease.

Expected behavior At least 100 MB/s transfer speeds on a new M1 Mac.


townsend2010 commented 2 years ago

Hey @dmarkey,

One thing that jumps out is that the load in the instance is quite high, which will definitely affect I/O: Load: 4.41 18.61 16.88

How many cores do you have dedicated to the instance? I would suggest looking at what is running inside the VM to see what is impacting the load so much.
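To put the load figures above in context, a rough rule of thumb is to divide the load average by the number of cores: values well above 1.0 per core mean tasks are queuing. On Linux the load average also counts tasks blocked on I/O, so slow disk writes can themselves push it up. The sketch below is only an illustration; `load_per_core` is a hypothetical helper name, not part of Multipass.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of cores.
load_per_core() {
    load=$1
    cores=$2
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.1f\n", l / c }'
}

# Inside the instance, the live values could be read like this (Linux only):
# load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

The core count is passed in explicitly so the same helper works with figures copied from a report as well as live values.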

dmarkey commented 2 years ago

So k3s goes berserk when it can't flush its etcd database in a timely fashion, hence the high load average, but the dd test was with k3s turned off. I have 2 cores allocated.

Here is the test again, on the exact same VM after a reboot of my laptop, with all the pods running like before:

root@primary:~# dd if=/dev/zero of=/foo bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.40076 s, 447 MB/s

This definitely feels like something that degrades over time.
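One way to confirm that throughput degrades over time would be to repeat the same dd test at intervals and keep a timestamped log. The following is a minimal sketch under stated assumptions: `sample_write_speed` and the log path are hypothetical names, and `conv=fsync` is added (it is not in the original test) so that the figure includes flushing the data to disk rather than only filling the page cache.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT_MB mebibytes of zeros to FILE, force the
# data to disk, and print one timestamped line holding dd's summary.
sample_write_speed() {
    file=$1
    count_mb=${2:-1024}
    # dd prints its statistics on stderr; keep only the final summary line.
    summary=$(LC_ALL=C dd if=/dev/zero of="$file" bs=1M count="$count_mb" conv=fsync 2>&1 | tail -n 1)
    # Remove the test file so repeated samples do not fill the disk.
    rm -f "$file"
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$summary"
}

# Example: take one sample every 10 minutes and append it to a log.
# while true; do sample_write_speed /foo 1024 >> /var/tmp/dd-samples.log; sleep 600; done
```

Comparing the logged lines against the time since the last host reboot would show whether the slowdown is gradual or sudden.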

townsend2010 commented 2 years ago

Ok, thanks.

It's really hard to say what is going on. When you do see the disk write performance degrade, is there anything of note happening in the VM? For example, is a process taking a lot of CPU time, or are the load averages high? Also, on the host, is the qemu process taking lots of CPU time or increasing the host load?

Another thing that comes to mind is that we support the TRIM operation in the QCOW2 image via discard=unmap and I wonder if that is running and affecting the write performance?
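For the host-side question above, one possible check is to total the CPU share of the QEMU processes from `ps` output. This is a sketch only: `sum_qemu_cpu` is a hypothetical helper, and it matches any line containing `qemu-system`, which assumes the process name includes that string (as with qemu-system-aarch64).

```shell
#!/bin/sh
# Hypothetical helper: read "pcpu command" lines on stdin and print the
# total %CPU of every line whose command contains "qemu-system".
sum_qemu_cpu() {
    LC_ALL=C awk '/qemu-system/ { total += $1 } END { printf "%.1f\n", total }'
}

# On the host, feed it the process list (header suppressed with "="):
# LC_ALL=C ps -A -o pcpu= -o comm= | sum_qemu_cpu
```

Matching on the whole line rather than a single field keeps the helper working when the command path contains spaces.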

dmarkey commented 2 years ago

https://gitlab.com/qemu-project/qemu/-/issues/642 seems like a similar upstream issue

-drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda

How can I change that setting?

townsend2010 commented 2 years ago

You can't do it from Multipass since it's baked into the code. However, you can run the qemu-system-aarch64 command in a terminal and adjust any of the parameters you need. Just be sure to escape any spaces in the paths, such as:

-drive file=/var/root/Library/Application\ Support/multipassd/qemu/vault/instances/primary/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda

Hope this helps and please post any findings that may help. If there are different settings that are needed on Mac, I can change Multipass to use those instead.
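As a sketch of the experiment described above, the drive specification can be kept in a shell variable and its `discard=` value swapped before QEMU is relaunched by hand. QEMU's `-drive` option accepts `discard=ignore` as the counterpart to `discard=unmap`; whether that change actually helps here is untested. `set_discard` is a hypothetical helper, and the rest of the QEMU command line is not shown in this thread, so it would have to be copied from the running process.

```shell
#!/bin/sh
# Hypothetical helper: return a -drive specification with its discard= value replaced.
set_discard() {
    spec=$1
    mode=$2   # "unmap" passes TRIM through; "ignore" drops discard requests
    printf '%s\n' "$spec" | sed "s/discard=[a-z]*/discard=$mode/"
}

drive='file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda'
new_drive=$(set_discard "$drive" ignore)
printf '%s\n' "$new_drive"

# Quoting the variable keeps the space in "Application Support" intact, so no
# backslash escaping is needed. The remaining arguments must be copied from
# the running qemu-system-aarch64 command line:
# sudo qemu-system-aarch64 ... -drive "$new_drive" ...
```

Quoting is an alternative to the backslash escaping shown above; either form passes the path to QEMU as a single argument.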

townsend2010 commented 2 years ago

I'll also add you'll need to use sudo to run the qemu command.

dmarkey commented 2 years ago

Slight side note. Have ye evaluated supporting https://developer.apple.com/documentation/virtualization on macOS?

townsend2010 commented 2 years ago

Slight side note. Have ye evaluated supporting https://developer.apple.com/documentation/virtualization on macOS?

We have and we found a few, uh, deficiencies in it.

One is that with Virtualization.framework, you have to supply a kernel in order to boot the VM, much like we do today on x86 Macs using HyperKit. The instance is then stuck using that kernel version, which users complained about when using HyperKit. UEFI booting is a feature we want.

Second is that when we started evaluating what to use, we still needed to support macOS 10.15 users, and Virtualization.framework does not work on that.

Third, we already have experience with QEMU on Linux and it was a natural progression to support QEMU on Mac. We are a small team, so reusing technology is a big plus.

Hope this answers your question.

townsend2010 commented 2 years ago

Oh, Virtualization.framework is really an API abstraction that uses Hypervisor.framework under the hood, so it's possible some of the same issues would remain.