evansm7 / vftool

A simple macOS Virtualisation.framework wrapper
MIT License

-p N, N > 1 seems absurdly slow? #14

Open tommythorn opened 3 years ago

tommythorn commented 3 years ago

Thanks for this brilliant tool, which is exactly what I wanted. I followed the helpful guide https://github.com/evansm7/vftool/issues/2#issuecomment-735455161 to get an Ubuntu VM, which is very fast in single-user mode (no -p option), but as soon as I enable more than one core, performance becomes very, very slow.

Is this a known issue?

tommythorn commented 3 years ago

Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

Damenly commented 3 years ago

> Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.

+1. Running with 8 CPUs and 8 GB RAM, it is very slow (disk I/O?). Reducing it to -p1 makes it work smoothly.

cjdell commented 3 years ago

Noticing this too. With 1 CPU it is fine. You can easily see the problem when pinging the NAT gateway (in my case this is 192.168.64.1).

With a single CPU the ping time is < 1 ms, but with multiple CPUs it is often > 100 ms. It doesn't look disk-bound or CPU-bound; it is more like a synchronisation issue. The problem appears to be fundamental to Apple's Virtualization framework, as the same phenomenon occurs in the "SimpleVM" project.
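A quick way to reproduce the symptom from inside the guest (a minimal sketch; the 192.168.64.1 gateway address is whatever your NAT setup assigns):

ping -c 5 192.168.64.1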

evansm7 commented 3 years ago

Been discussing this on Twitter – it looks like Virtualization.framework is dropping interrupts that aren't directed at VCPU 0. I can recreate it by manually changing the IRQ affinity under /proc/irq; as an example, IRQ 6 was my virtio-net interrupt and I lose the network when I direct it at VCPU 1 instead of "any":

echo 1 > /proc/irq/6/smp_affinity_list
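For reference, the IRQ number varies per boot and configuration; one way to find the virtio-net interrupt is to look it up in /proc/interrupts (a sketch – the exact device names in the output depend on the kernel):

grep -i virtio /proc/interrupts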

My basic Debian installation doesn't have irqbalance (or an equivalent), so all IRQs remain steered at CPU 0 – but other distros appear to install it by default.

Someone said they had problems with an Ubuntu cloud image without irqbalance, which I have yet to look into. Maybe it has a similar userspace utility, or maybe the kernel now has some spicy redirection.

It isn’t a vftool/SimpleVM bug, but a workaround is needed. Feels like a distro-specific tips & tricks discussion?

tommythorn commented 3 years ago

That seems like an ... odd choice by Apple. I ran apt remove irqbalance followed by for f in /proc/irq/*/smp_affinity_list; do echo 0 > $f; done, and it looks like that made things way better. Curiously enough, /proc/irq/1/smp_affinity_list through /proc/irq/4/smp_affinity_list can't be written and stay at 0-5 (I ran with -p6, hence the "5"), but it appears to be significantly better than before. Thanks!
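For anyone scripting this, a minimal sketch of the same workaround that tolerates the unwritable entries (the error suppression is an addition; as seen later in this thread, those files just return a write error):

#!/bin/sh
# Steer every steerable IRQ at CPU 0; some entries reject the write
# (the ones that stay at 0-5 above), so errors are silenced instead
# of cluttering the output.
for f in /proc/irq/*/smp_affinity_list; do
    echo 0 > "$f" 2>/dev/null
done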

I would close this, but it seems worthwhile to mention it in the documentation first.

seanjensengrey commented 3 years ago

Thanks @tommythorn! I think removing irqbalance is enough (edit: it isn't, see below). I am seeing guest compilation timings for Rust that are on par with the M1 host. Previously, using a VM launched with -p 4, cargo took over 7 minutes just to print that it had started compiling the first crate.

irqbalance discussion below.
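For reference, a minimal sketch of getting rid of irqbalance on Ubuntu (assuming the stock package and systemd service names):

sudo systemctl disable --now irqbalance   # stop the running daemon
sudo apt remove irqbalance                # remove the package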

Rust compilation timings

time cargo install --force ripgrep

ubuntu guest with smp_affinity_list changes applied

Launched with -p 4

   Finished release [optimized + debuginfo] target(s) in 31.61s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.819s
user    1m55.319s
sys     0m2.653s

apple m1 host

   Finished release [optimized + debuginfo] target(s) in 27.25s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m27.389s
user    2m39.468s
sys     0m8.593s

With irqbalance installed

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0-3
1
0-3

Two minutes in and cargo hasn't even finished updating the crate index.

time cargo install --force ripgrep
    Updating crates.io index
^C

real    1m57.985s
user    0m0.166s
sys     0m0.038s

After allowing all cores to handle all interrupts

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

# set -eux;

for f in /proc/irq/*/smp_affinity_list;
        do echo "0-3" > $f;
done
root@ubuntu:~# ./reset_affinity.sh 
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
0-3
0-3
0-3
0-3

We see compilation times back to normal.

   Finished release [optimized + debuginfo] target(s) in 31.73s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.931s
user    1m55.523s
sys     0m2.975s

But! It isn't the configuration, it's the act of writing to smp_affinity_list. Clearing and resetting the IRQs to the slowest observed settings still results in a compile in the 32-45 s range.

With a reconfigured affinity list of

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0
1
0

I was still able to get a 35s compile.

    Finished release [optimized + debuginfo] target(s) in 34.36s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m35.939s
user    2m0.677s
sys     0m2.327s

The worst configuration I could come up with apart from a fresh boot is

root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
3
3
3

And while the console lags a bunch, we still see

    Finished release [optimized + debuginfo] target(s) in 34.81s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m43.805s
user    1m55.938s
sys     0m1.976s

It looks like both things are needed: irqbalance has to be removed, and the smp_affinity_list files have to be written to, preferably with low-numbered CPUs.

root@ubuntu:~# cat reset_affinity.sh 
#!/bin/bash

cat /proc/irq/*/smp_affinity_list;

for f in /proc/irq/*/smp_affinity_list;
        do echo "0" > $f;
done

cat /proc/irq/*/smp_affinity_list;

seanjensengrey commented 3 years ago

BTW, when running a guest with -p 8 I am seeing nearly identical Rust compilation performance to the host:

../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda"

    Finished release [optimized + debuginfo] target(s) in 25.59s
   Replacing /home/test/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m25.815s
user    2m59.086s
sys     0m4.982s

M1 host


    Finished release [optimized + debuginfo] target(s) in 26.39s
   Replacing /Users/seanj/.cargo/bin/rg
    Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m26.523s
user    2m38.673s
sys     0m9.794s

gyf304 commented 3 years ago

You can also fix it by adding irqaffinity=0 to the kernel cmdline. irqfixup also seems to work.

https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html#:~:text=irqfixup
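With vftool, the parameter can be appended through the -a flag (a sketch based on the invocation earlier in this thread; the kernel, initrd, and disk image paths are placeholders):

../vftool/build/vftool -k vmlinux -i initrd -d disk.img -m 2048 -p 4 -a "console=hvc0 root=/dev/vda irqaffinity=0"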

Before:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   17538 MB in  2.00 seconds = 8777.34 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads:  12 MB in  3.27 seconds =   3.67 MB/sec

After:

ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   45040 MB in  2.00 seconds = 22574.22 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 2238 MB in  3.00 seconds = 745.68 MB/sec

Edit: you will still need to remove irqbalance from your system.

jasmas commented 3 years ago

irqaffinity=0 would probably be the preferred method. This should probably just be documented in the same way as the console parameter.