tommythorn opened this issue 3 years ago
Very funny, that's not the issue (I'm allocating 6 GB and it reproduces with just -p2). So nobody else sees this? That's very odd. I can reproduce it trivially.
+1. Running with 8 CPUs and 8 GB RAM, but it is very slow (disk I/O?). Reducing it to `-p1` makes it work smoothly.
Noticing this too. With 1 CPU it is fine. You can easily see the problem by pinging the NAT gateway (in my case 192.168.64.1).
With a single CPU the ping time is < 1 ms, but with multiple CPUs it is often > 100 ms. It isn't disk-bound or CPU-bound; it looks more like a synchronisation issue. The problem appears to be fundamental to Apple's Virtualization framework, as the same phenomenon also happens in the "SimpleVM" project.
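For a quick check (assuming your NAT gateway is also 192.168.64.1; adjust the address to your setup), comparing a VM booted with `-p1` against one with several vCPUs makes the difference obvious:

```sh
# Ping the Virtualization.framework NAT gateway from inside the guest.
# With a single vCPU the round trips stay well under 1 ms; with several
# vCPUs they frequently spike above 100 ms.
ping -c 20 192.168.64.1
```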
Been discussing this on Twitter – it looks like Virtualization.framework is dropping interrupts that aren't directed at VCPU 0. I can recreate it by manually changing the IRQ affinity via `/proc/irq`; as an example, IRQ 6 was my virtio-net interrupt, and I lose networking when I direct it at VCPU 1 instead of "any":

```
echo 1 > /proc/irq/6/smp_affinity_list
```

My basic Debian installation doesn't have `irqbalance` (or an equivalent), so all IRQs remain steered at CPU 0 – but other distros appear to install it by default.
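For reference, this is roughly how to find which IRQ the virtio-net device is using in a given guest (the IRQ number and device naming will vary, so treat it as a sketch):

```sh
# Locate the virtio network interrupt(s); the device-name column usually
# contains "virtio" (it was IRQ 6 for me, but yours may differ).
grep virtio /proc/interrupts

# Show where a given IRQ is currently steered ("0-N" means any vCPU).
cat /proc/irq/6/smp_affinity_list
```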
Someone said they had problems with an Ubuntu cloud image without `irqbalance`, which I have yet to look into. Maybe they have a similar userspace utility, maybe the kernel now has some spicy redirection.
It isn’t a vftool/SimpleVM bug, but a workaround is needed. Feels like a distro-specific tips & tricks discussion?
That seems like an ... odd choice by Apple. I did `apt remove irqbalance`, ran

```
for f in /proc/irq/*/smp_affinity_list; do echo 0 > $f; done
```

and it looks like that made things way better. Curiously enough, `/proc/irq/1/smp_affinity_list` .. `/proc/irq/4/smp_affinity_list` can't be written and stay at `0-5` (I ran with `-p6`, hence the "5"), but it appears to be significantly better than before. Thanks!
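For anyone wanting to replicate this, a sketch of the same steps in one script (the error handling is there because some IRQs, like 1-4 above, refuse the write):

```sh
#!/bin/bash
# Workaround sketch: stop irqbalance from re-steering IRQs, then pin all
# IRQs back to CPU 0. Untested beyond the setup described above.
apt remove -y irqbalance

for f in /proc/irq/*/smp_affinity_list; do
    # Some IRQs reject the write (they stayed at 0-5 here); skip those.
    echo 0 > "$f" 2>/dev/null || echo "skipped $f"
done
```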
I would close this, but it seems worthwhile to mention it in the documentation first.
Thanks @tommythorn! I think removing `irqbalance` is enough. **Edit: it isn't; see the `irqbalance` discussion below.** I am seeing guest compilation timings for Rust on par with the M1 host. Previously, using a VM launched with `-p 4`, cargo took over 7 minutes just to print that it had started compiling the first crate.

```
time cargo install --force ripgrep
```

Guest with the `smp_affinity_list` changes applied, launched with `-p 4`:

```
Finished release [optimized + debuginfo] target(s) in 31.61s
Replacing /home/test/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.819s
user    1m55.319s
sys     0m2.653s
```

M1 host:

```
Finished release [optimized + debuginfo] target(s) in 27.25s
Replacing /Users/seanj/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m27.389s
user    2m39.468s
sys     0m8.593s
```
With `irqbalance` installed:

```
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0-3
1
0-3
```
Two minutes in and cargo hasn't even finished updating the crate index:

```
time cargo install --force ripgrep
    Updating crates.io index
^C

real    1m57.985s
user    0m0.166s
sys     0m0.038s
```
After allowing all cores to handle all interrupts:

```
root@ubuntu:~# cat reset_affinity.sh
#!/bin/bash
# set -eux;
for f in /proc/irq/*/smp_affinity_list;
do echo "0-3" > $f;
done
root@ubuntu:~# ./reset_affinity.sh
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
./reset_affinity.sh: line 6: echo: write error: Input/output error
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
0-3
0-3
0-3
0-3
```
We see compilation times back to normal:

```
Finished release [optimized + debuginfo] target(s) in 31.73s
Replacing /home/test/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m31.931s
user    1m55.523s
sys     0m2.975s
```
But! It isn't the configuration itself, it's the act of writing to `smp_affinity_list`. Clearing and resetting the IRQs to the slowest observed settings still results in a 32s-45s compile.
With a reconfigured affinity list of

```
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
0
1
0
```

I was still able to get a 35s compile:

```
Finished release [optimized + debuginfo] target(s) in 34.36s
Replacing /home/test/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m35.939s
user    2m0.677s
sys     0m2.327s
```
The worst configuration I could come up with, apart from a fresh boot, is

```
root@ubuntu:~# cat /proc/irq/*/smp_affinity_list
0-3
0-3
0-3
0-3
3
3
3
3
```

and while the console lags a bunch, we still see:

```
Finished release [optimized + debuginfo] target(s) in 34.81s
Replacing /home/test/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m43.805s
user    1m55.938s
sys     0m1.976s
```
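If it really is the write itself that matters, then simply re-writing whatever value is already there ought to be enough. A minimal sketch of that idea (untested speculation on my part, beyond the runs above):

```sh
# Re-write each IRQ's current affinity back to itself, to trigger the
# re-steer without changing the configuration at all.
for f in /proc/irq/*/smp_affinity_list; do
    v=$(cat "$f")                 # current affinity, e.g. "0-3"
    echo "$v" > "$f" 2>/dev/null  # some IRQs refuse the write; ignore them
done
```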
It looks like both things are needed: `irqbalance` has to be removed, and the `smp_affinity_list` files have to be written to, preferably with low-numbered CPUs:

```
root@ubuntu:~# cat reset_affinity.sh
#!/bin/bash
cat /proc/irq/*/smp_affinity_list;
for f in /proc/irq/*/smp_affinity_list;
do echo "0" > $f;
done
cat /proc/irq/*/smp_affinity_list;
```
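To apply this automatically instead of by hand after each boot, one option (assuming cron is available; the file name and script path below are just examples) is a reboot job:

```sh
# Hypothetical: run the affinity reset once at every boot via cron.
# Adjust /usr/local/sbin/reset_affinity.sh to wherever you keep the script.
echo '@reboot root /usr/local/sbin/reset_affinity.sh' > /etc/cron.d/irq-affinity
```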
BTW, when running a guest with `-p 8` I am seeing nearly identical Rust compilation performance:

```
../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 -a "console=hvc0 root=/dev/vda"
```

Guest:

```
Finished release [optimized + debuginfo] target(s) in 25.59s
Replacing /home/test/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m25.815s
user    2m59.086s
sys     0m4.982s
```

M1 host:

```
Finished release [optimized + debuginfo] target(s) in 26.39s
Replacing /Users/seanj/.cargo/bin/rg
Replaced package `ripgrep v12.1.1` with `ripgrep v12.1.1` (executable `rg`)

real    0m26.523s
user    2m38.673s
sys     0m9.794s
```
You can also fix it by adding `irqaffinity=0` to the kernel cmdline. `irqfixup` also seems to work.
https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html#:~:text=irqfixup
Before:

```
ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   17538 MB in  2.00 seconds = 8777.34 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads:  12 MB in  3.27 seconds = 3.67 MB/sec
```

After:

```
ubuntu@ubuntu:~$ sudo hdparm -Tt /dev/vda

/dev/vda:
 Timing cached reads:   45040 MB in  2.00 seconds = 22574.22 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 2238 MB in  3.00 seconds = 745.68 MB/sec
```
**Edit:** you will still need to remove `irqbalance` from your system.
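With vftool that just means appending it to the `-a` kernel argument. For example, reusing the invocation shown earlier (kernel, initrd, and disk names are from that example):

```sh
# irqaffinity=0 makes the kernel route all IRQs to CPU 0 from boot,
# so no in-guest affinity script is needed.
../vftool/build/vftool -k vmlinux -i initrd -d a_disk1.img -m 2048 -p 8 \
    -a "console=hvc0 root=/dev/vda irqaffinity=0"
```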
`irqaffinity=0` would probably be the preferred method. This should probably just be documented, in the same way as the console.
Thanks for this brilliant tool, which is exactly what I wanted. I followed the helpful guide https://github.com/evansm7/vftool/issues/2#issuecomment-735455161 to get an Ubuntu VM, which is very fast with a single CPU (no `-p` option), but as soon as I enable more than one core, performance is very slow.
Is this a known issue?