The system is identical, except the hard disk: btrfs/reflink is installed on a newer Sandisk Ultra 3D SSD (ext4/file on some old Lenovo standard SSD).
Can you do a file-vs-reflink benchmark on the same hardware? Some new SSDs have quite slow write performance due to using TLC or even QLC.
Can you do a file-vs-reflink benchmark on the same hardware? Some new SSDs have quite slow write performance due to using TLC or even QLC.
You mean from exactly the same hard disk?
That would require quite a lot of dd'ing & time, but let's see... Anyway I'd do it at most once...
Hm so I decided to try a btrfs/file pool first, but that one had the same performance issues on the newer SSD. Around 24-32s for time qvm-run -p vm 'echo hello' on 4.1rc2.
hdparm reports write speeds of around 500 MB/s, dd reports 250 MB/s to a btrfs file on that disk. Read speed is 5-8 GB/s according to hdparm.
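For reference, numbers like these can be obtained in dom0 with something along the following lines (a sketch; /dev/sdb and the test file path are placeholders, not the exact commands used above):

```bash
# Sequential read benchmark of the raw device (read-only):
sudo hdparm -t /dev/sdb   # /dev/sdb is a placeholder for the SSD under test

# Rough write benchmark to a file on the mounted pool; a large bs= and
# conv=fdatasync keep the page cache from inflating the result:
sudo dd if=/dev/zero of=/var/lib/qubes/ddtest.img bs=1M count=4096 conv=fdatasync status=progress
sudo rm /var/lib/qubes/ddtest.img
```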
Could it be a system-wide R4.1 performance issue unrelated to storage (or more precisely, only correlated to storage via dm-crypt)? On an early prealpha I was affected by probably some variant of #4604. Everything was dog slow because my CPU was stuck at the lowest frequency.
It might be worth it to try an ext4/file R4.1 installation for comparison if you have a spare drive.
Grasping for straws: If it's truly a storage issue, maybe the drive is completely starved for available blocks so that blkdiscard and then reinstalling would help.
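If anyone tries that route, note that a whole-device discard destroys all data; a minimal sketch (device name is a placeholder, to be run from the installer or a live system, not the running install):

```bash
# WARNING: discards every block on the device, i.e. wipes it completely.
# /dev/sdb is a placeholder for the SSD that is about to be reinstalled.
sudo blkdiscard /dev/sdb
```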
XFS/reflink is another stack to try. XFS is known for high performance.
maybe the drive is completely starved for available blocks
On second thought, I think this can be ruled out as you're still getting 250 MB/s writes. That number seems low (if the dd benchmark is in dom0 and with a large enough bs= argument) - but not terrible.
x220 with Samsung EVO 860
btrfs
So I tested the fresh 4.1rc2 btrfs/reflink on the old SSD:
time qvm-run -p vm 'echo hello': 24-30s
dd write speed to a btrfs file: 200 MB/s
So now that's two 100% identical systems, one with 4.1 btrfs/reflink and one with 4.0 ext4/file. I'll try 4.0 ext4/file on the new SSD as well.
@0xdd7fq Do you run reflink or lvm? Please check your qvm-pool info varlibqubes driver output...
@3hhh
default btrfs install on x220 with 1.5G dom0 mem, imagine speed on thinkpad nvme mod...
driver : file-reflink
btrfs fi sh
Label: 'qubes_dom0' uuid: xxx
Total devices 1 FS bytes used 25.34GiB
devid 1 size 926.55GiB used 28.02GiB path /dev/mapper/luks-xxx
What about XFS/reflink?
the old SSD
can it be painfully slow DISCARD op handling by the disk?
So I now tried 4.0 ext4/file on the new disk:
time qvm-run -p vm 'echo hello': 10-11s
dd write speed to an ext4 file: 220 MB/s
So the type of disk doesn't matter / it's unlikely a hardware issue.
I'll try upgrading the 4.0 ext4/file install to 4.1 to see what happens.
Is there any way to test discards?
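One crude way to exercise discards on a mounted filesystem is to time fstrim; a sketch (assuming discards are passed through the dm-crypt/LVM layers, which they may not be by default):

```bash
# In dom0: time how long a filesystem-level discard of the root fs takes.
time sudo fstrim -v /
# Inside a VM, the private volume can be trimmed the same way:
# time sudo fstrim -v /rw
```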
I'll try upgrading the 4.0 ext4/file install to 4.1 to see what happens.
Forget about that. It currently fails quite spectacularly after a second for ext4/file pools:
qubes-dist-upgrade --all
WARNING: /!\ ENSURE TO HAVE MADE A BACKUP OF ALL YOUR VMs AND dom0 DATA /!\
-> Launch upgrade process? [y/N] y
Failed to find logical volume "qubes_dom0/_tmeta"
Failed to find logical volume "qubes_dom0/"
/sbin/qubes-dist-upgrade: line 417: [: : integer expression expected
/sbin/qubes-dist-upgrade: line 576: [: : integer expression expected
---> (STAGE 0) Adjusting LVM thin pool metadata size...
Failed to find logical volume "qubes_dom0/"
/sbin/qubes-dist-upgrade: line 417: [: : integer expression expected
--size may not be zero.
Run `lvextend --help' for more information.
-> Launch restoration...
Apparently qubes-dist-upgrade silently assumes lvm only.
EDIT: Ah, it seems to be stage0, will try to skip that one...
Giving up on the upgrade approach as qubes-dist-upgrade broke the system. I'll mention the issues at #5685.
What about XFS/reflink?
Does it work from the installer now?
Last time I tried, I had a really hard time getting the anaconda custom partitioning right for btrfs/reflink. I recall my first attempt resulted in a btrfs/lvm install without encryption, and I finally chose the default btrfs/reflink option...
Anyway in total my previous results indicate it's either a btrfs, a reflink driver or a generic Qubes 4.1 issue.
Apparently however only a certain subset of users is affected.
As for the start time itself, there may be a fallout from the #7023 fix, affecting start time regardless of storage. Try starting a VM with memory==maxmem (for example set both to 600).
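For example (assuming a test VM named vm, as in the timings above):

```bash
# Pin memory == maxmem so no ballooning is needed during startup:
qvm-prefs vm memory 600
qvm-prefs vm maxmem 600
time qvm-run -p vm 'echo hello'
```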
What about XFS/reflink?
Does it work from the installer now?
Yes, since R4.0.4 and R4.1.0-beta1 the installer will create /var/lib/qubes on XFS when you switch to the "Standard Partition" scheme + "Click here to create them automatically." (It's also still possible to then change the /var/lib/qubes partition to ext4 under "File System:" if you want to test with a legacy 'file' driver pool.)
Last time I tried I had a really hard time to get the anaconda custom partitioning right for btrfs/reflink.
Anything except the automatic schemes is a minefield for sure.
As for the start time itself, there may be a fallout from the #7023 fix, affecting start time regardless of storage. Try starting a VM with memory==maxmem (for example set both to 600).
No, unfortunately that's not it. All of my tests incl. 4.0 were with mem=1500, maxmem=0. I just tested with maxmem=mem=1500 on 4.1 btrfs/reflink and the results are identical (~30s startup).
Hmmm so I masochistically decided to test all pool types.
I installed plain Qubes 4.1rc2 4 times on the same machine with only debian-11 and fedora-34 templates and then ran
time qvm-run -p debian-11/fedora-34: all behaved similarly with 11-14s startup times and ~260 MB/s dd write speeds.
(xfs/reflink and ext4/file on average were a second or so better than btrfs/reflink and lvm thin).
So it's not a btrfs/reflink specific issue. On the contrary it indicates that my usage pattern must cause the issue.
I also recalled that I had noticed slight swapping in dom0 of 50-150 MB even though dom0 memory usage was at 30% of 3 GB (I had changed that from the default 4 GB), and thus disabled swapping (why does it swap at 30%?).
That now reduced startup times to 19-25s, but still not the 11-14s of an empty installation (on the same machine, older SSD though). I'll test whether startup times are better for the first VM to be started...
I'll test whether startup times are better for the first VM to be started...
Bullseye!
After qvm-shutdown --all I get 14-15s on btrfs/reflink on my fully installed machine as well. Usually however I had ~12 VMs running when I tested. I guess @0xdd7fq tested with fewer.
So Qubes 4.1rc2 btrfs/reflink startup performance degrades with a higher number of running VMs. That didn't happen in 4.0 ext4/file. Still not sure whether it affects btrfs/reflink only or is a generic issue independent of the storage driver. I still have the empty lvm thin install and will check on that...
@3hhh can you check with xfs/reflink too?
lvm thin is affected: I created and ran 13 VMs one after another on the empty 4.1rc2 install and startup times slowed down from 13s to 22s for the last VM.
So this should be a generic issue. I'll update title & description accordingly and leave the Xen fixing to you guys. ;-)
@3hhh @DemiMarie
ok, I have a second machine with a Xeon and 64 GB RAM, 5.14 kernel, btrfs, SSD 860 EVO (not NVMe), and 10 identical standalone Gentoo GNOME VMs (vCPUs=2, memory=400, maxmem=4096). A typical SSD is only good for mouseless dev; for Xen we need NVMe. Replacing it with a 960/970 NVMe would make the results ~2x faster - best performance is 5-7s, not 13-15s.
[toothless@dom0 ~]$ time qvm-run -a test1 gnome-terminal
Running 'gnome-terminal' on test1
real 0m13.703s
user 0m0.061s
sys 0m0.055s
[toothless@dom0 ~]$ time qvm-run -a test2 gnome-terminal
Running 'gnome-terminal' on test2
real 0m13.705s
user 0m0.103s
sys 0m0.052s
[toothless@dom0 ~]$ time qvm-run -a test3 gnome-terminal
Running 'gnome-terminal' on test3
real 0m14.368s
user 0m0.080s
sys 0m0.060s
[toothless@dom0 ~]$ time qvm-run -a test4 gnome-terminal
Running 'gnome-terminal' on test4
real 0m14.369s
user 0m0.080s
sys 0m0.054s
[toothless@dom0 ~]$ time qvm-run -a test5 gnome-terminal
Running 'gnome-terminal' on test5
real 0m14.276s
user 0m0.058s
sys 0m0.053s
[toothless@dom0 ~]$ time qvm-run -a test6 gnome-terminal
Running 'gnome-terminal' on test6
real 0m14.531s
user 0m0.064s
sys 0m0.059s
[toothless@dom0 ~]$ time qvm-run -a test7 gnome-terminal
Running 'gnome-terminal' on test7
real 0m14.635s
user 0m0.061s
sys 0m0.060s
[toothless@dom0 ~]$ time qvm-run -a test8 gnome-terminal
Running 'gnome-terminal' on test8
real 0m14.773s
user 0m0.066s
sys 0m0.055s
[toothless@dom0 ~]$ time qvm-run -a test9 gnome-terminal
Running 'gnome-terminal' on test9
real 0m15.812s
user 0m0.054s
sys 0m0.059s
[toothless@dom0 ~]$ time qvm-run -a test10 gnome-terminal
Running 'gnome-terminal' on test10
real 0m15.352s
user 0m0.067s
sys 0m0.053s
[toothless@dom0 ~]$ time qvm-shutdown test10
real 0m0.196s
user 0m0.090s
sys 0m0.044s
[toothless@dom0 ~]$ time qvm-shutdown test9
real 0m0.378s
user 0m0.081s
sys 0m0.015s
[toothless@dom0 ~]$ time qvm-shutdown test8
real 0m0.434s
user 0m0.076s
sys 0m0.030s
[toothless@dom0 ~]$ time qvm-shutdown test7
real 0m0.334s
user 0m0.066s
sys 0m0.030s
[toothless@dom0 ~]$ time qvm-shutdown test6
real 0m0.150s
user 0m0.066s
sys 0m0.030s
[toothless@dom0 ~]$ time qvm-shutdown test5
real 0m0.138s
user 0m0.068s
sys 0m0.024s
[toothless@dom0 ~]$ time qvm-shutdown test4
real 0m0.343s
user 0m0.072s
sys 0m0.022s
[toothless@dom0 ~]$ time qvm-shutdown test3
real 0m0.167s
user 0m0.078s
sys 0m0.032s
[toothless@dom0 ~]$ time qvm-shutdown test2
real 0m0.187s
user 0m0.089s
sys 0m0.038s
[toothless@dom0 ~]$ time qvm-shutdown test1
real 0m0.182s
user 0m0.074s
sys 0m0.045s
[toothless@dom0 ~]$ qvm-volume extend test10:root 500G
[toothless@dom0 ~]$ time qvm-run -a test10 gnome-terminal
Running 'gnome-terminal' on test10
real 0m16.769s
user 0m0.061s
sys 0m0.053s
[toothless@dom0 ~]$ time qvm-shutdown test10
real 0m0.157s
user 0m0.080s
sys 0m0.033s
@0xdd7fq Ty for testing. You're apparently also affected by this bug, with timings increasing linearly from 13.7s to 15.3s with each new VM, albeit significantly less than my 13s --> 22s. Btw shutdown timings are pointless without --wait.
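i.e. to actually measure the shutdown itself, something like:

```bash
# Without --wait, qvm-shutdown only requests the shutdown and returns immediately.
time qvm-shutdown --wait test10
```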
@DemiMarie It's all the same on xfs/reflink and ext4/file. This doesn't appear to be related to storage drivers.
Output from 4.0 for comparison after a qvm-shutdown --all:
Testing tvm-1...
hello
real 0m13.664s
user 0m0.070s
sys 0m0.052s
Testing tvm-2...
hello
real 0m12.578s
user 0m0.067s
sys 0m0.068s
Testing tvm-3...
hello
real 0m13.391s
user 0m0.068s
sys 0m0.062s
Testing tvm-4...
hello
real 0m14.169s
user 0m0.071s
sys 0m0.057s
Testing tvm-5...
hello
real 0m12.710s
user 0m0.063s
sys 0m0.060s
Testing tvm-6...
hello
real 0m13.313s
user 0m0.072s
sys 0m0.069s
Testing tvm-7...
hello
real 0m13.165s
user 0m0.084s
sys 0m0.053s
Testing tvm-8...
hello
real 0m13.449s
user 0m0.070s
sys 0m0.058s
Testing tvm-9...
hello
real 0m15.725s
user 0m0.070s
sys 0m0.069s
Testing tvm-10...
hello
real 0m16.589s
user 0m0.080s
sys 0m0.054s
Testing tvm-11...
hello
real 0m16.229s
user 0m0.067s
sys 0m0.059s
Testing tvm-12...
hello
real 0m13.500s
user 0m0.067s
sys 0m0.063s
Testing tvm-13...
hello
real 0m16.592s
user 0m0.065s
sys 0m0.047s
(Doesn't look fully linear to me & at least doesn't increase to 22s).
perf diff on cpu-clock:pppH doesn't show any increase. What could it be then? Any ideas for perf events to monitor?
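For context, the comparison was done along these lines (a sketch; file and VM names are examples, not the exact invocation):

```bash
# Record a system-wide profile during a "fast" and a "slow" VM start, then compare:
sudo perf record -a -o perf_fast.data -- qvm-run -p tvm-0 'echo hello'
sudo perf record -a -o perf_slow.data -- qvm-run -p tvm-12 'echo hello'
sudo perf diff perf_fast.data perf_slow.data
```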
I'd still guess this is related to memory pressure in some way. Maybe obtaining initial memory for the new VM to start? This can be checked by ensuring sum(maxmem) for running VMs is less than total physical memory. If in such conditions start time is the same regardless of the number of VMs, then that's it. Otherwise we need to dig deeper.
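A rough way to check that condition from dom0 might be the following sketch (the qvm-check/qvm-prefs usage is how I'd expect the tools to behave; not verified against every release):

```bash
# Sum maxmem over running VMs and compare against total host memory.
total=0
for vm in $(qvm-ls --raw-list); do
    [ "$vm" = "dom0" ] && continue
    qvm-check --quiet --running "$vm" 2>/dev/null || continue
    # Note: maxmem=0 means fixed memory; for such VMs the 'memory' pref applies instead.
    total=$(( total + $(qvm-prefs "$vm" maxmem) ))
done
echo "sum(maxmem) of running VMs: ${total} MB"
xl info | grep -E 'total_memory|free_memory'
```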
You were somewhat right - going over the maximum physical memory has an effect, albeit only a small one.
Once I statically set the newly created Qubes to 400M, 4.0 started them up in a rather constant 10 seconds, whereas 4.1 started at 21 seconds (!) and went up to 33 seconds. So the issue became even more grave for 4.1, and now I'm not even sure it's a single issue anymore, because
a) Timings shouldn't start so badly. b) Timings shouldn't increase linearly.
Maybe some locks were introduced in 4.1 that were not in place in 4.0? Unfortunately perf lock record complains about some missing tracepoints.
Is the 4.0 Xen version 4.8.5, kernel 5.4.143 available for 4.1? If yes, one could rule out an upstream Xen issue.
Hardware is a T530 with i7-3720QM and 16 GB RAM btw.
I attached the different outputs of the following script (below for mem=maxmem=400):
#!/bin/bash
qvm-shutdown --all --wait
sudo swapoff -a
swap="$(sudo mount|grep swap)"
[ -z "$swap" ] || { echo "Please disable swap first." ; exit 2 ; }
for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  qvm-remove -f "$vm"
done

for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  echo "Creating $vm..."
  qvm-create -l red "$vm"
  qvm-prefs "$vm" netvm ""
  qvm-prefs "$vm" memory 400
  qvm-prefs "$vm" maxmem 0
  echo "Starting $vm..."
  #beg=$SECONDS
  #sudo perf record -a --call-graph fp -o "${vm}_perf.data" qvm-run -p "$vm" 'echo hello'
  #echo "duration: $(($SECONDS - $beg))"
  time qvm-run -p "$vm" 'echo hello'
  xl info | grep free_memory
done

qvm-shutdown --all --wait

for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  qvm-remove -f "$vm"
done
4.1, maxmem=400:
Creating tvm-0...
Starting tvm-0...
hello
real 0m21.511s
user 0m0.075s
sys 0m0.069s
free_memory : 12662
Creating tvm-1...
Starting tvm-1...
hello
real 0m21.298s
user 0m0.066s
sys 0m0.072s
free_memory : 12255
Creating tvm-2...
Starting tvm-2...
hello
real 0m21.324s
user 0m0.071s
sys 0m0.072s
free_memory : 11848
Creating tvm-3...
Starting tvm-3...
hello
real 0m22.316s
user 0m0.071s
sys 0m0.064s
free_memory : 11442
Creating tvm-4...
Starting tvm-4...
hello
real 0m23.233s
user 0m0.065s
sys 0m0.075s
free_memory : 11035
Creating tvm-5...
Starting tvm-5...
hello
real 0m23.769s
user 0m0.075s
sys 0m0.069s
free_memory : 10628
Creating tvm-6...
Starting tvm-6...
hello
real 0m25.669s
user 0m0.090s
sys 0m0.073s
free_memory : 10222
Creating tvm-7...
Starting tvm-7...
hello
real 0m28.193s
user 0m0.086s
sys 0m0.061s
free_memory : 9815
Creating tvm-8...
Starting tvm-8...
hello
real 0m31.331s
user 0m0.082s
sys 0m0.079s
free_memory : 9408
Creating tvm-9...
Starting tvm-9...
hello
real 0m30.918s
user 0m0.083s
sys 0m0.064s
free_memory : 9002
Creating tvm-10...
Starting tvm-10...
hello
real 0m29.939s
user 0m0.070s
sys 0m0.070s
free_memory : 8595
Creating tvm-11...
Starting tvm-11...
hello
real 0m33.455s
user 0m0.068s
sys 0m0.067s
free_memory : 8188
Creating tvm-12...
Starting tvm-12...
hello
real 0m33.632s
user 0m0.082s
sys 0m0.067s
4.1, maxmem=2000:
Creating tvm-0...
Starting tvm-0...
hello
real 0m20.814s
user 0m0.068s
sys 0m0.070s
free_memory : 11066
Creating tvm-1...
Starting tvm-1...
hello
real 0m20.765s
user 0m0.074s
sys 0m0.065s
free_memory : 9063
Creating tvm-2...
Starting tvm-2...
hello
real 0m21.296s
user 0m0.075s
sys 0m0.060s
free_memory : 7060
Creating tvm-3...
Starting tvm-3...
hello
real 0m21.325s
user 0m0.079s
sys 0m0.058s
free_memory : 5058
Creating tvm-4...
Starting tvm-4...
hello
real 0m22.905s
user 0m0.069s
sys 0m0.072s
free_memory : 3055
Creating tvm-5...
Starting tvm-5...
hello
real 0m25.284s
user 0m0.083s
sys 0m0.066s
free_memory : 1052
Creating tvm-6...
Starting tvm-6...
hello
real 0m25.772s
user 0m0.083s
sys 0m0.079s
free_memory : 196
Creating tvm-7...
Starting tvm-7...
hello
real 0m25.290s
user 0m0.073s
sys 0m0.068s
free_memory : 212
Creating tvm-8...
Starting tvm-8...
hello
real 0m25.617s
user 0m0.081s
sys 0m0.059s
free_memory : 228
Creating tvm-9...
Starting tvm-9...
hello
real 0m25.769s
user 0m0.073s
sys 0m0.072s
free_memory : 265
Creating tvm-10...
Starting tvm-10...
hello
real 0m26.931s
user 0m0.078s
sys 0m0.064s
free_memory : 259
Creating tvm-11...
Starting tvm-11...
hello
real 0m30.857s
user 0m0.071s
sys 0m0.069s
free_memory : 275
Creating tvm-12...
Starting tvm-12...
hello
real 0m30.719s
user 0m0.074s
sys 0m0.066s
free_memory : 291
4.0, maxmem=400:
Creating tvm-0...
Starting tvm-0...
hello
real 0m10.026s
user 0m0.077s
sys 0m0.058s
free_memory : 12660
Creating tvm-1...
Starting tvm-1...
hello
real 0m10.188s
user 0m0.072s
sys 0m0.059s
free_memory : 12254
Creating tvm-2...
Starting tvm-2...
hello
real 0m10.096s
user 0m0.070s
sys 0m0.055s
free_memory : 11856
Creating tvm-3...
Starting tvm-3...
hello
real 0m9.550s
user 0m0.074s
sys 0m0.055s
free_memory : 11459
Creating tvm-4...
Starting tvm-4...
hello
real 0m10.192s
user 0m0.074s
sys 0m0.052s
free_memory : 11062
Creating tvm-5...
Starting tvm-5...
hello
real 0m10.472s
user 0m0.067s
sys 0m0.055s
free_memory : 10663
Creating tvm-6...
Starting tvm-6...
hello
real 0m10.618s
user 0m0.077s
sys 0m0.059s
free_memory : 10266
Creating tvm-7...
Starting tvm-7...
hello
real 0m10.520s
user 0m0.068s
sys 0m0.063s
free_memory : 9869
Creating tvm-8...
Starting tvm-8...
hello
real 0m10.793s
user 0m0.070s
sys 0m0.049s
free_memory : 9472
Creating tvm-9...
Starting tvm-9...
hello
real 0m10.521s
user 0m0.074s
sys 0m0.063s
free_memory : 9075
Creating tvm-10...
Starting tvm-10...
hello
real 0m10.311s
user 0m0.069s
sys 0m0.058s
free_memory : 8676
Creating tvm-11...
Starting tvm-11...
hello
real 0m10.808s
user 0m0.071s
sys 0m0.059s
free_memory : 8280
Creating tvm-12...
Starting tvm-12...
hello
real 0m11.281s
user 0m0.064s
sys 0m0.060s
free_memory : 7883
4.0, maxmem=2000:
Creating tvm-0...
Starting tvm-0...
hello
real 0m11.550s
user 0m0.070s
sys 0m0.057s
free_memory : 11064
Creating tvm-1...
Starting tvm-1...
hello
real 0m12.858s
user 0m0.076s
sys 0m0.056s
free_memory : 9063
Creating tvm-2...
Starting tvm-2...
hello
real 0m11.649s
user 0m0.077s
sys 0m0.052s
free_memory : 7069
Creating tvm-3...
Starting tvm-3...
hello
real 0m11.646s
user 0m0.068s
sys 0m0.061s
free_memory : 5073
Creating tvm-4...
Starting tvm-4...
hello
real 0m11.609s
user 0m0.071s
sys 0m0.055s
free_memory : 3079
Creating tvm-5...
Starting tvm-5...
hello
real 0m11.459s
user 0m0.079s
sys 0m0.048s
free_memory : 1087
Creating tvm-6...
Starting tvm-6...
hello
real 0m11.967s
user 0m0.072s
sys 0m0.059s
free_memory : 242
Creating tvm-7...
Starting tvm-7...
hello
real 0m12.352s
user 0m0.074s
sys 0m0.049s
free_memory : 267
Creating tvm-8...
Starting tvm-8...
hello
real 0m12.316s
user 0m0.080s
sys 0m0.049s
free_memory : 291
Creating tvm-9...
Starting tvm-9...
hello
real 0m12.589s
user 0m0.091s
sys 0m0.064s
free_memory : 317
Creating tvm-10...
Starting tvm-10...
hello
real 0m12.423s
user 0m0.081s
sys 0m0.050s
free_memory : 342
Creating tvm-11...
Starting tvm-11...
hello
real 0m12.544s
user 0m0.080s
sys 0m0.054s
free_memory : 368
Creating tvm-12...
Starting tvm-12...
hello
real 0m12.802s
user 0m0.079s
sys 0m0.058s
free_memory : 392
I did some more testing with a small script I wrote on a fresh 4.1rc2 xfs/reflink install in comparison to an old 4.0 ext4/file install on the same machine.
Key takeaways:
- qrexec performance on 4.0 is ~2.5s on 400 MB VMs and 3-4.5s on machines with maxmem=2000.
- qrexec performance on 4.1 is rather terrible in comparison: 6.5-8s on 400 MB VMs and 7-10s with maxmem=2000. This should definitely be fixed, if possible.
- qrexec performance also appears to slightly degrade with the number of started VMs.
- [...] qrexec performance in 4.0, but not in 4.1. It barely impacts Xen performance.

The raw data:
testbug_4.0_maxmem400.txt testbug_4.0_maxmem2000.txt testbug_4.1_maxmem400_debian-11.txt testbug_4.1_maxmem400_fedora-34.txt testbug_4.1_maxmem2000_debian-11.txt testbug_4.1_maxmem2000_fedora-34.txt
qrexec performance on 4.0 is ~2.5s on 400 MB VMs and 3-4.5s on machines with maxmem=2000.
qrexec performance on 4.1 is rather terrible in comparison: 6.5-8s on 400 MB VMs and 7-10s with maxmem=2000. This should definitely be fixed, if possible.
To me it looks like in 4.1 all qvm-run calls to a VM result in a blocking qubes.GetDate qrexec call according to the journal. In 4.0 such a call doesn't seem to be required and when it happens, it appears to be non-blocking. This might be one of the reasons why qrexec calls after VM startup are slow.
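The observation above comes from grepping the dom0 journal around a VM start, roughly like this (a sketch; the exact log message wording may differ between releases):

```bash
# Show recent qrexec / qubes.GetDate activity with timestamps:
journalctl -b --no-pager | grep -E 'qubes\.GetDate|qrexec' | tail -n 50
```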
- Overall the performance on the machine has significantly improved over my last test. Were any performance related patches merged recently?
It got worse again on more recent runs incl. 7s VM handovers and even slower qrexec, not sure why (Xen upgrade maybe?). EDIT: I believe I accidentally had a few sys- VMs running during the test. So that explains the differences.
Something else I noticed: Time spent in the VM kernel in 4.1 is ~5s contrary to 1-2s in 4.0. The reason appears to be a long delay in 4.1 before mounting the file systems.
See the dmesg outputs below:
4.0 (nothing special):
[ 1.604135] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x256490cd5f3, max_idle_ns: 440795204886 ns
[ 1.616510] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.625344] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.725277] random: sfdisk: uninitialized urandom read (4 bytes read)
[ 1.725840] xvdc: xvdc1 xvdc3
[ 1.733342] random: mkswap: uninitialized urandom read (16 bytes read)
[ 1.789406] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[ 1.795578] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 1.806028] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[ 1.871203] EXT4-fs (xvda3): re-mounted. Opts: (null)
[ 1.962792] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
4.1 (1.4s gap after blkfront):
[ 1.292877] Freeing unused kernel image (rodata/data gap) memory: 764K
[ 1.292905] rodata_test: all tests were successful
[ 1.292938] Run /init as init process
[ 1.292949] with arguments:
[ 1.292950] /init
[ 1.292951] rd_NO_PLYMOUTH
[ 1.292951] with environment:
[ 1.292952] HOME=/
[ 1.292953] TERM=linux
[ 1.302005] Invalid max_queues (4), will use default max: 1.
[ 1.549193] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.602238] xvda: xvda1 xvda2 xvda3
[ 1.620927] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.667150] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.674387] blkfront: xvdd: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled;
[ 3.037417] xvdc: xvdc1 xvdc3
[ 4.134550] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[ 4.148753] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 4.150726] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[ 4.288062] EXT4-fs (xvda3): re-mounted. Opts: (null)
[ 4.450156] systemd[1]: systemd 247.3-6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
EDIT: Full 4.1 dmesg
Side Note: I also noticed many of the following entries in 4.1 dmesg that were not there in 4.0:
[ 0.500583] acpi LNXCPU:54: Still not present
Hmm, I see the magic word "random" showing up in the 4.0 log (with an uninitialized urandom read), with only an unexplained delay around the same set of actions under 4.1.
Is it possible this is, again, the kernel waiting for some time for the appropriate random pool size threshold?
Is it possible this is, again, the kernel waiting for some time for the appropriate random pool size threshold?
The same 4.1 dmesg had [ 6.756485] random: crng init done much later, so I guess it's not related to random not being initialised.
Anyway I updated my tool to get more precise numbers (ideally numbers not correlated to others) and received the attached results (T530, i7).
Short summary:
- [...] dmesg).
- qrexec performance on 4.1 is terrible (4-5s vs 1-2s on 4.0).
- qubes.getDate is executed 1-2s earlier in 4.1, qubes.WindowIconUpdater ~4s later (not sure if that's relevant).

So in total there's at least three regressions in comparison to 4.0 from my point of view (1, 2 & 4): slower Xen, slower kernel & slower qrexec.
I can't say whether it affects other hardware setups as well, but I guess at least the qrexec implementation shouldn't be too hardware dependent.
testbug_4.0_maxmem400_debian-10.txt testbug_4.0_maxmem2000_debian-10.txt testbug_4.1_maxmem400_debian-11.txt testbug_4.1_maxmem2000_debian-11.txt
@3hhh agree.
Btw subsequent qrexec calls on running VMs return within 300-500ms on both 4.1 and 4.0, even though 4.0 tends to be slightly faster on average.
I also tried:
- xl set-parameters ept=exec-sp: no effect
- ucode=scan: 1-2s better
- spec-ctrl=no: 1-2s better
- smt=off: no effect

None of the options brought back 4.0 performance.
So far only Sandy & Ivy bridge CPUs are affected, maybe Ryzen?
I can still try to make a raid0 on nvme and xeon.
Btw I re-ran the 4.0 vs 4.1 dmesg with the kernelopts xen_blkfront.dyndbg=+flp dynamic_debug.verbose=1 for some additional debug info.
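For anyone reproducing this: per-VM kernel options can be set via the kernelopts preference, e.g. (note that this replaces the VM's existing kernelopts value rather than appending to it):

```bash
qvm-prefs tvm-0 kernelopts "xen_blkfront.dyndbg=+flp dynamic_debug.verbose=1"
qvm-start tvm-0
# The extra blkfront debug lines then show up in the VM's dmesg, or in dom0 under
# /var/log/xen/console/guest-tvm-0.log
```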
4.0:
[ 2.048229] Invalid max_queues (4), will use default max: 2.
[ 2.051054] blkback_changed:2466: vbd vbd-51712: blkfront:blkback_changed to state 2.
[ 2.141573] blkback_changed:2466: vbd vbd-51728: blkfront:blkback_changed to state 2.
[ 2.262393] blkback_changed:2466: vbd vbd-51744: blkfront:blkback_changed to state 2.
[ 2.376785] blkback_changed:2466: vbd vbd-51760: blkfront:blkback_changed to state 2.
[ 2.449277] blkback_changed:2466: vbd vbd-51712: blkfront:blkback_changed to state 4.
[ 2.449283] blkfront_connect:2394: vbd vbd-51712: blkfront_connect:/local/domain/0/backend/vbd/15/51712.
[ 2.513042] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 2.544954] xvda: xvda1 xvda2 xvda3
[ 2.605883] blkback_changed:2466: vbd vbd-51728: blkfront:blkback_changed to state 4.
[ 2.605886] blkfront_connect:2394: vbd vbd-51728: blkfront_connect:/local/domain/0/backend/vbd/15/51728.
[ 2.608934] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 2.620726] blkback_changed:2466: vbd vbd-51744: blkfront:blkback_changed to state 4.
[ 2.620728] blkfront_connect:2394: vbd vbd-51744: blkfront_connect:/local/domain/0/backend/vbd/15/51744.
[ 2.623037] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 2.631496] blkback_changed:2466: vbd vbd-51760: blkfront:blkback_changed to state 4.
[ 2.631502] blkfront_connect:2394: vbd vbd-51760: blkfront_connect:/local/domain/0/backend/vbd/15/51760.
[ 2.633915] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 2.674652] blkif_ioctl:500: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 2.675534] blkif_ioctl:500: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 2.688424] random: sfdisk: uninitialized urandom read (4 bytes read)
[ 2.689337] xvdc: xvdc1 xvdc3
[ 2.700853] random: mkswap: uninitialized urandom read (16 bytes read)
[ 2.723988] blkif_ioctl:500: vbd vbd-51712: command: 0x5331, argument: 0x0
[ 2.732615] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[ 2.738321] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 2.740320] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[ 2.744195] dynamic_debug:ddebug_add_module: 23 debug prints in module overlay
[ 2.783727] EXT4-fs (xvda3): re-mounted. Opts: (null)
4.1:
[ 2.299636] Invalid max_queues (4), will use default max: 2.
[ 2.302851] blkback_changed:2474: vbd vbd-51712: blkfront:blkback_changed to state 2.
[ 2.389328] blkback_changed:2474: vbd vbd-51728: blkfront:blkback_changed to state 2.
[ 2.628404] blkback_changed:2474: vbd vbd-51744: blkfront:blkback_changed to state 2.
[ 2.801922] blkback_changed:2474: vbd vbd-51760: blkfront:blkback_changed to state 2.
[ 2.955371] blkback_changed:2474: vbd vbd-51712: blkfront:blkback_changed to state 4.
[ 2.955375] blkfront_connect:2401: vbd vbd-51712: blkfront_connect:/local/domain/0/backend/vbd/24/51712.
[ 2.957324] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 2.961239] xvda: xvda1 xvda2 xvda3
[ 3.101208] blkback_changed:2474: vbd vbd-51728: blkfront:blkback_changed to state 4.
[ 3.101211] blkfront_connect:2401: vbd vbd-51728: blkfront_connect:/local/domain/0/backend/vbd/24/51728.
[ 3.103325] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 3.128767] blkback_changed:2474: vbd vbd-51744: blkfront:blkback_changed to state 4.
[ 3.128770] blkfront_connect:2401: vbd vbd-51744: blkfront_connect:/local/domain/0/backend/vbd/24/51744.
[ 3.207748] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 3.215940] blkback_changed:2474: vbd vbd-51760: blkfront:blkback_changed to state 4.
[ 3.215947] blkfront_connect:2401: vbd vbd-51760: blkfront_connect:/local/domain/0/backend/vbd/24/51760.
[ 3.219995] blkfront: xvdd: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled;
[ 3.342172] blkif_ioctl:509: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 3.342698] blkif_ioctl:509: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 3.355123] blkif_ioctl:509: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 4.367889] blkif_ioctl:509: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 4.381678] blkif_ioctl:509: vbd vbd-51744: command: 0x5331, argument: 0x0
[ 4.640679] xvdc: xvdc1 xvdc3
[ 5.723486] blkif_ioctl:509: vbd vbd-51712: command: 0x5331, argument: 0x0
[ 5.753137] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[ 5.776391] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 5.781462] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[ 5.949733] EXT4-fs (xvda3): re-mounted. Opts: (null)
To me it looks like the Xen storage backend is slower in 4.1 (xen_blkfront is waiting more). Also not sure why it checks CDROM_GET_CAPABILITY (0x5331) so often, and why xvdd has flushing disabled on 4.1, but enabled on 4.0.
Is the VM kernel version the same in both cases? The easiest way to get the same kernel version is to use kernel-latest-qubes-vm pkg.
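i.e. in dom0 something like:

```bash
# Install the newest VM kernel package; VMs can then be pointed at it via qvm-prefs.
sudo qubes-dom0-update kernel-latest-qubes-vm
```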
Is the VM kernel version the same in both cases?
No, but good idea to test various kernel versions!
I did that in 4.0 and found that the last good kernel wrt this blkfront delay is 5.10.8-1 and the first definitely bad one is 5.11.8-1. The 4.1 example from above is 5.10.76-1, i.e. probably the issue is between 5.10.8-1 and 5.10.76-1 (I didn't test 4.1 so far).
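The bisection itself is just switching the test VM between installed VM kernels and re-running the timing, roughly (VM name and version are examples):

```bash
# VM kernel versions installed in dom0:
ls /var/lib/qubes/vm-kernels/
# Point the test VM at a specific version and repeat the measurement:
qvm-prefs tvm-0 kernel 5.10.8-1
time qvm-run -p tvm-0 'echo hello'
```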
5.11.4-1 on 4.0 was exposing weird behaviour in general, but not that specific regression though...
This might also slightly affect the qrexec issue. I at least noticed a ~1s additional delay with a bad kernel there (still ~3s better than in 4.1). VM handover times in 4.0 however remain at ~2s less than the 4.1 times even with those kernels.
I tested a few more kernels on 4.0 and found that 5.10.16-1 is good as well. Double checking on 5.11.4-1 revealed that it is likely the last good kernel wrt this blkfront delay (5.4.156-1 dom0 kernel, Xen 4.8.5).
On 4.1 (5.10.76-1 dom0 kernel, Xen 4.14.3) surprisingly I couldn't find a single "good" VM kernel wrt this blkfront delay, not even 5.4.55-1. Actually newer kernels even made things slower...
I failed to get in-VM kernels going.
Another comparison from /var/log/xen/ which appears to have a bit more information:
4.0:
[ 0.000000] Linux version 5.10.16-1.fc25.qubes.x86_64 (user@build-fedora4) (gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), GNU ld version 2.26.1-1.fc25) #1
[ 0.000000] Command line: root=/dev/mapper/dmroot ro nomodeset console=hvc0 rd_NO_PLYMOUTH rd.plymouth.enable=0 plymouth.enable=0 xen_scrub_pages=0 xen_blkfront.dyndbg=+flp dynamic_debug.verbose=1
...
[ 0.875938] Invalid max_queues (4), will use default max: 2.
[ 1.274497] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.293832] xvda: xvda1 xvda2 xvda3
[ 1.344247] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.404975] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[ 1.411100] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
Waiting for /dev/xvda* devices...
Qubes: Doing R/W setup for TemplateVM...
[ 1.477060] random: sfdisk: uninitialized urandom read (4 bytes read)
[ 1.477566] xvdc: xvdc1 xvdc3
[ 1.486487] random: mkswap: uninitialized urandom read (16 bytes read)
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=b2151064-3fe6-49f3-bf59-b9dcf4a28c81
Qubes: done.
[ 1.526014] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
Waiting for /dev/xvdd device...
mount: /dev/xvdd is write-protected, mounting read-only
[ 1.533772] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[ 1.542292] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[ 1.594591] EXT4-fs (xvda3): re-mounted. Opts: (null)
switch_root: failed to mount moving /dev to /sysroot/dev: Invalid argument
switch_root: forcing unmount of /dev
switch_root: failed to mount moving /proc to /sysroot/proc: Invalid argument
switch_root: forcing unmount of /proc
switch_root: failed to mount moving /sys to /sysroot/sys: Invalid argument
switch_root: forcing unmount of /sys
switch_root: failed to mount moving /run to /sysroot/run: Invalid argument
switch_root: forcing unmount of /run
[ 1.688390] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
4.1:
[2022-01-17 23:20:46] [ 0.000000] Linux version 5.10.76-1.fc32.qubes.x86_64 (mockbuild@build-fedora4) (gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1), GNU ld version 2.34-6.fc32) #1 SMP Fri Oct 29 03:58:58 CEST 2021
[2022-01-17 23:20:46] [ 0.000000] Command line: root=/dev/mapper/dmroot ro nomodeset console=hvc0 rd_NO_PLYMOUTH rd.plymouth.enable=0 plymouth.enable=0 xen_scrub_pages=0 nopat
...
[2022-01-17 23:23:22] [ 2.656119] Run /init as init process
[2022-01-17 23:23:22] Qubes initramfs script here:
[2022-01-17 23:23:22] [ 2.669616] Invalid max_queues (4), will use default max: 2.
[2022-01-17 23:23:23] [ 3.079762] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[2022-01-17 23:23:23] [ 3.107136] xvda: xvda1 xvda2 xvda3
[2022-01-17 23:23:23] [ 3.200808] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[2022-01-17 23:23:23] [ 3.266375] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[2022-01-17 23:23:23] [ 3.279484] blkfront: xvdd: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled;
[2022-01-17 23:23:23] Waiting for /dev/xvda* devices...
[2022-01-17 23:23:23] Qubes: Doing R/W setup for TemplateVM...
[2022-01-17 23:23:24] [ 4.651077] xvdc: xvdc1 xvdc3
[2022-01-17 23:23:25] Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
[2022-01-17 23:23:25] no label, UUID=06b63771-96e5-4b48-8777-c99380528e34
[2022-01-17 23:23:25] Qubes: done.
[2022-01-17 23:23:25] [ 5.774377] EXT4-fs (xvda3): mounted filesystem with ordered data mode. Opts: (null)
[2022-01-17 23:23:25] Waiting for /dev/xvdd device...
[2022-01-17 23:23:25] [ 5.797393] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[2022-01-17 23:23:25] [ 5.802029] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
[2022-01-17 23:23:25] mount: /tmp/modules: WARNING: device write-protected, mounted read-only.
[2022-01-17 23:23:26] [ 5.984659] EXT4-fs (xvda3): re-mounted. Opts: (null)
[2022-01-17 23:23:26] switch_root: failed to mount moving /dev to /sysroot/dev: Invalid argument
[2022-01-17 23:23:26] switch_root: forcing unmount of /dev
[2022-01-17 23:23:26] switch_root: failed to mount moving /proc to /sysroot/proc: Invalid argument
[2022-01-17 23:23:26] switch_root: forcing unmount of /proc
[2022-01-17 23:23:26] switch_root: failed to mount moving /sys to /sysroot/sys: Invalid argument
[2022-01-17 23:23:26] switch_root: forcing unmount of /sys
[2022-01-17 23:23:26] switch_root: failed to mount moving /run to /sysroot/run: Invalid argument
[2022-01-17 23:23:26] switch_root: forcing unmount of /run
[2022-01-17 23:23:26] libbpf: failed to find valid kernel BTF
[2022-01-17 23:23:26] libbpf: Error loading vmlinux BTF: -3
[2022-01-17 23:23:26] libbpf: failed to load object 'iterators_bpf'
[2022-01-17 23:23:26] libbpf: failed to load BPF skeleton 'iterators_bpf': -3
[2022-01-17 23:23:26] Failed load could be due to wrong endianness
[2022-01-17 23:23:26] [ 6.297525] systemd[1]: systemd 247.3-6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[2022-01-17 23:23:26] [ 6.297733] systemd[1]: Detected virtualization xen.
[2022-01-17 23:23:26] [ 6.297758] systemd[1]: Detected architecture x86-64.
Btw this xen:balloon in 4.0 and 4.1 sometimes eats an additional ~1.3s of kernel boot time (oddly sometimes only 0.1s). Is this really necessary and if so: Why does the boot process have to wait for it? Why run it for fixed memory VMs?
Btw this xen:balloon in 4.0 and 4.1 sometimes eats an additional ~1.3s of kernel boot time (oddly sometimes only 0.1s). Is this really necessary and if so: Why does the boot process have to wait for it? Why run it for fixed memory VMs?
For fixed memory VMs it should be a no-op. If it takes some time, most likely it is not a fixed memory VM. And yes, it is necessary, otherwise it risks crashing the VM with rather obscure effects.
Thanks for the clarification!
How can one debug Xen for the Xen VM handover time issue? I.e. get some more debug output with timestamps before Xen actually activates a VM?
How can one debug Xen for the Xen VM handover time issue? I.e. get some more debug output with timestamps before Xen actually activates a VM?
You can do that by setting log_level=1 in /etc/libvirt/libvirtd.conf (and restarting the libvirtd service). It will produce a lot of output into the system journal, but you are interested in something else: /var/log/libvirt/libxl/libxl-driver.log. After setting the debug log level, it will contain a lot more info on VM startup, with timestamps.
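Concretely, that amounts to something like (the sed line assumes the stock commented-out setting; editing the file by hand works just as well):

```bash
# In dom0:
sudo sed -i 's/^#\?log_level *=.*/log_level = 1/' /etc/libvirt/libvirtd.conf
sudo systemctl restart libvirtd
# After the next VM start, the per-step timestamps appear in:
sudo less /var/log/libvirt/libxl/libxl-driver.log
```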
you are interested in something else: /var/log/libvirt/libxl/libxl-driver.log
Now that was helpful as it shows [...]
3. file in 4.0 appears to hand over to Xen almost instantaneously (unlikely: or the Starting vm.[vm-name] log entry is written at a different point in time in 4.0). Not sure why it's so blazingly fast.

So in total the ~3s regression in VM handover times with 4.1 results from me migrating from the file pool driver in 4.0 to file-reflink in 4.1.
@rustybird Would it be possible to prepare these snapshots or whatever file-reflink is doing before passing over to Xen at another point in time to speed up the VM boot process? E.g. right after the last VM exit?
@marmarek Adding a log entry such as vm.[vm name]: Storage initialized. before Xen handover would be useful to more easily identify such issues.
Now only the qrexec time regression remains to be identified, but I guess it's either related to the previously analyzed VM kernel issue or the new 4.1 qrexec socket setup code.
EDIT: For explanation: The "Xen handover time" measured by my tool = storage pool setup time / whatever Qubes does + Xen init time.
3. file in 4.0 appears to hand over to Xen almost instantaneously (unlikely: or the Starting vm.[vm-name] log entry is written at a different point in time in 4.0). Not sure why it's so blazingly fast.
There was a major change how "file" storage works between R4.0 and R4.1 - in R4.0 it is initialized later (called from libxl when creating blkback), while in R4.1 it's initialized before telling libxl anything.
- file in 4.0 appears to hand over to Xen almost instantaneously (unlikely: or the Starting vm.[vm-name] log entry is written at a different point in time in 4.0). Not sure why it's so blazingly fast.

There was a major change how "file" storage works between R4.0 and R4.1 - in R4.0 it is initialized later (called from libxl when creating blkback), while in R4.1 it's initialized before telling libxl anything.
Does that mean that QubesOS/qubes-core-admin#399 was the cause of the regression?
@3hhh
Would it be possible to prepare these snapshots or whatever file-reflink is doing before passing over to Xen at another point in time to speed up the VM boot process? E.g. right after the last VM exit?
Interesting idea. At least for save_on_stop volumes (e.g. the AppVM private volume) yes, the reflink cost could be shifted to the volume stop phase by creating a private-precache.img, which would then be renamed to private-dirty.img on volume start. Seems like mostly a matter of getting cache invalidation right for revision handling, importing, etc.
For snap_on_start volumes (e.g. the AppVM root volume) it's not feasible EDIT: but there's an opportunity to replace two reflink operations with just one.
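A minimal sketch of the save_on_stop idea in terms of plain file operations (file names as mentioned above; this only illustrates the concept, it is not how the file-reflink driver is implemented):

```bash
# At volume stop: pre-create the next working copy while nobody is waiting for it.
cp --reflink=always private.img private-precache.img

# At volume start: the expensive reflink copy already exists, so only a cheap
# rename remains before the volume is handed to the VM.
mv private-precache.img private-dirty.img
```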
@DemiMarie
- file in 4.0 appears to hand over to Xen almost instantaneously (unlikely: or the Starting vm.[vm-name] log entry is written at a different point in time in 4.0). Not sure why it's so blazingly fast.

There was a major change how "file" storage works between R4.0 and R4.1 - in R4.0 it is initialized later (called from libxl when creating blkback), while in R4.1 it's initialized before telling libxl anything.
Does that mean that QubesOS/qubes-core-admin#399 was the cause of the regression?
No, just a change in accounting. Before your PR all the dmsetup invocations etc. were done implicitly after handover to Xen.
5. file in 4.0 appears to hand over to Xen almost instantaneously (unlikely: or the Starting vm.[vm-name] log entry is written at a different point in time in 4.0). Not sure why it's so blazingly fast.

There was a major change how "file" storage works between R4.0 and R4.1 - in R4.0 it is initialized later (called from libxl when creating blkback), while in R4.1 it's initialized before telling libxl anything.
Does that mean that QubesOS/qubes-core-admin#399 was the cause of the regression?
No, just a change in accounting. Before your PR all the dmsetup invocations etc. were done implicitly after handover to Xen.
I'd rather suspect that letting libxl know about storage init allows it to run storage init and Xen init code in parallel.
So to me it sounds more like 4.0 file pool users were the rare birds that had unexpectedly good performance due to the late block script init, whereas everyone else was already synchronously initialized, and QubesOS/qubes-core-admin#399 finally grounded those rare birds in 4.1. ^^ At least that explains why my attempts in 4.1 with file pools didn't provide any performance gain.
Argh... At least the blkfront wait should be unrelated as it also happens in 4.0 after kernel 5.11.4-1 with file pools.
Maybe I'll look into the qrexec issue next...
Qubes OS release
4.1rc2
Brief summary
Since switching from 4.0 to 4.1rc2 I'm experiencing a significant slowdown of all VM start and shutdown operations.
Shutdowns (especially of dispVMs) may cause the system to freeze entirely for 1-2s even though the CPU load doesn't seem high (I/O only?).
The system is identical, except the hard disk: btrfs/reflink is installed on a newer Sandisk Ultra 3D SSD (ext4/file on some old Lenovo standard SSD).
Steps to reproduce
qvm-shutdown --all --wait
sudo swapoff -a
swap="$(sudo mount|grep swap)"
[ -z "$swap" ] || { echo "Please disable swap first." ; exit 2 ; }

for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  qvm-remove -f "$vm"
done

for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  echo "Creating $vm..."
  qvm-create -l red "$vm"
  qvm-prefs "$vm" netvm ""
  qvm-prefs "$vm" memory 400
  qvm-prefs "$vm" maxmem 0
  echo "Starting $vm..."
  beg=$SECONDS
  sudo perf record -a --call-graph fp -o "${vm}_perf.data" qvm-run -p "$vm" 'echo hello'
  echo "duration: $(($SECONDS - $beg))"
  time qvm-run -p "$vm" 'echo hello'
  xl info | grep free_memory
done

qvm-shutdown --all --wait

for ((i=0;i<13;i++)) ; do
  vm="tvm-$i"
  qvm-remove -f "$vm"
done