Closed dylangerdaly closed 1 month ago
I'm doing the same, one other thing I had enabled is AMD's TearFree option in X11, I've removed that config file, I'll see if that makes any difference
Hmmm... Those temps were taken from sensors "temp1". But there is a "temp11" that always shows 66C (under any idle/load) when plugged into AC and 0C when on battery. I don't know if that could be affecting your temp readouts. The rest of the internal temps always show as 0C.
I'm using the Sensors plugin for xfce, Sensor Type is k10temp-c3 and the sensor name is Tctl, there's also Tdie, iirc Tctl is the temp of the package itself? and Tdie is the die temp?
Yes, the temp1 line appears to be an int value of Tdie/Tctl.
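To compare readings without going through a desktop plugin, the same values can be read straight from the kernel's hwmon sysfs files. This is a minimal sketch, assuming the k10temp driver is loaded and exposes the usual `name`, `temp*_label` and `temp*_input` files; the function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon", chip="k10temp"):
    """Return {label: degrees Celsius} for every temp*_input of the given chip.

    Assumption: the chip exposes the standard hwmon files. If a sensor has
    no temp*_label file, the bare name (e.g. "temp2") is used as the label.
    """
    temps = {}
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon / "name"
        if not name_file.is_file() or name_file.read_text().strip() != chip:
            continue
        for input_file in sorted(hwmon.glob("temp*_input")):
            base = input_file.name[: -len("_input")]
            label_file = hwmon / (base + "_label")
            label = label_file.read_text().strip() if label_file.is_file() else base
            # hwmon reports temperatures in millidegrees Celsius.
            temps[label] = int(input_file.read_text().strip()) / 1000.0
    return temps


# Usage (on the machine being checked):
#   for label, celsius in read_hwmon_temps().items():
#       print(f"{label}: {celsius:.1f} C")
```

Running it once on AC and once on battery should make it easy to see which labelled sensor actually changes.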
Another BTW, since the lurching issue is power-management related I think this applies:
Marek's specially modified Xen in qubes*unstable doesn't always have the correct configuration. For example with 4.14.0-6.2 the lurching came back... I had to downgrade xen packages to 4.14.0-6.1 to get performance back to normal.
Oh wow, okay.
So I tried reverting Marek's revert 2 weeks ago, but I must have accidentally reverted the wrong commit ID
commit bab37273543da7df4148773b96e677913dc52cd7 (HEAD -> xen-4.14)
Author: Dylanger Daly <dylanger@diagnostix.io>
Date:   Tue Dec 8 15:27:15 2020 +1000

    Revert "Fix S3 resume"

    This reverts commit c28754bdb458281a22e9a9779213c941531b6dff.
Reverting commit c28754bdb458281a22e9a9779213c941531b6dff specifically results in a much smoother experience.
Then I forced the built package over the existing one:
sudo rpm -ihv --force xen-hypervisor-4.14.0-8.fc32.x86_64.rpm
1080P YouTube playback still isn't 100% 'lurch' free, Tasket can you confirm this? There are still teeny-tiny little hiccups when playing 1080p videos.
@tasket, can I get what your CMDLINE is for Kernel and Xen? I'm still idling around 75C when there's nothing going on, have you changed any UEFI settings as well?
Another thing, what version of UEFI are you on?
I've updated to 1.27 (r1cuj58wd) https://download.lenovo.com/pccbbs/mobiles/r1cuj58wd.txt https://download.lenovo.com/pccbbs/mobiles/r1cuj58wd.iso
7bfdda966c172f1fdb0e27123b25e651bdb6f27529399ff96c858471612f2337 r1cuj58wd.iso
But it appears that's been removed from the page? Lenovo's Support page only shows 1.25 as the latest version.
I've possibly installed a version of UEFI that's overheating when on AC, so Lenovo pulled it?
Can someone confirm this?
Keeping in mind this is the T14 not X13, firmware is v1.05 from June 11.
FWIW, the T14 download page has 1.27 available: https://pcsupport.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-t-series-laptops/thinkpad-t14-type-20ud-20ue/downloads/ds544977-bios-update-utility-bootable-cd-for-windows-10-64-bit-thinkpad-t14-gen-1-types-20ud-20ue
Yes, there is still a little judder when HD video playback is fullscreen or nearly fullscreen. Using VLC, it becomes noticeable when the viewport is about 2/3 of full height.
The video RAM setting I mentioned earlier is one of the few I remember changing; others were security features.
From grub.cfg:
multiboot2 /xen-4.14.0.gz placeholder console=none dom0_max_vcpus=1 dom0_vcpus_pin=1 dom0_mem=min:1024M dom0_mem=max:2048M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp ${xen_rm_opts}
echo 'Loading Linux 5.8.18-200.fc32.x86_64 ...'
module2 /vmlinuz-5.8.18-200.fc32.x86_64 placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=luks-ebc32f3e-6002-4c17-9759-db70e0f6c859 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles rd.driver.pre=btrfs rhgb quiet amdgpu.noretry=0 amdgpu.gpu_recovery=1 iommu=pt amd_iommu=fullflush amdgpu.dc=1 rhgb rcu_nocbs=0 rd.qubes.hide_all_usb=0
echo 'Loading initial ramdisk ...'
module2 --nounzip /initramfs-5.8.18-200.fc32.x86_64.img
I managed to downgrade to 1.25, I tried booting Fedora 32 Workstation and it's at 45C on AC, so there's something wrong with Xen's Power Management I think
I was missing ept=exec-sp from Xen's CMDLINE, just re-added that and it's really smooth now
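Spotting a single missing option by eye in two long command lines is error-prone. The comparison can be sketched as below; both helper names are hypothetical, and the filter for the image path, `placeholder` and `${...}` tokens is an assumption based on the grub.cfg excerpt quoted earlier in the thread.

```python
def xen_options(cmdline):
    """Split a multiboot2 line into Xen option tokens.

    Skips the 'multiboot2' keyword, the image path, GRUB's 'placeholder'
    word and unexpanded ${...} variables (assumed layout, see lead-in).
    """
    skipped = {"multiboot2", "placeholder"}
    return [
        token
        for token in cmdline.split()
        if token not in skipped
        and not token.startswith("/")
        and not token.startswith("${")
    ]


def missing_options(reference, mine):
    """Return the options present in `reference` but absent from `mine`."""
    mine_set = set(xen_options(mine))
    return [opt for opt in xen_options(reference) if opt not in mine_set]


if __name__ == "__main__":
    reference = (
        "multiboot2 /xen-4.14.0.gz placeholder console=none dom0_max_vcpus=1 "
        "dom0_vcpus_pin=1 dom0_mem=min:1024M dom0_mem=max:2048M ucode=scan "
        "smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 "
        "ept=exec-sp ${xen_rm_opts}"
    )
    mine = reference.replace(" ept=exec-sp", "")
    print(missing_options(reference, mine))
```

The comparison is purely textual, so two spellings of the same setting (for example a boolean written as `=1` versus `=true`) would still be reported as a difference.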
tasket's command lines are basically the same as mine, and setting VMs to only using 1 or 2 cores seems okay for now. I don't know if lscpu or /proc/cpuinfo is accurate in this scenario, but under no circumstances does the CPU go above 2.1GHz, is it potentially not boosting?
Also, I am running default, sys-net/sys-firewall start-up on boot etc. and every single time I boot up my laptop in the morning, no other VMs will start after that. Only after I restart it, will other VMs e.g. "personal" start up. This is consistent, and I previously thought it was maybe SMT, but I've turned it off and still it behaves like that.
@crat0z can you confirm what Lenovo model you're on?
If you have an X13, can you confirm temperature increases (+10-15C) when the device is on AC?
I'm not sure why Xen still hasn't identified these issues, I think there are no server counterparts to Ryzen 4000? Also it's super hard to actually debug x86, compared to ARM/Qualcomm SoCs.
I have a T14, and AC does not affect my temperatures, so I can't help with that unfortunately.
AFAIK, Renoir CPUs are Zen 2, which on desktop are Ryzen 3000 series CPUs, and in workstations/servers with codename "Rome", and have been available since mid 2019. It seems the server CPUs have been supported since Xen 4.13.
Of course, just because that's the case doesn't mean everything should be okay for us using laptops. There are probably quirks to the Renoir CPUs, but given new desktop CPUs just came out last month and server CPUs possibly coming out this month, perhaps "general support for AMD" will be much higher soon.
I'm not 💯, but I think the reason this isn't working is because Xen expects SMT is enabled, I read that AMD rely on their implementation of SMT for scheduling/timing.
I've also read that Xen have largely "fixed" the security issues relating to SMT/HT, is it possible to test SMT?
Switching to smt=on sched-gran=core results in a black screen, maybe there's a commit I'm missing? (I'm on 4.14.0-9)
This would explain why EPYC CPUs, which use the same underlying cores as Ryzen 4000, are working just fine with Xen (SMT enabled)
I'm happy to have SMT enabled knowing the scheduling is more secure.
As far as I know, in regards to SMT, the only known CPU vulnerability AMD CPUs have been vulnerable to has been Spectre which seems to have been taken care of with Zen 2 microarchitecture. At least for my use case, I'm fine with SMT enabled, but others might not be.
If I remember correctly, sched-gran=core command line option is still somewhat buggy and an experimental feature of Xen. I do not know what the current situation or future plans are for it, I haven't kept up for a while since I ditched my old laptop. It might have been canned, since even if core scheduling is used on a vulnerable CPU, it still allows information leaks within the compromised VM.
Unfortunately, as of now, enabling SMT does not remove the need for the dom0_max_vcpus=1 dom0_vcpus_pin=1 command line options for a smooth experience. Whether SMT is enabled or not, without those command line options, Qubes is a laggy mess in the VMs and in dom0.
I'm noticing massive performance gains on Fedora 33 TemplateVM.
Browser is significantly smoother. @tasket update your TemplateVM and check it out
I've been trying to troubleshoot these multicore/credit2 issues. Debugging Xen is really difficult without serial output, and basically no laptops have a serial port at all, so for now the limit of 2 vCPUs per appVM will be around for a while.
I wonder if the desktop Renoir APUs e.g. 4750G have similar problems. Surely there is an AM4 motherboard that allows serial debugging.
I think they do, someone on VFIO Discord was experiencing the exact same issues on a Desktop Ryzen 3000 CPU.
it won't really matter to gpu as such, but do you have Xorg-x11-drv-amdgpu installed? That aside, I'd suggest upping the max_vcpus to 3 or 4.
I'm testing this now on my Zephyrus G14. Sorry for noob question but how do I install Xorg-x11-drv-amdgpu?
run qubes-dom0-update xorg-x11-drv-amdgpu in dom0
I had an issue with the HDMI port not working, this didn't fix it though. Might start a separate thread about that.
Qubes is running smoothly though and no performance issues yet. (on Ryzen 4900HS)
@MaximumViciousDeer Can you try running a CPU benchmark with a varying number of cores in a VM? I used y-cruncher, and I'm on a 6-core 4650U, with SMT enabled in Xen. My results showed that assigning 1, 2, 4, and 6 cores to the VM scaled properly. Beyond that, the 8-core and 12-core tests not only took longer to run y-cruncher, but there was very obvious input delay/lag/stuttering even just typing in the terminal. I can't post the results as I'm not on my laptop at the moment, but it was something like ~28 seconds with 6 cores and 33/35 seconds with 8/12 cores respectively. For reference, an Ubuntu 20.10 live USB did 12-thread y-cruncher in about 25 seconds, with a 5.8 kernel. sched=credit does seem to have better performance by the way.
Geekbench may be a better utility to use, but it doesn't work with Fedora and I've been too lazy to try it in a debian VM/build Ubuntu template.
Do the higher cpu count ryzen 4000 cpus have something like a NUMA arrangement?
From what I remember with the Ubuntu live USB, I was checking lscpu to confirm there were 12 threads recognized, and it said there was only 1 NUMA node. More NUMA nodes might be used on their higher end desktop processors with more than 8 cores
Ah, I missed that this was a 6-core cpu.
I thought the guidance for Qubes was not to assign more vcpus to a domU than real cores?
I just tested y-cruncher and it scales properly up to 8 cores for the standard 500m computation. On 8 cores it took ~85 seconds, on 6 it took ~100 etc. There's only one NUMA node. My lscpu shows 8 cores and one thread per core.
@brendanhoar I'm not sure, at least I never read that. I've run Qubes on a few different machines, from old vulnerable Intel CPUs (i7-2620m, i7-5600u) to a more modern AMD r5 2600. With SMT enabled in Xen, they've all performed faster. Also, with SMT enabled, utilities like xentop say Xen has more vCPUs to work with anyway, so it makes sense that SMT enabled is faster.
@MaximumViciousDeer lscpu is reporting 8 cores and 1 thread per core because Qubes ships with the smt=off Xen command line option by default. On boot up, you can press e and edit the Xen command line to smt=on and lscpu will report 2 threads per core. Permanent changes are done in /etc/default/grub and you'll have to run grub-mkconfig in dom0 after.
Unfortunately though, I suspect Xen + SMT is broken with these CPUs. Given your 8 core's y-cruncher times got faster and faster up to 8 cores, mine does the same with 6 cores, I suspect yours will get worse if you enable SMT and try e.g. 16 vCPU y-cruncher.
Also, dom0_max_vcpus=1 dom0_vcpus_pin are the only good options definitely. I tried SMT disabled, tried max_vcpus=2,4,12, I tried not pinning, I tried between sched=credit and default, nothing is as good as the first one. It is immediately painfully obvious after entering LUKS password, as the time to start up is 3-10 times longer.
Ah, nevermind. SMT does work and does have tangible differences. It seems y-cruncher might be a bad benchmark. 7z's benchmark showed SMT enabled 12 vCPU VM performs the fastest.
Power plugged in, SMT enabled in BIOS, no VMs open besides benchmark vm, sched=credit
Example output of 7z b:
                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS
22:      23474   545   4191  22836  |     260193   598   3713  22189
23:      21214   573   3774  21615  |     226772   596   3291  19622
24:      19407   562   3715  20867  |     222532   598   3266  19532
25:      17078   551   3538  19499  |     215440   596   3218  19173
----------------------------------  | ------------------------------
Avr:             558   3804  21204  |              597   3372  20129
Tot:             577   3588  20667
RESULTS
smt=on
6 vCPU:
Avr: 558 3804 21204 | 597 3372 20129
Tot: 577 3588 20667
12 vCPU:
Avr: 1015 2429 24630 | 1182 2556 30190
Tot: 1098 2492 27410
smt=off
6 vCPU:
Avr: 514 4208 21591 | 585 3450 20194
Tot: 550 3829 20892
12 vCPU:
Avr: 489 4436 21592 | 565 3538 19958
Tot: 527 3987 20775
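To make the four runs easier to compare, the "Tot:" rating (the last column, in MIPS) of each run can be normalised against one baseline. A small sketch using only the numbers posted above; the function name and the choice of smt=on with 6 vCPUs as baseline are arbitrary.

```python
# Total 7z ratings (MIPS) from the "Tot:" lines above,
# keyed by (smt setting, vCPU count).
TOTAL_RATING = {
    ("on", 6): 20667,
    ("on", 12): 27410,
    ("off", 6): 20892,
    ("off", 12): 20775,
}


def relative_ratings(ratings, baseline):
    """Return each rating divided by the rating of the `baseline` key."""
    base = ratings[baseline]
    return {key: value / base for key, value in ratings.items()}


if __name__ == "__main__":
    for (smt, vcpus), ratio in relative_ratings(TOTAL_RATING, ("on", 6)).items():
        print(f"smt={smt:<3} {vcpus:>2} vCPU: {ratio:.2f}x")
```

By this measure only the smt=on, 12 vCPU run stands out (roughly a third faster than the baseline); the other three are within about 1% of each other.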
Yet, the "fastest" VM configuration still performs much worse from a user perspective. Funnily enough, while I was doing the smt=off tests, I noticed that setting the vCPU count to 5 stopped the input lag. Could it be Xen's scheduler causing this? The only other VM turned on is dom0, and in this case it has 1 vCPU and it's pinned.
Qubes ships with smt=off
Isn't that a security feature to mitigate Spectre/Meltdown?
yeah, but it's not proven to be necessary for ryzen, esp. zen2 which has additional mitigations. Just intel that's fucked and getting worse every month.
IIRC while googling for this issue a few weeks ago I found someone (maybe the OP?) suggesting elsewhere that credit2 has trouble with zen2 consumer parts with disabled SMT?
yeah, but it's not proven to be necessary for ryzen, esp. zen2 which has additional mitigations. Just intel that's fucked and getting worse every month.
Well, AMD has less of those bugs, but it doesn't mean it's completely unaffected. Some high level summary: https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability - as you can see, zen2 is affected by some of them too. For specific platform I recommend checking with https://github.com/speed47/spectre-meltdown-checker
I've been doing some IO testing with kdiskmark, and unsurprisingly the default Qubes configuration vastly outperforms the command line fix configuration. On my Sabrent Rocket Q SSD, sequential reads drop from 2000MB/s to about 700MB/s, random speeds are more or less the same. That's still faster than SATA 3, so it's certainly not unusable, but Qubes startup and VM startup especially is not as fast as it can be. On the topic of VM startups, the more vCPUs assigned to a VM, the longer startup takes... same thing for dom0. What a mystery.
Also noteworthy is that when the Xen command line options "fix" isn't applied, the stuttering/lag still occurs even when no VMs are running.
At this point I'm not really certain where else to look for diagnosing the issue(s), besides of course actually debugging Xen or Linux potentially? For anyone who is finding this now, here are some tips for a usable experience:
Post install of R4.1,
* Edit /etc/yum.repos.d/qubes-dom0.repo and install kernel-latest kernel-latest-qubes-vm. This is probably mandatory to even get Qubes installed.
* Install xorg-x11-drv-amdgpu if you have an AMD APU/GPU, not necessary though if you aren't using AMD graphics.
For Xen command line options,
* dom0_max_vcpus=1 dom0_vcpus_pin=1 is pretty much mandatory.
* sched=credit might help, might not. Worth trying if you're having issues.
* smt=on, or commenting out the bottom line in /etc/default/grub, helps A LOT if you run many VMs. Do note the potential security risk in this, as discussed above.
As for setting up VMs to optimize performance and minimize stuttering, it all comes down to assessing each VM's vCPU and RAM requirements. Some general tips,
Thank you for testing and bench-marking further.
Yeah it seems the issue is mainly due to Xen's Scheduler, it's possible to debug the scheduler however some form of Serial is required, on Servers this is a non-issue, Mobile platforms however...
It's super hard to debug this unless you're an AMD Engineer with a Serial port attached to your laptop, I think it'll be something simple like a timing config that's a little different on Ryzen consumer parts
You can follow the mailing list here
Serial is required
FWIW, as long as you have a functioning dom0, all those debug handlers can be called via xl debug-key <letter> from dom0, and then output collected via xl dmesg or in /var/log/xen/console/hypervisor.log.
One thing to keep in mind is that the debug conring for Xen is tiny. It can be increased with a boot option.
see: https://github.com/QubesOS/qubes-issues/issues/5674#issuecomment-648519412
Is there any sort of standardized testing suite that could be used compare Qubes systems? I'm thinking single/multithreaded VM performance, various IO tests.
I have several Qubes systems now (now including a Ryzen 7 4750U) and it's difficult to gauge whether one system or configuration is "faster" than another, other than anecdotally, especially across multiple metrics.
This has just come up on the user Forum - there isn't (as yet) a test suite, but discussions are under way.
Users that have the exact same CPU are reporting they don't need to pin/limit dom0 to 1 vCPU at all.
This includes domU's running > 2 vCPUs. No stuttering. No lag.
Device reported working is the HP EliteBook 845 G7.
This indicates there's something Lenovo specific that's requiring us to nerf performance.
Lenovo. Ugh, I'm guessing it's either UEFI or ACPI
I bring the gift of decent Xen performance on Lenovo X13/T14 devices!
So! It turns out Lenovo have absolutely trashed HPET/Clock, I believe this is what's causing the lags/jitters.
Appending clocksource=tsc tsc=unstable hpetbroadcast=0 to Xen's CMDLINE removes the need for pinning 1 core to dom0, and it also removes the need for limiting appVMs to 2 vCPUs.
It should force the use of the TSC for clock, set TSC 'unstable' and totally disable HPET.
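For a permanent change, the options have to end up in the Xen command line that GRUB generates. The sketch below only edits text and never touches a real file. It assumes /etc/default/grub holds a double-quoted `GRUB_CMDLINE_XEN_DEFAULT="..."` line (an assumption about the file layout, not something confirmed in this thread), and the function name is hypothetical.

```python
import re

# The three options from the comment above.
CLOCK_WORKAROUND = ("clocksource=tsc", "tsc=unstable", "hpetbroadcast=0")


def add_xen_options(grub_text, options=CLOCK_WORKAROUND,
                    var="GRUB_CMDLINE_XEN_DEFAULT"):
    """Append any missing `options` to the first VAR="..." line in grub_text.

    Assumption: the variable is defined on one line with a double-quoted
    value. Options that are already present are not added a second time.
    """
    pattern = re.compile(r'^(%s=")([^"]*)(")' % re.escape(var), re.MULTILINE)
    match = pattern.search(grub_text)
    if match is None:
        raise ValueError(f"{var} not found")
    current = match.group(2).split()
    merged = current + [opt for opt in options if opt not in current]
    new_line = match.group(1) + " ".join(merged) + match.group(3)
    return grub_text[: match.start()] + new_line + grub_text[match.end():]


if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_XEN_DEFAULT="console=none smt=off"\n'
    print(add_xen_options(sample), end="")
```

After writing the result back, the GRUB configuration still has to be regenerated in dom0, as described earlier in the thread for smt=on.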
I'm running 6 vCPUs within an appVM like butter! :butter: :butter: :butter:
Closing this issue! :tada:
This is solved in xen 4.15, no need for kernel parameters.
Another quick little update, Lenovo have graced us with an update to their UEFI firmware, the changelog is here
The point
* Fixed an issue that Fixed TSC synchronization failed under linux.
Totally fixes the issues I was having in Qubes, after you update, it's possible to revert the Xen cmdline referenced here
Very nice! Might be a little off-topic, but have you experienced issues where the laptop just completely shuts down? It seems to happen when I have a bunch of Firefox tabs open. It's quite random though, doesn't always happen. Usually the VM I'm using becomes unresponsive, presumably crashing Xen after a few seconds.
I have had this happen, though it's usually when it's plugged in on my bed. Try using RyzenAdj with the --power-saving option, that'll force the CPU to think it's unplugged and keep thermals under control/not totally crash.
This will be my last Lenovo device, as Lenovo has crossed into Huawei/untrusted territory.
I think HP will be my next device.
Sorry for the thread necromancy and redirection, but I'm experiencing some high CPU thermals on my Ryzen 7 4800H CPU when it should be idle. I've tried most of the suggested solutions in this thread with no success, can anyone point me to a way I can diagnose or fix the issue? :pray: I've described the details in https://github.com/QubesOS/qubes-issues/issues/6647
Reopening as Qubes OS should work out of the box.
I had similar issues with a new Thinkpad L14 Gen3 AMD with a Ryzen 5875U.
clocksource=tsc tsc=unstable hpetbroadcast=0 has fixed it for now.
Another quick little update, Lenovo have graced us with an update to their UEFI firmware, the changelog is here
The point
* Fixed an issue that Fixed TSC synchronization failed under linux.
Totally fixes the issues I was having in Qubes, after you update, it's possible to revert the Xen cmdline referenced here
Unfortunately it seems newer models are still affected by this UEFI issue. I'm already on the latest version for my Gen3 L14, but there is no mention of a TSC fix in the UEFI firmware changelog and I still need to use clocksource=tsc tsc=unstable hpetbroadcast=0.
This was fixed in the 1.32 BIOS/UEFI update for L14 Gen3 AMD. Though it seems the newer Xen version in 4.2 already fixed this, as I already had no need for the kernel parameters with 4.2 with the older BIOS/UEFI. With 4.1 now EOL, this issue can be closed I think.
Qubes OS version 4.1
Affected component(s) or functionality Entire OS/Experience
Brief summary There appears to be something wrong with the CPU: every 3-5 seconds everything will lock up, here's a GIF for visuals.
I've confirmed this is specific to AMD 4000 CPUs because 4.1 running on an i7-1065G7 works fine (still at a much slower rate than 4.0.3, but that's beside the point)
To Reproduce
Steps to reproduce the behavior:
Expected behavior Smooth as butter 8 Core experience
Actual behavior Terrible lockups every 3-5 seconds with full hangs peppered in randomly
Screenshots See GIF in Brief
Additional context NIL
Solutions you've tried Not sure how/where to troubleshoot this, I assume it has something to do with Xen.