QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Ryzen 4000 CPU Performance Issues (Lockups) #6055

Closed dylangerdaly closed 1 month ago

dylangerdaly commented 4 years ago

Qubes OS version 4.1

Affected component(s) or functionality Entire OS/Experience

Brief summary There appears to be something wrong with the CPU: every 3-5 seconds everything locks up. Here's a GIF for visuals.

I've confirmed this is specific to AMD 4000 CPUs because 4.1 running on an i7-1065G7 works fine (still at a much slower rate than 4.0.3, but that's beside the point).

To Reproduce

Steps to reproduce the behavior:

  1. Install Qubes 4.1 onto a Lenovo X13 or any AMD Ryzen 4000 Laptop
  2. Observe weird lockups and full hangs

Expected behavior Smooth as butter 8 Core experience

Actual behavior Terrible lockups every 3-5 seconds with full hangs peppered in randomly

Screenshots See GIF in Brief

Additional context NIL

Solutions you've tried Not sure how/where to troubleshoot this, I assume it has something to do with Xen.

dylangerdaly commented 3 years ago

I'm doing the same, one other thing I had enabled is AMD's TearFree option in X11, I've removed that config file, I'll see if that makes any difference

tasket commented 3 years ago

Hmmm... Those temps were taken from sensors "temp1". But there is a "temp11" that always shows 66C (under any idle/load) when plugged into AC and 0C when on battery. I don't know if that could be affecting your temp readouts. The rest of the internal temps always show as 0C.

dylangerdaly commented 3 years ago

I'm using the Sensors plugin for xfce, Sensor Type is k10temp-c3 and the sensor name is Tctl, there's also Tdie, iirc Tctl is the temp of the package itself? and Tdie is the die temp?

tasket commented 3 years ago

Yes, the temp1 line appears to be an int value of Tdie/Tctl.

Another BTW, since the lurching issue is power-management related I think this applies:

Marek's specially modified Xen in qubes*unstable doesn't always have the correct configuration. For example with 4.14.0-6.2 the lurching came back... I had to downgrade xen packages to 4.14.0-6.1 to get performance back to normal.

dylangerdaly commented 3 years ago

Oh wow, okay.

So I tried reverting Marek's revert 2 weeks ago, but I must have accidentally reverted the wrong commit ID

commit bab37273543da7df4148773b96e677913dc52cd7 (HEAD -> xen-4.14)
Author: Dylanger Daly <dylanger@diagnostix.io>
Date:   Tue Dec 8 15:27:15 2020 +1000

    Revert "Fix S3 resume"

    This reverts commit c28754bdb458281a22e9a9779213c941531b6dff.

Reverting that specific commit (c28754bdb458281a22e9a9779213c941531b6dff) results in a much smoother experience.

Then forcing the built package over the existing one

sudo rpm -ihv --force xen-hypervisor-4.14.0-8.fc32.x86_64.rpm

1080P YouTube playback still isn't 100% 'lurch' free, Tasket can you confirm this? There are still teeny-tiny little hiccups when playing 1080p videos.

@tasket, can I get what your CMDLINE is for Kernel and Xen? I'm still idling around 75C when there's nothing going on, have you changed any UEFI settings as well?

dylangerdaly commented 3 years ago

Another thing, what version of UEFI are you on?

I've updated to 1.27 (r1cuj58wd) https://download.lenovo.com/pccbbs/mobiles/r1cuj58wd.txt https://download.lenovo.com/pccbbs/mobiles/r1cuj58wd.iso

7bfdda966c172f1fdb0e27123b25e651bdb6f27529399ff96c858471612f2337  r1cuj58wd.iso

But it appears that's been removed from the page? Lenovo's Support page only shows 1.25 as the latest version.

I've possibly installed a version of UEFI that's overheating when on AC, so Lenovo pulled it?

Can someone confirm this?

tasket commented 3 years ago

Keeping in mind this is the T14 not X13, firmware is v1.05 from June 11.

FWIW, the T14 download page has 1.27 available: https://pcsupport.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-t-series-laptops/thinkpad-t14-type-20ud-20ue/downloads/ds544977-bios-update-utility-bootable-cd-for-windows-10-64-bit-thinkpad-t14-gen-1-types-20ud-20ue

tasket commented 3 years ago

Yes, there is still a little judder when HD video playback is fullscreen or nearly fullscreen. Using VLC, it becomes noticeable when the viewport is about 2/3 of full height.

The video RAM setting I mentioned earlier is one of the few I remember changing; others were security features.

From grub.cfg:

multiboot2      /xen-4.14.0.gz placeholder  console=none dom0_max_vcpus=1 dom0_vcpus_pin=1 dom0_mem=min:1024M dom0_mem=max:2048M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 ept=exec-sp ${xen_rm_opts}

echo    'Loading Linux 5.8.18-200.fc32.x86_64 ...'
module2 /vmlinuz-5.8.18-200.fc32.x86_64 placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=luks-ebc32f3e-6002-4c17-9759-db70e0f6c859 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles rd.driver.pre=btrfs rhgb quiet amdgpu.noretry=0 amdgpu.gpu_recovery=1 iommu=pt amd_iommu=fullflush amdgpu.dc=1 rhgb rcu_nocbs=0 rd.qubes.hide_all_usb=0 

echo    'Loading initial ramdisk ...'
module2 --nounzip   /initramfs-5.8.18-200.fc32.x86_64.img

dylangerdaly commented 3 years ago

I managed to downgrade to 1.25, I tried booting Fedora 32 Workstation and it's at 45C on AC, so there's something wrong with Xen's Power Management I think

I was missing ept=exec-sp from Xen's CMDLINE, just re-added that and it's really smooth now
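A missing option like this is easy to overlook on a long command line. The following is a minimal sketch of a check for it; `missing_xen_opts` is a hypothetical helper name (not a Qubes or Xen tool), and the sample string is abbreviated from the grub.cfg quoted above.

```shell
# missing_xen_opts CMDLINE OPTION...
# Prints every OPTION that does not appear as a whole word in CMDLINE.
# Hypothetical helper for illustration only.
missing_xen_opts() {
    cmdline=$1
    shift
    for opt in "$@"; do
        case " $cmdline " in
            *" $opt "*) ;;                 # option is present
            *) printf '%s\n' "$opt" ;;     # option is absent
        esac
    done
}

# Sample command line, abbreviated from the grub.cfg quoted earlier in the thread.
sample='console=none dom0_max_vcpus=1 dom0_vcpus_pin=1 ucode=scan smt=off'
missing_xen_opts "$sample" dom0_max_vcpus=1 dom0_vcpus_pin=1 ept=exec-sp
```

On a running system, the command line Xen actually booted with should be visible in dom0 via `xl info` (the `xen_commandline` field), which can be passed to the helper in place of the sample string.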

crat0z commented 3 years ago

tasket's command lines are basically the same as mine, and setting VMs to only using 1 or 2 cores seems okay for now. I don't know if lscpu or /proc/cpuinfo is accurate in this scenario, but under no circumstances does the CPU go above 2.1GHz, is it potentially not boosting?

Also, I am running default, sys-net/sys-firewall start-up on boot etc. and every single time I boot up my laptop in the morning, no other VMs will start after that. Only after I restart it, will other VMs e.g. "personal" start up. This is consistent, and I previously thought it was maybe SMT, but I've turned it off and still it behaves like that.

dylangerdaly commented 3 years ago

@crat0z can you confirm what Lenovo model you're on?

If you have an X13, can you confirm temperature increases (+10-15C) when the device is on AC?

I'm not sure why Xen still hasn't identified these issues, I think there are no server counterparts to Ryzen 4000? Also it's super hard to actually debug x86, compared to ARM/Qualcomm SoCs.

crat0z commented 3 years ago

I have a T14, and AC does not affect my temperatures, so I can't help with that unfortunately.

AFAIK, Renoir CPUs are Zen 2, which on desktop are Ryzen 3000 series CPUs, and in workstations/servers with codename "Rome", and have been available since mid 2019. It seems the server CPUs have been supported since Xen 4.13.

Of course, just because that's the case doesn't mean everything should be okay for us using laptops. There are probably quirks to the Renoir CPUs, but given new desktop CPUs just came out last month and server CPUs possibly coming out this month, perhaps "general support for AMD" will be much higher soon.

dylangerdaly commented 3 years ago

I'm not 💯, but I think the reason this isn't working is because Xen expects SMT is enabled, I read that AMD rely on their implementation of SMT for scheduling/timing.

I've also read that Xen have largely "fixed" the security issues relating to SMT/HT, is it possible to test SMT?

Switching smt=on sched-gran=core results in a black screen, maybe there's a commit I'm missing?

(I'm on 4.14.0-9)

This would explain why EPYC CPUs, which use the same underlying cores as Ryzen 4000, are working just fine with Xen (SMT enabled).

I'm happy to have SMT enabled knowing the scheduling is more secure.

crat0z commented 3 years ago

As far as I know, in regards to SMT, the only known CPU vulnerability AMD CPUs have been vulnerable to has been Spectre which seems to have been taken care of with Zen 2 microarchitecture. At least for my use case, I'm fine with SMT enabled, but others might not be.

If I remember correctly, sched-gran=core command line option is still somewhat buggy and an experimental feature of Xen. I do not know what the current situation or future plans are for it, I haven't kept up for a while since I ditched my old laptop. It might have been canned, since even if core scheduling is used on a vulnerable CPU, it still allows information leaks within the compromised VM.

Unfortunately, as of now, enabling SMT does not remedy the required dom0_max_vcpus=1 dom0_vcpus_pin=1 command line options for a smooth experience. Whether SMT is enabled or not, without those command line options, Qubes is a laggy mess in the VMs and in dom0.

dylangerdaly commented 3 years ago

I'm noticing massive performance gains on Fedora 33 TemplateVM.

Browser is significantly smoother. @tasket update your TemplateVM and check it out

I've been trying to troubleshoot these multicore/credit2 issues. Debugging Xen is really difficult without serial output, and basically no laptops have a serial port. For now, the limit of 2 vCPUs per appVM will be around for a while, as this is really hard to troubleshoot.

crat0z commented 3 years ago

I wonder if the desktop Renoir APUs e.g. 4750G have similar problems. Surely there is an AM4 motherboard that allows serial debugging.

dylangerdaly commented 3 years ago

I think they do, someone on VFIO Discord was experiencing the exact same issues on a Desktop Ryzen 3000 CPU.

MaximumViciousDeer commented 3 years ago

it won't really matter to gpu as such, but do you have Xorg-x11-drv-amdgpu installed? That aside, I'd suggest upping the max_vcpus to 3 or 4.

I'm testing this now on my Zephyrus G14. Sorry for noob question but how do I install Xorg-x11-drv-amdgpu?

0spinboson commented 3 years ago

run qubes-dom0-update xorg-x11-drv-amdgpu in dom0

MaximumViciousDeer commented 3 years ago

run qubes-dom0-update xorg-x11-drv-amdgpu in dom0

I had an issue with the HDMI port not working, this didn't fix it though. Might start a separate thread about that.

Qubes is running smoothly though and no performance issues yet. (on Ryzen 4900HS)

crat0z commented 3 years ago

@MaximumViciousDeer Can you try running a CPU benchmark with a varying number of cores in a VM? I used y-cruncher, and I'm on a 6-core 4650U with SMT enabled in Xen. My results showed that assigning 1, 2, 4, and 6 cores to the VM scaled properly. Beyond that, the 8-core and 12-core tests not only took longer to run y-cruncher, but there was also very obvious input delay/lag/stuttering, even when just typing in the terminal. I can't post the results as I'm not on my laptop at the moment, but it was something like ~28 seconds with 6 cores and 33/35 seconds with 8/12 cores respectively. For reference, an Ubuntu 20.10 live USB with a 5.8 kernel ran 12-thread y-cruncher in about 25 seconds. sched=credit does seem to have better performance, by the way.

Geekbench may be a better utility to use, but it doesn't work with Fedora and I've been too lazy to try it in a debian VM/build Ubuntu template.
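For anyone repeating this kind of test, here is a rough sketch of how the vCPU sweep could be scripted from dom0. `bench_vcpus` is a hypothetical function name, and the benchmark command is left as a parameter because the thread does not give the exact y-cruncher invocation. It relies on the standard Qubes tools `qvm-prefs`, `qvm-shutdown`, `qvm-start` and `qvm-run`.

```shell
# bench_vcpus VM COMMAND COUNT...
# For each vCPU COUNT: restart VM with that many vCPUs and run COMMAND inside it.
# Hypothetical helper; intended to be run in dom0.
bench_vcpus() {
    vm=$1
    cmd=$2
    shift 2
    for n in "$@"; do
        # The vcpus property only takes effect on the next VM start.
        qvm-shutdown --wait "$vm" 2>/dev/null || true
        qvm-prefs "$vm" vcpus "$n"
        qvm-start "$vm"
        echo "== $vm with $n vCPUs =="
        # --pass-io forwards the benchmark's output back to the dom0 terminal.
        qvm-run --pass-io "$vm" "$cmd"
    done
}
```

A call might look like `bench_vcpus testvm '7z b' 1 2 4 6 8 12`, where `testvm` is a placeholder for a disposable benchmark VM. Keeping all other VMs shut down during the run makes the numbers easier to compare.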

brendanhoar commented 3 years ago

Do the higher cpu count ryzen 4000 cpus have something like a NUMA arrangement?

crat0z commented 3 years ago

From what I remember with the Ubuntu live USB, I was checking lscpu to confirm there were 12 threads recognized, and it said there was only 1 NUMA node. More NUMA nodes might be used on their higher end desktop processors with more than 8 cores

brendanhoar commented 3 years ago

Ah, I missed that this was a 6-core cpu.

I thought the guidance for Qubes was not to assign more vcpus to a domU than real cores?

MaximumViciousDeer commented 3 years ago

I just tested y-cruncher and it scales properly up to 8 cores for the standard 500m computation. On 8 cores it took ~85 seconds, on 6 it took ~100 etc. There's only one NUMA node. My lscpu shows 8 cores and one thread per core.

crat0z commented 3 years ago

@brendanhoar I'm not sure, at least I never read that. I've run Qubes on a few different machines, from old vulnerable Intel CPUs (i7-2620m, i7-5600u) to a more modern AMD r5 2600. With SMT enabled in Xen, they've all performed faster. Also, with SMT enabled, utilities like xentop say Xen has more vCPUs to work with anyway, so it makes sense that SMT enabled is faster.

@MaximumViciousDeer lscpu is reporting 8 cores and 1 thread per core because Qubes ships with smt=off xen command line option by default. On boot up, you can press e and edit the xen command line to smt=on and lscpu will report 2 threads per core. Permanent changes are done in /etc/default/grub and you'll have to run grub-mkconfig in dom0 after.
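As a sketch of the permanent change described above, the helper below rewrites one key=value option inside `GRUB_CMDLINE_XEN_DEFAULT`. The function name `set_xen_opt` is hypothetical, and the demo runs against a throwaway file with made-up contents rather than the real /etc/default/grub. It assumes the variable sits on a single double-quoted line and that the options contain no spaces, glob characters, `|` or `&`.

```shell
# set_xen_opt FILE KEY VALUE
# Replaces any existing KEY=... in GRUB_CMDLINE_XEN_DEFAULT with KEY=VALUE,
# or appends KEY=VALUE if the key is not set yet. Hypothetical helper.
set_xen_opt() {
    file=$1 key=$2 value=$3
    # Current value between the double quotes (empty if the variable is absent).
    current=$(sed -n 's/^GRUB_CMDLINE_XEN_DEFAULT="\(.*\)"$/\1/p' "$file")
    updated=
    for word in $current; do
        case $word in
            "$key"=*) ;;                       # drop the old setting of this key
            *) updated="$updated $word" ;;
        esac
    done
    updated="${updated# } $key=$value"
    updated=${updated# }
    sed "s|^GRUB_CMDLINE_XEN_DEFAULT=\".*\"\$|GRUB_CMDLINE_XEN_DEFAULT=\"$updated\"|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Demo on a temporary copy with illustrative contents.
demo=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=5' \
    'GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M smt=off"' > "$demo"
set_xen_opt "$demo" smt on
grep '^GRUB_CMDLINE_XEN_DEFAULT=' "$demo"
```

After editing the real file, the configuration still has to be regenerated in dom0 as described above. On a Fedora-based dom0 the tool is typically named `grub2-mkconfig`, and the output path differs between legacy BIOS and UEFI installs, so both are worth checking against the Qubes documentation.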

Unfortunately though, I suspect Xen + SMT is broken with these CPUs. Given your 8 core's y-cruncher times got faster and faster up to 8 cores, mine does the same with 6 cores, I suspect yours will get worse if you enable SMT and try e.g. 16 vCPU y-cruncher.

Also, dom0_max_vcpus=1 dom0_vcpus_pin are the only good options definitely. I tried SMT disabled, tried max_vcpus=2,4,12, I tried not pinning, I tried between sched=credit and default, nothing is as good as the first one. It is immediately painfully obvious after entering LUKS password, as the time to start up is 3-10 times longer.

crat0z commented 3 years ago

Ah, nevermind. SMT does work and does have tangible differences. It seems y-cruncher might be a bad benchmark. 7z's benchmark showed SMT enabled 12 vCPU VM performs the fastest.


Power plugged in, SMT enabled in BIOS, no VMs open besides benchmark vm, sched=credit

Example output of 7z b:
                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      23474   545   4191  22836  |     260193   598   3713  22189
23:      21214   573   3774  21615  |     226772   596   3291  19622
24:      19407   562   3715  20867  |     222532   598   3266  19532
25:      17078   551   3538  19499  |     215440   596   3218  19173
----------------------------------  | ------------------------------
Avr:             558   3804  21204  |              597   3372  20129
Tot:             577   3588  20667

RESULTS

smt=on

6 vCPU:
Avr:             558   3804  21204  |              597   3372  20129
Tot:             577   3588  20667

12 vCPU:
Avr:            1015   2429  24630  |             1182   2556  30190
Tot:            1098   2492  27410

smt=off

6 vCPU:
Avr:             514   4208  21591  |              585   3450  20194
Tot:             550   3829  20892

12 vCPU:
Avr:             489   4436  21592  |              565   3538  19958
Tot:             527   3987  20775

Yet the "fastest" VM configuration still performs much worse from a user perspective. Funnily enough, while I was doing the smt=off tests, I noticed that setting the vCPU count to 5 stopped the input lag. Could it be Xen's scheduler causing this? The only other VM turned on is dom0, and in this case it has 1 vCPU and it's pinned.
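To put the totals above in perspective, this small sketch computes the relative change in the "Tot" rating when going from 6 to 12 vCPUs. `pct_change` is a hypothetical helper name; the four numbers are the Tot ratings from the results listed above.

```shell
# pct_change A B: percentage change of rating B relative to rating A.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

pct_change 20667 27410   # smt=on:  6 vCPU -> 12 vCPU, prints +32.6%
pct_change 20892 20775   # smt=off: 6 vCPU -> 12 vCPU, prints -0.6%
```

With SMT on, doubling the vCPU count raises the total rating by roughly a third, while with SMT off the extra vCPUs make no measurable difference, which matches 12 vCPUs being squeezed onto 6 hardware threads.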

MaximumViciousDeer commented 3 years ago

Qubes ships with smt=off

Isn't that a security feature to mitigate Spectre/Meltdown?

0spinboson commented 3 years ago

yeah, but it's not proven to be necessary for ryzen, esp. zen2 which has additional mitigations. Just intel that's fucked and getting worse every month.

0spinboson commented 3 years ago

IIRC while googling for this issue a few weeks ago I found someone (maybe the OP?) suggesting elsewhere that credit2 has trouble with zen2 consumer parts with disabled SMT?

marmarek commented 3 years ago

yeah, but it's not proven to be necessary for ryzen, esp. zen2 which has additional mitigations. Just intel that's fucked and getting worse every month.

Well, AMD has less of those bugs, but it doesn't mean it's completely unaffected. Some high level summary: https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability - as you can see, zen2 is affected by some of them too. For specific platform I recommend checking with https://github.com/speed47/spectre-meltdown-checker

crat0z commented 3 years ago

I've been doing some IO testing with kdiskmark, and unsurprisingly the default Qubes configuration vastly outperforms the command line fix configuration. On my Sabrent Rocket Q SSD, sequential reads drop from 2000MB/s to about 700MB/s, random speeds are more or less the same. That's still faster than SATA 3, so it's certainly not unusable, but Qubes startup and VM startup especially is not as fast as it can be. On the topic of VM startups, the more vCPUs assigned to a VM, the longer startup takes... same thing for dom0. What a mystery.

Also noteworthy is that when the Xen command line options "fix" isn't applied, the stuttering/lag still occurs even when no VMs are running.

At this point I'm not really certain where else to look for diagnosing the issue(s), besides of course actually debugging Xen or Linux potentially? For anyone who is finding this now, here are some tips for a usable experience:

Post install of R4.1,

For Xen command line options,

As for setting up VMs to optimize performance and minimize stuttering, it all comes down to assessing each VM's vCPU and RAM requirements. Some general tips,

dylangerdaly commented 3 years ago

Thank you for testing and bench-marking further.

Yeah it seems the issue is mainly due to Xen's Scheduler, it's possible to debug the scheduler however some form of Serial is required, on Servers this is a non-issue, Mobile platforms however...

It's super hard to debug this unless you're an AMD Engineer with a Serial port attached to your laptop, I think it'll be something simple like a timing config that's a little different on Ryzen consumer parts

You can follow the mailing list here

https://wiki.xenproject.org/wiki/Xen_Serial_Console

marmarek commented 3 years ago

Serial is required

FWIW, as long as you have functioning dom0, all those debug handlers can be called via xl debug-key <letter> from dom0, and then output collected via xl dmesg or in /var/log/xen/console/hypervisor.log.
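A minimal sketch of that workflow, for anyone who wants to look at the scheduler state without a serial port. The wrapper name `dump_xen_runqueues` is hypothetical; the `r` key is Xen's run-queue dump handler, and `xl debug-key h` should list the other available keys in the hypervisor log.

```shell
# dump_xen_runqueues [LINES]
# Asks Xen to dump its scheduler run queues, then shows the tail of the
# hypervisor log. Hypothetical wrapper; needs a root shell in dom0.
dump_xen_runqueues() {
    if ! command -v xl >/dev/null 2>&1; then
        echo "xl not found: run this in dom0" >&2
        return 1
    fi
    xl debug-key r                   # 'r' triggers the run-queue dump
    xl dmesg | tail -n "${1:-200}"   # most recent hypervisor log lines
}
```

Capturing one dump while the lag is happening and one while the system is idle gives two snapshots that can be compared or attached to a bug report.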

brendanhoar commented 3 years ago

One thing to keep in mind is that the debug conring for Xen is tiny. It can be increased with a boot option.

see: https://github.com/QubesOS/qubes-issues/issues/5674#issuecomment-648519412

n0madK commented 3 years ago

Is there any sort of standardized testing suite that could be used to compare Qubes systems? I'm thinking single/multithreaded VM performance, various IO tests.

I have several Qubes systems now (now including a Ryzen 7 4750U) and it's difficult to gauge whether one system or configuration is "faster" than another, other than anecdotally, especially across multiple metrics.

unman commented 3 years ago

This has just come up on the user Forum - there isn't (as yet) a test suite, but discussions are under way.

dylangerdaly commented 3 years ago

Users that have the exact same CPU are reporting they don't need to pin/limit dom0 to 1 vCPU at all.

This includes domU's running > 2 vCPUs. No stuttering. No lag.

Device reported working is the HP EliteBook 845 G7.

This indicates there's something Lenovo specific that's requiring us to nerf performance.

Lenovo. Ugh, I'm guessing it's either UEFI or ACPI

dylangerdaly commented 3 years ago

I bring the gift of decent Xen performance on Lenovo X13/T14 devices!

So! It turns out Lenovo have absolutely trashed HPET/Clock, I believe this is what's causing the lags/jitters.

Appending clocksource=tsc tsc=unstable hpetbroadcast=0 to Xen's CMDLINE removes the need to pin dom0 to 1 core, and it also removes the need to limit each appVM to 2 vCPUs.

It should force the use of the TSC for clock, set TSC 'unstable' and totally disable HPET.
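As a config sketch, one way to make these three options permanent is to append them in dom0's /etc/default/grub. This assumes the Xen options live in `GRUB_CMDLINE_XEN_DEFAULT`, the variable GRUB's Xen support reads; the file is sourced as shell, so a self-referencing append keeps whatever was set earlier in the file.

```shell
# /etc/default/grub in dom0 (fragment, placed after the existing
# GRUB_CMDLINE_XEN_DEFAULT line so the earlier options are preserved).
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT clocksource=tsc tsc=unstable hpetbroadcast=0"
```

The GRUB configuration has to be regenerated and the machine rebooted before the change takes effect; afterwards the `xen_commandline` field of `xl info` should show the three options.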

I'm running 6 vCPUs within an appVM like butter! :butter: :butter: :butter:

Closing this issue! :tada:

isodude commented 3 years ago

This is solved in xen 4.15, no need for kernel parameters.

dylangerdaly commented 3 years ago

Another quick little update, Lenovo have graced us with an update to their UEFI firmware, the changelog is here

The point

* Fixed an issue that Fixed TSC synchronization failed under linux.

Totally fixes the issues I was having in Qubes, after you update, it's possible to revert the Xen cmdline referenced here

crat0z commented 3 years ago

Very nice! Might be a little off-topic, but have you experienced issues where the laptop just completely shuts down? It seems to happen when I have a bunch of Firefox tabs open. It's quite random though, doesn't always happen. Usually the VM I'm using becomes unresponsive, presumably crashing Xen after a few seconds.

dylangerdaly commented 3 years ago

I have had this happen, though it's usually when it's plugged in on my bed. Try using RyzenAdj with the --power-saving option; that'll force the CPU to think it's unplugged and keep thermals under control so it doesn't totally crash.

This will be my last Lenovo device, as Lenovo has crossed into Huawei/untrusted territory.

I think HP will be my next device.

na-- commented 3 years ago

Sorry for the thread necromancy and redirection, but I'm experiencing some high CPU thermals on my Ryzen 7 4800H CPU when it should be idle. I've tried most of the suggested solutions in this thread with no success, can anyone point me to a way I can diagnose or fix the issue? :pray: I've described the details in https://github.com/QubesOS/qubes-issues/issues/6647

DemiMarie commented 2 years ago

Reopening as Qubes OS should work out of the box.

Geblaat commented 1 year ago

I had similar issues with a new Thinkpad L14 Gen3 AMD with a Ryzen 5875U. clocksource=tsc tsc=unstable hpetbroadcast=0 has fixed it for now.

Another quick little update, Lenovo have graced us with an update to their UEFI firmware, the changelog is here

The point

* Fixed an issue that Fixed TSC synchronization  failed under linux.

Totally fixes the issues I was having in Qubes, after you update, it's possible to revert the Xen cmdline referenced here

Unfortunately it seems newer models are still affected by this UEFI issue. I'm already on the latest version from my Gen3 L14, but there is no mention of a TSC fix in the UEFI firmware changelog and I still need to use clocksource=tsc tsc=unstable hpetbroadcast=0.

Geblaat commented 1 month ago

This was fixed in the 1.32 BIOS/UEFI update for L14 Gen3 AMD. Though it seems the newer Xen version in 4.2 already fixed this, as I already had no need for the kernel parameters with 4.2 with the older BIOS/UEFI. With 4.1 now EOL, this issue can be closed I think.