QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Ryzen 4000 CPU Performance Issues (Lockups) #6055

Open dylangerdaly opened 3 years ago

dylangerdaly commented 3 years ago

Qubes OS version: 4.1

Affected component(s) or functionality: Entire OS/Experience

Brief summary: There appears to be something wrong with the CPU. Every 3-5 seconds everything locks up; here's a GIF for visuals.

I've confirmed this is specific to AMD 4000 CPUs because 4.1 running on an i7-1065G7 works fine (still at a much slower rate than 4.0.3, but that's beside the point).

To Reproduce

Steps to reproduce the behavior:

  1. Install Qubes 4.1 onto a Lenovo X13 or any AMD Ryzen 4000 Laptop
  2. Observe weird lockups and full hangs

Expected behavior: Smooth-as-butter 8-core experience

Actual behavior: Terrible lockups every 3-5 seconds with full hangs peppered in randomly

Screenshots: See GIF in Brief summary

Additional context: NIL

Solutions you've tried: Not sure how/where to troubleshoot this; I assume it has something to do with Xen.

0spinboson commented 3 years ago

kernel-latest installed?

dylangerdaly commented 3 years ago

Indeed, 5.8.8-200, 4.14-3 Xen

0spinboson commented 3 years ago

ah, yes. try going back to Xen 4.13.1-4 in dom0. 4.14 was terrible for me too, wrt graphics performance that is.

dylangerdaly commented 3 years ago

Aw, I can't, Ryzen 4000 series CPUs won't boot unless I'm using 4.14, I think I'll just need to wait for 4.15?

Can I rebase 4.14 to Xen's Master? I don't think Marek has made many changes to Xen

0spinboson commented 3 years ago

I dunno, it probably needs troubleshooting to get it working right on 4.14, I just didn't really know where to look for error reports. (To be clear, I didn't have CPU lockups, just display issues once I opened more than a few VM windows concurrently.)

dylangerdaly commented 3 years ago

Oooooooo, thanks @0spinboson for pointing me in the right direction

I think this has been fixed via adding processor.max_cstate=5 to CMDLINE, I'm no longer getting softlocks and it's :butter: smooth.

I'll do some more testing before I close this issue :tada:

dylangerdaly commented 3 years ago

Hm, not quite, it worked for one boot, but not subsequent boots.

It does seem to be related to a public AMD issue because the bug is everywhere.

Fedora Workstation 32 Live USB works just fine, I'll have a look at its default CMDLINE.

0spinboson commented 3 years ago

you mean you added it to grub bootline? Did you run grub2-mkconfig after?

dylangerdaly commented 3 years ago

Yeah I've been testing with just hitting 'e' and inserting stuff into CMDLINE, I added processor.max_cstate=5 to my actual grub2 config and did mkconfig etc same result.

It looks like it could be related to IRQs: if I just leave it idling it'll be okay, but if I start doing things it'll soft lock.

It looks like AMD created a sixth C-State, setting it to 5 simply removes that C-State.

0spinboson commented 3 years ago

just to be clear: you added it to /etc/default/grub, then ran grub2-mkconfig -o /boot/grub2/grub.cfg (assuming you use legacy boot, idk the efi equivalent), but it doesn't persist?

dylangerdaly commented 3 years ago

Yes, sometimes my HVMs don't come up, sometimes they do, I think this CPU has had basically no testing with Xen/Virtualization

Yeah Fedora 32 Live ISO runs like butter every time with a 5.6.6 kernel, really weird

stiell commented 3 years ago

This looks very similar to what I experienced on my Ryzen 4000 series laptop. While trying to fix wake-from-suspend (still not resolved), I discovered that the following workaround solves the lockup/stutter issue:

  • Add options dom0_max_vcpus=1 dom0_vcpus_pin to GRUB_CMDLINE_XEN_DEFAULT in file /etc/default/grub.
  • Run sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg (assuming this is a Qubes 4.1 UEFI install).
  • Reboot.

dylangerdaly commented 3 years ago

You absolute legend!

This has totally worked 🎉

I'll do some more testing with this today, but really smooth now.

Other appVMs are working smoothly!

Thank you 🙌

(Sent by email in reply to Stian Ellingsen's message of Sep 16, 2020, 5:59 AM.)
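
The workaround steps stiell lists can be sketched as a small shell helper. This is only a sketch under assumptions: `add_xen_opts` is a hypothetical helper name, it expects GRUB_CMDLINE_XEN_DEFAULT to be a single double-quoted line, the existing value shown is just an example, and the demo runs against a temporary file rather than the real /etc/default/grub.

```shell
#!/bin/sh
# add_xen_opts FILE OPTS: append OPTS inside the double-quoted value of
# GRUB_CMDLINE_XEN_DEFAULT in FILE. Hypothetical helper, not a Qubes tool.
# OPTS must not contain '|', '&' or backslashes (they are special to sed).
add_xen_opts() {
    file=$1
    opts=$2
    sed "s|^\(GRUB_CMDLINE_XEN_DEFAULT=\"[^\"]*\)\"|\1 $opts\"|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Demo on a throwaway file; the starting value is an example, not the Qubes default.
demo=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=max:4096M"' > "$demo"
add_xen_opts "$demo" "dom0_max_vcpus=1 dom0_vcpus_pin"
cat "$demo"
# prints: GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=max:4096M dom0_max_vcpus=1 dom0_vcpus_pin"
rm -f "$demo"

# On a real Qubes 4.1 UEFI install the remaining steps from the list would be:
#   sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
#   reboot
```

On a real system the same function would be pointed at a backed-up /etc/default/grub as root, followed by the grub2-mkconfig command and reboot from the list.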

marmarek commented 3 years ago

Can it be related to (lack of) NUMA support in Qubes?

dylangerdaly commented 3 years ago

Yeah wow, Electron Applications (Element and Chromium for example) are running buttery smooth, even better than my Intel 10th Gen device running 4.1

Dare I say it's running better than 4.0.3

> Can it be related to (lack of) NUMA support in Qubes?

I think so. I'm trying to understand what dom0_max_vcpus=1 is actually doing: am I limiting my cores to 1? Because it doesn't feel like I'm only using a single core.

I've never seen Qubes run this smoothly before :butter: almost feels like I'm cheating

0spinboson commented 3 years ago

> am I limiting my cores to 1

correct. you can probably also set it to 2 though. :)

brendanhoar commented 3 years ago

Can I request a clarification for the thread?

This setting change ("dom0_max_vcpus=1") only sets the virtual CPU limit for the dom0 VM and does not impact virtual CPUs available to domU VMs, correct?

If so, then I'd posit the likely impact of the workaround would primarily be on dom0-attached storage latency/throughput for most users (vs. faster with multiple vCPUs in dom0 with a real fix for the sleep states issue). Perhaps also high-throughput windowing (video playback from a domU VM).

Though with AMD's current CPU lineup, the hit might not even be really noticeable outside of benchmarks. :)
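
One way to check this on a running system would be to count the vCPU rows each domain has in `xl vcpu-list` output. The sketch below assumes only that the first line is a header and the first column is the domain name; `count_vcpus` is a hypothetical helper, and the sample rows are made up for illustration, not captured from a real machine.

```shell
#!/bin/sh
# count_vcpus: read `xl vcpu-list` style output on stdin and print
# "<domain> <number of vCPU rows>" for each domain, sorted by name.
count_vcpus() {
    awk 'NR > 1 { n[$1]++ } END { for (d in n) print d, n[d] }' | sort
}

# On a real dom0 (needs root and a running Xen):
#   sudo xl vcpu-list | count_vcpus

# Illustrative input only; these rows are invented for the demo.
count_vcpus <<'EOF'
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--      10.0  0 / all
sys-net                              1     0    3   -b-       2.0  all / all
sys-net                              1     1    5   -b-       1.5  all / all
EOF
# prints:
#   Domain-0 1
#   sys-net 2
```

If dom0 shows a single row while the domUs keep their configured counts, the limit applies to dom0 only.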

dylangerdaly commented 3 years ago

Oof yeah good point, I messed up default storage pools, so I'm migrating from varlibqubes to lvm, just via dd.

It is slooooooow, I'm not 100% sure, but I think even simply browsing via FF is doing disk IO, I get these little tiny hiccups sometimes, this could however be related to the fact Xorg is choking.

Also I wonder if maybe the GPU is suffering as a result of dom0, and by extension the vega driver being limited to 1 core?

Marek: Can it be related to (lack of) NUMA support in Qubes?

Qubes or Xen? I was sort of hoping 4.15 would have this fixed.

dylangerdaly commented 3 years ago

Yeah playing videos isn't working at all, 1 frame per second?

I think I was saying it was smooth before because there wasn't much disk IO going on, with disk IO it appears to be choppy.

0spinboson commented 3 years ago

it won't really matter to gpu as such, but do you have xorg-x11-drv-amdgpu installed? That aside, I'd suggest upping the max_vcpus to 3 or 4.

dylangerdaly commented 3 years ago

Yeah just installed it, isn't really making a difference.

It's weird, if I assign 2 or 4 cores to dom0, I get much, much worse performance, basically unusable performance. I'm going to put it down to Xen support with AMD based processors and hope that with 4.15 I can drop the core limit.

tasket commented 3 years ago

Although it can be manually entered in grub's runtime menu, the grub2-mkconfig command does not allow dom0_vcpus_pin and won't update the /boot config. It will say 'command not found'.

I tried with dom0_max_vcpus=2 and it ran like hot garbage btw. Setting to '1' was much better.

dylangerdaly commented 3 years ago

Hm, that should be working, I didn't get any errors when setting it?

Even with it set, are you noticing weird little hiccups? I'm hoping Xen is working on better handling with AMD CPUs.

Not sure if I'm getting performance issues because dom0 only has 1 core, or if it's Xen.

If you give an appVM only 2 vcores it seems to be a little more stable

(Sent by email in reply to tasket's message of Sep 18, 2020, 8:53 PM.)

tasket commented 3 years ago

I resolved it by using the assignment syntax dom0_vcpus_pin=1 instead. I think it's a parsing bug.

Edit: I haven't yet pushed any 4.1 VMs hard on my system, so I don't have a clear idea of how they perform at this point.
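
One possible explanation for the 'command not found' message, offered as an assumption since the thread does not show the exact line that was edited: /etc/default/grub is read as a shell script, so a bare word placed outside the quoted value is treated by the shell as a command to run, while `name=value` is treated as a separate variable assignment. The sketch below reproduces that shell behavior on a temporary file; `try_source` is a hypothetical helper.

```shell
#!/bin/sh
# try_source LINE: write LINE to a temporary file, source it in a subshell
# (the way a shell script would read /etc/default/grub), and print either the
# resulting variable value or the shell's error message.
try_source() {
    tmp=$(mktemp)
    printf '%s\n' "$1" > "$tmp"
    ( . "$tmp" && printf 'value: %s\n' "$GRUB_CMDLINE_XEN_DEFAULT" ) 2>&1
    rm -f "$tmp"
}

# Bare word outside the quotes: the shell tries to run it as a command and
# reports that it was not found (exact wording depends on the shell).
try_source 'GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=1" dom0_vcpus_pin'

# name=value outside the quotes: no error, but it becomes a separate shell
# variable and is not part of GRUB_CMDLINE_XEN_DEFAULT.
try_source 'GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=1" dom0_vcpus_pin=1'

# Inside the quotes: the option is part of the value in either spelling.
try_source 'GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=1 dom0_vcpus_pin"'
```

If that is what happened here, the `=1` form would silence the error without adding the option to GRUB_CMDLINE_XEN_DEFAULT, so it may be worth confirming after a reboot that the option reached Xen, for example via the xen_commandline field of `xl info`.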

tasket commented 3 years ago

The 5.8.12-200 kernel update results in a system lockup on boot (after KDE plasma logo appears), but the 5.8.11-200 kernel works. Here is log output from the failed boot:

Oct 02 12:42:28 dom0 kernel: BUG: kernel NULL pointer dereference, address: 00000000000003a8
Oct 02 12:42:28 dom0 kernel: #PF: supervisor read access in kernel mode
Oct 02 12:42:28 dom0 kernel: #PF: error_code(0x0000) - not-present page
Oct 02 12:42:28 dom0 kernel: PGD 0 P4D 0 
Oct 02 12:42:28 dom0 kernel: Oops: 0000 [#1] SMP NOPTI
Oct 02 12:42:28 dom0 kernel: CPU: 0 PID: 3468 Comm: Xorg Tainted: G        W         5.8.12-200.fc32.x86_64 #1
Oct 02 12:42:28 dom0 kernel: Hardware name: LENOVO 20UDCTO1WW/20UDCTO1WW, BIOS R1BET36W(1.05 ) 06/11/2020
Oct 02 12:42:28 dom0 kernel: RIP: e030:mmu_interval_notifier_remove+0x16/0x140
Oct 02 12:42:28 dom0 kernel: Code: c5 74 e1 48 89 e6 48 89 ef e8 c6 bb e4 ff eb a6 0f 1f 40 00 0f 1f 44 00 00 41 55 41 54 55 48 89 fd 53 48 83 ec 28 4c 8b 67 38 <49> 8b 9c 24 a8 03 00 00 e8 ad 0b 89 00 4c 8d 6b 0c 4c 89 ef e8 c1
Oct 02 12:42:28 dom0 kernel: RSP: e02b:ffffc900010a7d30 EFLAGS: 00010286
Oct 02 12:42:28 dom0 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
Oct 02 12:42:28 dom0 kernel: RDX: 0000000000000001 RSI: ffffffff81b716e0 RDI: ffff88803b8cf000
Oct 02 12:42:28 dom0 kernel: RBP: ffff88803b8cf000 R08: 7fffffffffffffff R09: 0000000000000000
Oct 02 12:42:28 dom0 kernel: R10: ffff88802a1bf2a0 R11: ffff8880652c43b0 R12: 0000000000000000
Oct 02 12:42:28 dom0 kernel: R13: 00000000fffffffc R14: ffff8880467011c0 R15: ffff8880467011d0
Oct 02 12:42:28 dom0 kernel: FS:  00007f8118c92a40(0000) GS:ffff88807d000000(0000) knlGS:0000000000000000
Oct 02 12:42:28 dom0 kernel: CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 02 12:42:28 dom0 kernel: CR2: 00000000000003a8 CR3: 00000000054ea000 CR4: 0000000000040660
Oct 02 12:42:28 dom0 kernel: Call Trace:
Oct 02 12:42:28 dom0 kernel:  gntdev_mmap+0x275/0x318 [xen_gntdev]
Oct 02 12:42:28 dom0 kernel:  mmap_region+0x43e/0x6e0
Oct 02 12:42:28 dom0 kernel:  do_mmap+0x42f/0x540
Oct 02 12:42:28 dom0 kernel:  vm_mmap_pgoff+0xb0/0xf0
Oct 02 12:42:28 dom0 kernel:  ksys_mmap_pgoff+0x18a/0x250
Oct 02 12:42:28 dom0 kernel:  do_syscall_64+0x4d/0x90
Oct 02 12:42:28 dom0 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 02 12:42:28 dom0 kernel: RIP: 0033:0x7f811917b526
Oct 02 12:42:28 dom0 kernel: Code: 01 00 66 90 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74 37 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 62 5b 5d c3 0f 1f 80 00 00 00 00 48 8b 05 39
Oct 02 12:42:28 dom0 kernel: RSP: 002b:00007fff2cc0c8f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
Oct 02 12:42:28 dom0 kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f811917b526
Oct 02 12:42:28 dom0 kernel: RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
Oct 02 12:42:28 dom0 kernel: RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000000
Oct 02 12:42:28 dom0 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00007fff2cc0c910
Oct 02 12:42:28 dom0 kernel: R13: 0000000000000001 R14: 0000000000000009 R15: 0000000000000001
Oct 02 12:42:28 dom0 kernel: Modules linked in: fuse snd_seq_dummy snd_hrtimer loop nf_tables nfnetlink vfat fat snd_acp3x_rn snd_soc_dmic snd_acp3x_pdm_dma snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine tps6598x roles iwlwifi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 joydev wmi_bmof snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm k10temp snd_rn_pci_acp3x sp5100_tco thinkpad_acpi i2c_piix4 snd_pci_acp3x ipmi_devintf r8169 snd_timer ucsi_acpi ipmi_msghandler typec_ucsi ledtrig_audio snd typec soundcore rfkill i2c_scmi i2c_multi_instantiate xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt mmc_block amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper rtsx_pci_sdmmc mmc_core cec drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme serio_raw rtsx_pci ccp nvme_core wmi video pinctrl_amd hid_logitech_dj xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn
Oct 02 12:42:28 dom0 kernel:  uinput
Oct 02 12:42:28 dom0 kernel: CR2: 00000000000003a8
Oct 02 12:42:28 dom0 kernel: ---[ end trace 099ca5886879f3a7 ]---
Oct 02 12:42:28 dom0 kernel: RIP: e030:mmu_interval_notifier_remove+0x16/0x140
Oct 02 12:42:28 dom0 kernel: Code: c5 74 e1 48 89 e6 48 89 ef e8 c6 bb e4 ff eb a6 0f 1f 40 00 0f 1f 44 00 00 41 55 41 54 55 48 89 fd 53 48 83 ec 28 4c 8b 67 38 <49> 8b 9c 24 a8 03 00 00 e8 ad 0b 89 00 4c 8d 6b 0c 4c 89 ef e8 c1
Oct 02 12:42:28 dom0 kernel: RSP: e02b:ffffc900010a7d30 EFLAGS: 00010286
Oct 02 12:42:28 dom0 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
Oct 02 12:42:28 dom0 kernel: RDX: 0000000000000001 RSI: ffffffff81b716e0 RDI: ffff88803b8cf000
Oct 02 12:42:28 dom0 kernel: RBP: ffff88803b8cf000 R08: 7fffffffffffffff R09: 0000000000000000
Oct 02 12:42:28 dom0 kernel: R10: ffff88802a1bf2a0 R11: ffff8880652c43b0 R12: 0000000000000000
Oct 02 12:42:28 dom0 kernel: R13: 00000000fffffffc R14: ffff8880467011c0 R15: ffff8880467011d0
Oct 02 12:42:28 dom0 kernel: FS:  00007f8118c92a40(0000) GS:ffff88807d000000(0000) knlGS:0000000000000000
Oct 02 12:42:28 dom0 kernel: CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 02 12:42:28 dom0 kernel: CR2: 00000000000003a8 CR3: 00000000054ea000 CR4: 0000000000040660

Yethal commented 3 years ago

Try adding the following kernel flags (in whichever combination makes it work):

idle=nomwait amdgpu.noretry=0 amdgpu.gpu_recovery=1 iommu=pt amd_iommu=fullflush rhgb rcu_nocbs=0-15 amdgpu.dc=1

I have (almost) the same laptop and went through a lot of hoops to make it work under Linux.

dylangerdaly commented 3 years ago

@marmarek can I just add that to GRUB?

> Can it be related to (lack of) NUMA support in Qubes?

Where can I look to start fixing this / supporting NUMA nodes for Qubes?

I'm still experiencing little micro lockups when trying to watch a YouTube video, for example. This may have something to do with the Ryzen CPU sleeping when it shouldn't be; any UI animation/scrolling suffers from stuttering. My CMDLINE is:

placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=<UUID> rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles processor.max_cstate=5 rd.driver.pre=btrfs idle=nomwait amdgpu.noretry=0 amdgpu.gpu_recovery=1 iommu=pt amd_iommu=fullflush rcu_nocbs=0-15 amdgpu.dc=1 rhgb quiet

I wonder if Xen isn't exposing MSRs that dom0 needs to set power management?

Can anyone else running this CPU confirm this little performance quirk?
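
Since several comments above are unsure whether added flags actually took effect, a small check of the running dom0 kernel command line may help. This is a sketch: `has_param` is a hypothetical helper, and it assumes /proc/cmdline reflects the dom0 kernel's options only (Xen's own options such as dom0_max_vcpus are passed separately and would not show up there).

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole
# space-separated word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which of the flags discussed in this thread are on the running
# kernel's command line.
cmdline=$(cat /proc/cmdline 2>/dev/null)
for p in processor.max_cstate=5 idle=nomwait amdgpu.noretry=0 \
         amdgpu.gpu_recovery=1 iommu=pt amd_iommu=fullflush \
         rcu_nocbs=0-15 amdgpu.dc=1; do
    if has_param "$cmdline" "$p"; then
        echo "active:  $p"
    else
        echo "missing: $p"
    fi
done
```

Matching whole words avoids false positives such as `idle` matching `idle=nomwait`.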

tasket commented 3 years ago

@marmarek The 4.14.0-5 update was a miss. After updating, there was no way I could avoid rhythmic lurching and lagging in all the domUs. The effect only appears after a few minutes and was not quite as bad as having >1 vcpus assigned to dom0. But it's still enough for me to mark this update as "bad".

After I downgraded back to the -4 version the system runs smoothly again. BTW, I'm using the 5.8.13 kernel now, and have added the ept=exec-sp boot option to xen.

tasket commented 3 years ago

I should also note I tried @Yethal's suggested params in various combinations but they had no effect on the lurching with the xen -5 update.

@dylangerdaly Since the major difference between -4 and -5 versions is the S3 handling patch, then power management does seem to be involved (even though I haven't been using suspend).

marmarek commented 3 years ago

@tasket one of the changes between -4 and -5 was reverting this commit, as it breaks S3. While it is related to AMD processors, it isn't exactly obvious how it would make a difference in this case (it is focused on systems with a lot of cores, like 128 or 96). Maybe yet another side effect of this change...

dylangerdaly commented 3 years ago

Yeah I can confirm -5 is terrible, makes it unusable, I'll revert to -4 for now and try to identify what's going on.

@tasket how are you downgrading packages?

Xen sucks. Even on -4 it's usable, but there have always been micro-lockups, I suspect because of incomplete support for Ryzen 4000 series CPUs.

I'll test tomorrow, but if it is related to this commit, can we not?

As Intel goes down for its long nap AMD should start receiving first class support.

dylangerdaly commented 3 years ago

Huh, so I assigned 1 core to an appVM, super duper smooth, UI animations etc are all really, really smooth.

So it's :100: that Xen can't seem to handle more than 1 Core on Ryzen 4000, I assume this is a CPU scheduling issue?

2 cores seems to be the sweet spot between being smooth and being usable

tasket commented 3 years ago

@dylangerdaly Sorry, I didn't notice your question. Going through my history, I downgraded with this:

$ sudo qubes-dom0-update --enablerepo=qubes*testing --action=downgrade xen-2001:4.14.0-4

$ sudo dnf downgrade xen-libs-4.14.0-4.fc32.x86_64.rpm python3-xen-4.14.0-4.fc32.x86_64.rpm xen-4.14.0-4.fc32.x86_64.rpm xen-hypervisor-4.14.0-4.fc32.x86_64.rpm xen-libs-4.14.0-4.fc32.x86_64.rpm xen-runtime-4.14.0-4.fc32.x86_64.rpm

IIRC in between those two commands I had to copy the xen packages from the updatevm into dom0.


@marmarek Does the 4.14.0-6 update contain the scheduling bug? If so, it will be unusable for us.

marmarek commented 3 years ago

Yes, -6 is -5 + XSA applied (and one other fix, but unrelated to this issue).

dylangerdaly commented 3 years ago

Ugh, we'll now need to maintain a fork without the regression.

More investigation is required, the patch that you're reverting is important to people with AMD CPUs.

tasket commented 3 years ago

@dylangerdaly For now I have installed the dnf versionlock extension and added a couple packages that keep the xen suite at -4:

$ sudo dnf versionlock add xen xen-libs xen-hypervisor

marmarek commented 3 years ago

I've uploaded -6.1 with this revert reverted to the unstable repository, you can install it with:


sudo qubes-dom0-update --enablerepo=qubes-dom0-unstable --action=update xen-hypervisor

dylangerdaly commented 3 years ago

Thank you very much Marek!

dylangerdaly commented 3 years ago

Hey @marmarek, any chance we can get an updated xen-hypervisor without the regression again?

I've updated Xen and am noticing the soft-locks.

marmarek commented 3 years ago

This approach doesn't scale... Even if I upload "fixed" version again, the problem will return with every other update. I'll try to debug the base issue, but it's some nasty race condition in scheduler that is hard to track :/

dylangerdaly commented 3 years ago

I'll have a look at it today, try to help track it down

(Sent by email in reply to Marek Marczykowski-Górecki's message of 20 Nov 2020, 2:05 am.)

crat0z commented 3 years ago

There seem to be quite a few problems with Renoir CPUs and Xen at the moment. Besides this and the temporary fix working, my 4650u is stuck at 2.1GHz, and xenpm get-cpufreq-para says failed to get cpufreq parameter. Also, enabling SMT can sometimes make Xen unable to start up/stop VMs, e.g. sys-firewall will start but other VMs won't after, sometimes sys-firewall won't start, sometimes it's fine etc. Might be other performance related issues, but haven't noticed anything else.

Unfortunately, I do not believe serial connection for early debugging is possible, and at least on my model anyway, debugging over USB seems to not be possible, but I could be wrong. Is there any other info that we can provide which xen might be able to tell us? I haven't checked what xl dmesg and other commands output when booted without the command line fixes.
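
As a starting point for the information mentioned above, the outputs of `xl info`, `xl dmesg` and `xenpm get-cpufreq-para` could be gathered into one file to attach to the issue. This is a sketch only: `run_and_log` and the log path are hypothetical, and the commands need root in dom0.

```shell
#!/bin/sh
# run_and_log LOGFILE CMD [ARGS...]: append a header line and CMD's combined
# stdout/stderr to LOGFILE, noting a non-zero exit status instead of aborting.
run_and_log() {
    log=$1
    shift
    printf '==== %s ====\n' "$*" >> "$log"
    "$@" >> "$log" 2>&1 || printf '(exit status %s)\n' "$?" >> "$log"
}

diag_log=${TMPDIR:-/tmp}/xen-diag.log   # hypothetical output path
if command -v xl >/dev/null 2>&1; then
    # These need root in dom0; run the script with sudo there.
    run_and_log "$diag_log" xl info
    run_and_log "$diag_log" xl dmesg
    run_and_log "$diag_log" xenpm get-cpufreq-para
    echo "wrote $diag_log"
else
    echo "xl not found; run this in dom0"
fi
```

Because failures are logged rather than fatal, the "failed to get cpufreq parameter" error would end up in the same file as the rest of the output.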

DemiMarie commented 3 years ago

SMT is inherently insecure in the absence of a core scheduler, which Xen doesn’t have. So I would not worry about it.

tasket commented 3 years ago

Should I ask....

Is there evidence that AMD devs are helping Xen Project support their new products?

0spinboson commented 3 years ago

> Should I ask....
>
> Is there evidence that AMD devs are helping Xen Project support their new products?

yes, quite a bit. Though more so for the server products, for obvious reasons.

dylangerdaly commented 3 years ago

Another interesting thing I've noticed is when connected up to an external screen (4K), I'm hovering around 85C CPU temp, without the screen I'm around 45C.

I think the iGPU isn't configured correctly or I'm missing an X11 lib? Anyone else noticing this?

> I've updated Xen and am noticing the soft-locks.

I think I misspoke here and was just stuttering because of my 4K display.

> yes, quite a bit. Though more so for the server products, for obvious reasons.

Yeah, I can imagine mobile CPUs aren't really looked at.

dylangerdaly commented 3 years ago

@tasket have you noticed when charging, it's idling ~75C compared to a cool ~45C on battery?

Not sure if it's a power management bug or if it's something physical.

tasket commented 3 years ago

@dylangerdaly My system hovers around 42C both on and off AC, with one or two FHD displays. Keep an eye on your CPU usage with xentop; CPU stress is the only thing that seems to raise the temp.

BTW I'm using 5.8.18 in dom0 as the 5.9 kernels won't boot. My UEFI graphics RAM is at the lowest setting (not auto-sized). I did install xorg-x11-drv-amdgpu along with KDE and sddm. There was some method I used to check that x11 was using the AMD driver but I can't recall atm.

On edit: You might try changing graphics memory to the maximum in your UEFI settings, since 4K seems like it would demand more.

dylangerdaly commented 3 years ago

I'm glad you confirmed the Kernel issues, I'm having the exact same issue, newer kernels are boot looping.

I'll have a play around with these settings, cheers @tasket

tasket commented 3 years ago

I just did a quick test on a 4K display and there was no temperature increase.

One other variable I can think of right now is the built-in Wifi, which I'm not using at all (not assigned to any running netvm). This would make the rest of the system go haywire after a while, so I'm relying on a NIC in sys-usb.