Open ma-neumann opened 1 week ago
@ma-neumann thank you for the report.
pcengines_apu2_v0.9.1-rc1
Where is it coming from?
@pietrushnic I have compiled the tag pcengines_apu2_v0.9.1-rc1 in your coreboot repo: https://github.com/Dasharo/coreboot/releases/tag/pcengines_apu2_v0.9.1-rc1
git clone https://github.com/Dasharo/coreboot.git && cd coreboot
git checkout pcengines_apu2_v0.9.1-rc1
git submodule update --init --checkout
./build.sh apu2
Thank you so much for testing. Can you confirm that on v0.9.0? We would at least know if this is a regression or a known bug. Fixing IOMMU is not easy because we don't have a comprehensive test suite covering various hardware, but I hope we can satisfy your case without breaking others.
This is a known problem already from the traditional PC Engines firmware. https://github.com/pcengines/apu2-documentation/issues/240
@miczyg1 It looks related to me too, but the symptoms differ, don't you think?
Why? It was also caused by ath10k_pci according to comments.
I speculate that the INVALID_DEVICE_REQUEST makes a difference. But sure, I don't know.
And I suspect you are also implying that this is probably not a regression. I will test v0.9.0 shortly.
Unfortunately, it does not seem to be a regression (at least not from v0.9.0)
[ 57.392307] ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcfd51ad0 flags=0x0070]
Yet, I just noticed something I had missed so far: the Linux kernel raises an exception when it initializes the IOMMU. The following happens right at the beginning when the kernel starts (on v0.9.0 as well as on v0.9.1-rc1):
[ 2.045004] AMD-Vi: Extended features (0x800290ad2, 0x0): PPR GT IA GA PC GA_vAPIC
[ 2.052762] AMD-Vi: Interrupt remapping enabled
[ 2.800832] Freeing initrd memory: 65576K
[ 23.731828] ------------[ cut here ]------------
[ 23.736476] WARNING: CPU: 2 PID: 1 at drivers/iommu/amd/init.c:980 enable_iommus_vapic+0
[ 23.745488] Modules linked in:
[ 23.748595] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.8.0-48-generic #48-Ubuntu
[ 23.756119] Hardware name: PC Engines apu2/apu2, BIOS Dasharo (coreboot+UEFI) v0.9.0 034
[ 23.764850] RIP: 0010:enable_iommus_vapic+0x343/0x3a0
[ 23.769942] Code: e9 9e fd ff ff 49 8b 47 38 48 83 c0 18 48 8b 00 48 b9 00 00 00 00 00 c
[ 23.788737] RSP: 0018:ffffb6a28002fd20 EFLAGS: 00010246
[ 23.794000] RAX: 0000000000000000 RBX: 00000000001e8480 RCX: 0000000000000000
[ 23.801170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 23.808338] RBP: ffffb6a28002fd58 R08: 0000000000000000 R09: 0000000000000000
[ 23.815504] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000080000000
[ 23.822672] R13: 000ffffffffffff8 R14: 0800000000000000 R15: ffff9818002e9000
[ 23.829842] FS: 0000000000000000(0000) GS:ffff98182ad00000(0000) knlGS:0000000000000000
[ 23.837982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.843759] CR2: 0000000000000000 CR3: 0000000076c3c000 CR4: 00000000000406f0
[ 23.850921] Call Trace:
[ 23.853398] <TASK>
[ 23.855532] ? show_regs+0x6d/0x80
[ 23.858979] ? __warn+0x89/0x160
[ 23.862255] ? enable_iommus_vapic+0x343/0x3a0
[ 23.866739] ? report_bug+0x17e/0x1b0
[ 23.870446] ? handle_bug+0x51/0xa0
[ 23.873974] ? exc_invalid_op+0x18/0x80
[ 23.877850] ? asm_exc_invalid_op+0x1b/0x20
[ 23.882073] ? enable_iommus_vapic+0x343/0x3a0
[ 23.886553] amd_iommu_enable_interrupts+0x12a/0x350
[ 23.891553] ? amd_iommu_init_pci+0x261/0x310
[ 23.895949] state_next+0x42c/0x4d0
[ 23.899475] amd_iommu_init+0x21/0x80
[ 23.903174] ? __pfx_pci_iommu_init+0x10/0x10
[ 23.907597] pci_iommu_init+0x13/0x70
[ 23.911294] ? __pfx_pci_iommu_init+0x10/0x10
[ 23.915691] do_one_initcall+0x5e/0x340
[ 23.919596] do_initcalls+0x107/0x230
[ 23.923299] ? __pfx_kernel_init+0x10/0x10
[ 23.927435] kernel_init_freeable+0x134/0x210
[ 23.931829] kernel_init+0x1b/0x200
[ 23.935355] ret_from_fork+0x47/0x70
[ 23.938970] ? __pfx_kernel_init+0x10/0x10
[ 23.943106] ret_from_fork_asm+0x1b/0x30
[ 23.947071] </TASK>
[ 23.949290] ---[ end trace 0000000000000000 ]---
Seems like the kernel is stuck for about 20 seconds at first, and then raises an exception somewhere near the end of the function enable_iommus_vapic (for the code of the function see also [1]).
I do not understand this IOMMU code in the kernel, but from the code it looks like it's somehow about remapping interrupts (given the CONFIG_IRQ_REMAP macro in [2]).
By trial and error I have switched the AMD IOMMU's remapping mode from vapic to legacy (whatever it means) using the kernel option amd_iommu_intr=legacy (see also [3]).
Now, the kernel does not get stuck anymore and it seems to successfully initialize the IOMMU. Unfortunately, the original symptom -- the IO_PAGE_FAULT -- is still there.
By another round of trial and error I have put the IOMMU into pass-through mode (whatever it means) using the kernel option iommu=pt (see also [3]). This seems to have fixed the IO_PAGE_FAULT -- seems like they are gone.
[1] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/drivers/iommu/amd/init.c?h=master-next--2024.09.30-1--auto#n2859
[2] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/drivers/iommu/amd/init.c?h=master-next--2024.09.30-1--auto#n2861
[3] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
It all sounds suspicious. I'm not an IOMMU expert, but pass-through means the device can access memory directly without the IOMMU translating virtual addresses to physical addresses. It stops complaining because the previous IO_PAGE_FAULT could be just a symptom of the device trying to access the memory region protected by the IOMMU. Or maybe we need some other stuff configured for that device to function correctly.
I wonder what @krystian-hebel and @andyhhp think about that.
Yeah, iommu=pt is just turning the IOMMU off, so hiding the problem that way.
The Google Groups link isn't quite correct. Flags of 0x0070 do translate to PE, RW, PR, but that means the device is trying to write to a region marked read-only in the IOMMU.
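To make that flag reading concrete, here is a minimal Python sketch that decodes flags=0x0070. The bit positions (PR = 0x10, RW = 0x20, PE = 0x40) are an assumption derived from the PE/RW/PR reading and should be checked against the AMD IOMMU specification. `decode_flags` is a hypothetical helper name, not something from the kernel.

```python
# Assumed bit positions for the three flags named in this thread.
# Any other bit of the flags field is reported as unknown rather than guessed.
ASSUMED_FLAG_BITS = {
    0x010: "PR",  # the page table entry was present
    0x020: "RW",  # the faulting access was a write
    0x040: "PE",  # the device lacked permission for the access
}


def decode_flags(flags: int) -> list[str]:
    """Return the names of the known bits set in an IO_PAGE_FAULT flags value."""
    names = [name for bit, name in sorted(ASSUMED_FLAG_BITS.items()) if flags & bit]
    unknown = flags & ~sum(ASSUMED_FLAG_BITS)  # bits outside the assumed set
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names


print(decode_flags(0x0070))
```

Under these assumptions, present + write + permission error reads as "the mapping exists, but the device was not allowed to write through it".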
All the addresses seem to be quite close together. Does 0xced536d0 fall in any region described in /proc/iomem ?
enable_iommus_vapic() is a virtualisation feature, and unless you're planning to run KVM VMs, you don't need it. The backtrace you get is hitting the WARN_ON() there https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/drivers/iommu/amd/init.c?h=master-next--2024.09.30-1--auto#n2888 which is indicating that VAPIC isn't playing ball.
I'm not aware of any extra configuration the firmware would need to do to set up VAPIC, but I wouldn't rule it out either. Either way, I think that's a red herring and unrelated to IO_PAGE_FAULTs.
@andyhhp Good morning, thank you very much. The IOMMU is back on.
[ 40.182570] ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcff49550 flags=0x0070]
[ 54.820661] ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcece9e50 flags=0x0070]
[ 246.667117] ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcedeb950 flags=0x0070]
These three seem to be hits of "System RAM" regions and a "RAM buffer" region.
$ sudo cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
000a0000-000dffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-ce2e3fff : System RAM
c3800000-c4dfffff : Kernel code
c4e00000-c5bfafff : Kernel rodata
c5c00000-c605673f : Kernel data
c6561000-c69fffff : Kernel bss
ce2e4000-ce2e5fff : ACPI Tables
ce2e6000-ce2e6fff : System RAM
ce2e7000-ce2e9fff : Reserved
ce2ea000-ce2eafff : System RAM
ce2eb000-ce3a5fff : Reserved
ce3a6000-ce98dfff : System RAM
ce98e000-ce9d6fff : Reserved
ce9d7000-cebd8fff : System RAM
cebd9000-cebddfff : ACPI Tables
cebde000-cec0efff : System RAM
cec0f000-cec10fff : ACPI Tables
cec11000-cec13fff : System RAM
cec14000-cec15fff : ACPI Tables
cec16000-cecacfff : System RAM
cecad000-cecadfff : ACPI Tables
cecae000-ced6dfff : System RAM << HIT 2
ced6e000-ced86fff : Reserved
ced87000-cf697fff : System RAM << HIT 3
cf698000-cf698fff : ACPI Tables
cf699000-cf6adfff : System RAM
cf6ae000-cf7edfff : Reserved
cf7ee000-cf7f4fff : System RAM
cf7f5000-cf7f5fff : ACPI Non-volatile Storage
cf7f6000-cf7fdfff : ACPI Tables
cf7fe000-cfc4efff : System RAM
cfc4f000-cfefffff : Reserved
cfc7c000-cfc83fff : BOOT0000:00
cff00000-cfffffff : RAM buffer << HIT 1
d0000000-ffffffff : PCI Bus 0000:00
d0000000-d02fffff : PCI Bus 0000:01
d0000000-d01fffff : 0000:01:00.0
d0000000-d01fffff : ath
d0200000-d020ffff : 0000:01:00.0
d0300000-d03fffff : PCI Bus 0000:02
d0300000-d031ffff : 0000:02:00.0
d0300000-d031ffff : igb
d0320000-d0323fff : 0000:02:00.0
d0320000-d0323fff : igb
d0400000-d06fffff : PCI Bus 0000:05
d0400000-d05fffff : 0000:05:00.0
d0400000-d05fffff : ath
d0600000-d060ffff : 0000:05:00.0
d0700000-d07fffff : PCI Bus 0000:03
d0700000-d071ffff : 0000:03:00.0
d0700000-d071ffff : igb
d0720000-d0723fff : 0000:03:00.0
d0720000-d0723fff : igb
d0800000-d08fffff : PCI Bus 0000:04
d0800000-d081ffff : 0000:04:00.0
d0800000-d081ffff : igb
d0820000-d0823fff : 0000:04:00.0
d0820000-d0823fff : igb
d0900000-d09fffff : 0000:00:08.0
d0900000-d09fffff : ccp
d0a00000-d0afffff : 0000:00:08.0
d0a00000-d0afffff : ccp
d0b00000-d0b7ffff : amd_iommu
d0b80000-d0b9ffff : 0000:00:08.0
d0b80000-d0b9ffff : ccp
d0ba0000-d0ba1fff : 0000:00:08.0
d0ba0000-d0ba1fff : ccp
d0ba2000-d0ba3fff : 0000:00:10.0
d0ba2000-d0ba3fff : xhci-hcd
d0ba4000-d0ba4fff : 0000:00:08.0
d0ba4000-d0ba4fff : ccp
d0ba5000-d0ba53ff : 0000:00:11.0
d0ba5000-d0ba53ff : ahci
d0ba6000-d0ba60ff : 0000:00:13.0
d0ba6000-d0ba60ff : ehci_hcd
d0ba7000-d0ba70ff : 0000:00:14.7
d0ba7000-d0ba70ff : mmc0
f8000000-fbffffff : PCI ECAM 0000 [bus 00-3f]
f8000000-fbffffff : pnp 00:00
fec00000-fec003ff : IOAPIC 0
fec10002-fec11001 : pnp 00:01
fec20000-fec203ff : IOAPIC 1
fed00000-fed003ff : HPET 0
fed81500-fed817ff : gpio_amd_fch amd-fch-gpio-iomem
100000000-12effffff : System RAM
12f000000-12fffffff : RAM buffer
I guess they should have used the regions which had been allocated to them, i.e. each WLE uses its ath region?
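For anyone who wants to repeat the address lookup, here is a minimal Python sketch. It embeds only a handful of lines copied from the /proc/iomem output above; `parse_iomem` and `find_region` are hypothetical helper names. Because nested entries lose their indentation when pasted, the sketch simply picks the smallest range that contains the address.

```python
import re

# A few lines copied from the /proc/iomem output in this thread.
IOMEM_SAMPLE = """\
00100000-ce2e3fff : System RAM
cecae000-ced6dfff : System RAM
ced6e000-ced86fff : Reserved
ced87000-cf697fff : System RAM
cfc4f000-cfefffff : Reserved
cff00000-cfffffff : RAM buffer
d0000000-ffffffff : PCI Bus 0000:00
d0000000-d01fffff : ath
"""

LINE_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_iomem(text):
    """Parse /proc/iomem-style text into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()))
    return regions


def find_region(regions, addr):
    """Return the smallest region containing addr, or None if there is none."""
    hits = [r for r in regions if r[0] <= addr <= r[1]]
    return min(hits, key=lambda r: r[1] - r[0]) if hits else None


regions = parse_iomem(IOMEM_SAMPLE)
for addr in (0xcff49550, 0xcece9e50, 0xcedeb950):
    start, end, name = find_region(regions, addr)
    print(f"0x{addr:08x} -> {start:08x}-{end:08x} : {name}")
```

To run it against a live system, replace IOMEM_SAMPLE with the output of `sudo cat /proc/iomem` (without root the addresses are shown as zeros).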
The region ce2e4000-cfffffff looks like cbmem (coreboot memory). But I don't understand why and what a device would like to write there.
Getting the logs from cbmem would be great: https://docs.dasharo.com/common-coreboot-docs/dumping_logs/#cbmem-utility
enable_iommus_vapic() is a virtualisation feature, and unless you're planning to run KVM VMs, you don't need it. The backtrace you get is hitting the WARN_ON() there https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/drivers/iommu/amd/init.c?h=master-next--2024.09.30-1--auto#n2888 which is indicating that VAPIC isn't playing ball.
I'm not aware of any extra configuration the firmware would need to do to set up VAPIC, but I wouldn't rule it out either. Either way, I think that's a red herring and unrelated to IO_PAGE_FAULTs.
The code seems to try to disable guest VAPIC logging. However, according to the BKDG, guest VAPIC should not be supported in the SoC (the GASup bit should be 0 in the IOMMU Extended Feature register). But the guest VAPIC log registers are described in the BKDG :thinking: The GaLogEn and GaIntEN bits exist in the IOMMU Control register, so maybe if coreboot disables them, Linux will not complain?
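As a quick cross-check of what the hardware actually advertises, here is a small Python sketch that decodes the Extended Feature value from the "AMD-Vi: Extended features (0x800290ad2, 0x0)" log line earlier in this thread. The bit positions are an assumption (GASup at bit 7, and so on); they were chosen so that the decoded names match the "PPR GT IA GA PC" list the kernel printed, and they should be verified against the IOMMU specification. `decode_efr` is a hypothetical helper name.

```python
# Assumed bit positions in the IOMMU Extended Feature register.
# Only the features named in the kernel log line are listed here.
ASSUMED_EFR_BITS = {1: "PPR", 4: "GT", 6: "IA", 7: "GA", 9: "PC"}


def decode_efr(efr: int) -> list[str]:
    """Return the names of the assumed feature bits set in the EFR value."""
    return [name for bit, name in sorted(ASSUMED_EFR_BITS.items()) if (efr >> bit) & 1]


efr = 0x800290ad2  # value taken from the "Extended features" log line
print(decode_efr(efr))
print("GASup set:", bool((efr >> 7) & 1))
```

If the assumption holds, GASup reads as 1 on this board, which is the opposite of what the BKDG text suggests.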
I was surprised that VAPIC was seemingly active in APU2; it feels too old to have support. But, it's Fam16h Model 0x30, and I recall there being prototype support there, which was formally supported in Zen1 which was the following architecture.
I agree that the BKDG seems confused on whether vAPIC should be visible or not. I think it's quite likely that there's support in silicon which the AMD BIOS clobbers.
The region ce2e4000-cfffffff looks like cbmem (coreboot memory). But I don't understand why and what a device would like to write there.
Getting the logs from cbmem would be great: https://docs.dasharo.com/common-coreboot-docs/dumping_logs/#cbmem-utility
@miczyg1 Please see the IO_PAGE_FAULT events since the current boot as follows:
sudo journalctl -k | grep "IO_"
Nov 13 07:20:36 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcff49550 flags=0x0070]
Nov 13 07:20:50 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcece9e50 flags=0x0070]
Nov 13 07:24:02 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcedeb950 flags=0x0070]
Nov 13 07:33:29 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcf3d9950 flags=0x0070]
Nov 13 08:44:01 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcef9f950 flags=0x0070]
Nov 13 08:45:09 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcfec0350 flags=0x0070]
Nov 13 08:51:03 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcf9da1d0 flags=0x0070]
Nov 13 08:54:09 router kernel: ath10k_pci 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xceb1fbd0 flags=0x0070]
Nov 13 21:52:52 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xceec7050 flags=0x0070]
Nov 13 21:52:54 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcfb314d0 flags=0x0070]
Nov 13 21:59:35 router kernel: ath10k_pci 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xcfae8850 flags=0x0070]
Also see /proc/iomem as follows:
sudo cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
000a0000-000dffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-ce2e3fff : System RAM
c3800000-c4dfffff : Kernel code
c4e00000-c5bfafff : Kernel rodata
c5c00000-c605673f : Kernel data
c6561000-c69fffff : Kernel bss
ce2e4000-ce2e5fff : ACPI Tables
ce2e6000-ce2e6fff : System RAM
ce2e7000-ce2e9fff : Reserved
ce2ea000-ce2eafff : System RAM
ce2eb000-ce3a5fff : Reserved
ce3a6000-ce98dfff : System RAM
ce98e000-ce9d6fff : Reserved
ce9d7000-cebd8fff : System RAM
cebd9000-cebddfff : ACPI Tables
cebde000-cec0efff : System RAM
cec0f000-cec10fff : ACPI Tables
cec11000-cec13fff : System RAM
cec14000-cec15fff : ACPI Tables
cec16000-cecacfff : System RAM
cecad000-cecadfff : ACPI Tables
cecae000-ced6dfff : System RAM
ced6e000-ced86fff : Reserved
ced87000-cf697fff : System RAM
cf698000-cf698fff : ACPI Tables
cf699000-cf6adfff : System RAM
cf6ae000-cf7edfff : Reserved
cf7ee000-cf7f4fff : System RAM
cf7f5000-cf7f5fff : ACPI Non-volatile Storage
cf7f6000-cf7fdfff : ACPI Tables
cf7fe000-cfc4efff : System RAM
cfc4f000-cfefffff : Reserved
cfc7c000-cfc83fff : BOOT0000:00
cff00000-cfffffff : RAM buffer
d0000000-ffffffff : PCI Bus 0000:00
d0000000-d02fffff : PCI Bus 0000:01
d0000000-d01fffff : 0000:01:00.0
d0000000-d01fffff : ath
d0200000-d020ffff : 0000:01:00.0
d0300000-d03fffff : PCI Bus 0000:02
d0300000-d031ffff : 0000:02:00.0
d0300000-d031ffff : igb
d0320000-d0323fff : 0000:02:00.0
d0320000-d0323fff : igb
d0400000-d06fffff : PCI Bus 0000:05
d0400000-d05fffff : 0000:05:00.0
d0400000-d05fffff : ath
d0600000-d060ffff : 0000:05:00.0
d0700000-d07fffff : PCI Bus 0000:03
d0700000-d071ffff : 0000:03:00.0
d0700000-d071ffff : igb
d0720000-d0723fff : 0000:03:00.0
d0720000-d0723fff : igb
d0800000-d08fffff : PCI Bus 0000:04
d0800000-d081ffff : 0000:04:00.0
d0800000-d081ffff : igb
d0820000-d0823fff : 0000:04:00.0
d0820000-d0823fff : igb
d0900000-d09fffff : 0000:00:08.0
d0900000-d09fffff : ccp
d0a00000-d0afffff : 0000:00:08.0
d0a00000-d0afffff : ccp
d0b00000-d0b7ffff : amd_iommu
d0b80000-d0b9ffff : 0000:00:08.0
d0b80000-d0b9ffff : ccp
d0ba0000-d0ba1fff : 0000:00:08.0
d0ba0000-d0ba1fff : ccp
d0ba2000-d0ba3fff : 0000:00:10.0
d0ba2000-d0ba3fff : xhci-hcd
d0ba4000-d0ba4fff : 0000:00:08.0
d0ba4000-d0ba4fff : ccp
d0ba5000-d0ba53ff : 0000:00:11.0
d0ba5000-d0ba53ff : ahci
d0ba6000-d0ba60ff : 0000:00:13.0
d0ba6000-d0ba60ff : ehci_hcd
d0ba7000-d0ba70ff : 0000:00:14.7
d0ba7000-d0ba70ff : mmc0
f8000000-fbffffff : PCI ECAM 0000 [bus 00-3f]
f8000000-fbffffff : pnp 00:00
fec00000-fec003ff : IOAPIC 0
fec10002-fec11001 : pnp 00:01
fec20000-fec203ff : IOAPIC 1
fed00000-fed003ff : HPET 0
fed81500-fed817ff : gpio_amd_fch amd-fch-gpio-iomem
100000000-12effffff : System RAM
12f000000-12fffffff : RAM buffer
Finally see the output from sudo ./cbmem -1 > cbmem.log attached
Taking one of your addresses at random: 0xcff49550
$ grep 0xcff cbmem.log
[INFO ] add_uma_resource_below_tolm: uma size 0x00100000, memory start 0xcff00000
[DEBUG] Installing permanent SMM handler to 0xcff00000
[DEBUG] HANDLER [0xcfffd000-0xcffffe40]
[DEBUG] ss0 [0xcfffce00-0xcfffd000]
[DEBUG] stub0 [0xcfff5000-0xcfff5198]
[DEBUG] ss1 [0xcfffcc00-0xcfffce00]
[DEBUG] stub1 [0xcfff4e00-0xcfff4f98]
[DEBUG] ss2 [0xcfffca00-0xcfffcc00]
[DEBUG] stub2 [0xcfff4c00-0xcfff4d98]
[DEBUG] ss3 [0xcfffc800-0xcfffca00]
[DEBUG] stub3 [0xcfff4a00-0xcfff4b98]
[DEBUG] stacks [0xcff00000-0xcff02000]
[DEBUG] Loading module at 0xcfffd000 with entry 0xcfffda12. filesize: 0x2d78 memsize: 0x2e40
[DEBUG] Processing 170 relocs. Offset value of 0xcfffd000
[DEBUG] Loading module at 0xcfff5000 with entry 0xcfff5000. filesize: 0x198 memsize: 0x198
[DEBUG] Processing 9 relocs. Offset value of 0xcfff5000
[DEBUG] smm_module_setup_stub: stack_top = 0xcff02000
[DEBUG] SMM Module: stub loaded at cfff5000. Will call 0xcfffda12
So the DMA is hitting the SMM range.
/proc/iomem says cff00000-cfffffff : RAM buffer
Why isn't that marked as reserved in the E820 ?
There are two problems here:
Why isn't that marked as reserved in the E820 ?
Well, there is bad logic in the EDK2 UEFI Payload for determining the TOLUD. We have an ugly hack that worked for Intel (as we released only firmware for Intel-based boards) and simply reads the TOLUD from the host bridge. It goes without saying that it doesn't work on AMD :) So the TOLUD is assumed to be on the MMIO boundary (0xd0000000 in this case) instead of 0xcff00000, and the memory map ends up not reserving the 0xcff00000-0xd0000000 RAM... Working on a fix.
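To put numbers on that gap, here is a minimal Python sketch using the two addresses mentioned in this comment. `unreserved_gap` is a hypothetical helper name; the sketch only does the arithmetic and is not how the payload computes its memory map.

```python
TOLUD_ACTUAL = 0xcff00000   # where the SMM/UMA area starts according to the cbmem log
TOLUD_ASSUMED = 0xd0000000  # MMIO boundary that the payload wrongly uses as TOLUD


def unreserved_gap(actual: int, assumed: int) -> range:
    """Addresses wrongly left unreserved when the assumed top is too high."""
    return range(actual, max(actual, assumed))


gap = unreserved_gap(TOLUD_ACTUAL, TOLUD_ASSUMED)
print(f"gap: 0x{gap.start:08x}-0x{gap.stop - 1:08x}, {len(gap) // 1024} KiB")
print("0xcff49550 in gap:", 0xcff49550 in gap)
```

The size comes out at 0x100000 bytes, the same as the "uma size 0x00100000" reported in the cbmem log, and the fault address 0xcff49550 picked from the journal falls inside it.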
I have found out that AVIC is not supported on this HW. CPUID 0x8000000A EDX indicates no support for AVIC. That means the IOMMU guest AVIC support should not be exposed at all. I have already prepared a fix that hides the guest AVIC capability in the IOMMU and disables the feature. So far it works and no more WARNs are visible in dmesg.
The 12f000000-12fffffff : RAM buffer range also looks bad. It is C6 save state RAM, which should also be reserved.
Do I need that also for Dasharo (coreboot+SeaBIOS), or is this UEFI-specific?
I would happily test the new build 😀
Do I need that also for Dasharo (coreboot+SeaBIOS), or is this UEFI-specific?
@pietrushnic
I would be happy to introduce a fix to the upcoming 24.08.00.01. I'm unsure about testing, although it would be great to have automated verification of this issue to avoid regression. Still, maybe IOMMU verification could be extended as part of additional effort since I think this is quite a lot of effort.
:thinking: I guess I cannot assign issue to two milestones.
Ok, created separate issue for tracking Dasharo (coreboot+SeaBIOS) fix integration.
These IO_PAGE_FAULTs happen every now and then, so far they seem sporadic to me. In general I experience quite good WiFi performance, but sometimes I experience weird/significant delays, maybe the issue is related.
By another round of trial and error I have put the IOMMU into pass-through mode (whatever it means) using the kernel option iommu=pt (see also [3]). This seems to have fixed the IO_PAGE_FAULT -- seems like they are gone.
By the way: at least, disabling the IOMMU has fixed my WiFi issues I mentioned originally.
Component
Dasharo firmware
Device
PC Engines APU2
Dasharo version
pcengines_apu2_v0.9.1-rc1
Dasharo Tools Suite version
No response
Test case ID
No response
Brief summary
Linux kernel reports IO_PAGE_FAULTS on writes by ath10k_pci
How reproducible
Hi Dasharo team,
I am running Dasharo's v0.9.1-rc1 on a PC Engines APU2D4 with Ubuntu 24.04 LTS (currently kernel 6.8.0-48-generic).
The APU is equipped with two Compex WLE900VX, thus using the ath10k_pci driver (plus currently linux-firmware 20240318.git3b128b60-0ubuntu2.4). Both are in AP mode (using hostapd), one is on 2.4 GHz and one on 5 GHz.
ath10k_pci reports the WLEs' Qualcomm chips properly (qca988x hw2.0 target, see also [1]) and loads the latest firmware properly (firmware ver 10.2.4-1.0-00047, see also [2]).
The WLE on 5 GHz reports the following IO_PAGE_FAULTs:
As far as I understand, according to AMD's IOMMU specifications, the flags=0x0070 indicate that the WLE has been lacking permission when trying to write to the addresses reported (see also [3]).
These IO_PAGE_FAULTs happen every now and then, so far they seem sporadic to me. In general I experience quite good WiFi performance, but sometimes I experience weird/significant delays, maybe the issue is related.
The issue in [4] might be related. Yet, disabling Dasharo's performance boost option for the APU in BIOS did not change the issue. And, the workaround of emulating the IOMMU hardware in software (kernel parameter iommu=soft) does not seem to be an option to me for performance reasons (have not tested it).
I'd appreciate any ideas. Thank you very much.
[1] https://compex.com.sg/shop/wifi-module/802-11ac-wave-1/wle900vx-wifi5-11ac-qca9880-qca9890/
[2] https://git.codelinaro.org/clo/ath-firmware/ath10k-firmware/-/tree/main/QCA988X/hw2.0/10.2.4-1.0?ref_type=heads
[3] https://groups.google.com/g/linux-ntb/c/vvnbizy8d_8/m/tZMqnJH9AwAJ
[4] https://github.com/pcengines/apu2-documentation/issues/240
How to reproduce
n/a
Expected behavior
n/a
Actual behavior
n/a
Screenshots
No response
Additional context
No response
Solutions you've tried
No response