Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/

asus_kgpe-d16 can not boot with a GPU connected #48

Open mrothfuss opened 2 years ago

mrothfuss commented 2 years ago

Dasharo version asus_kgpe-d16_v0.3.0

Dasharo variant ASUS KGPE-D16

Affected component(s) or functionality The board does not boot correctly with a GPU connected

Brief summary Using either the internal ASpeed GPU or a dedicated GPU (e.g. ATI), the system does not boot.

How reproducible Very, every boot has failed.

How to reproduce

Steps to reproduce the behavior:

  1. Enable the onboard VGA card or install a PCIe GPU
  2. Turn on the machine

Expected behavior The machine boots and provides video output

Actual behavior The machine boots (coreboot -> SeaBIOS), displays distorted pixels on the screen, and fails to boot an operating system

Solutions you've tried I have tried all combinations of using the ASpeed VGA Textbuffer, the VGA BIOS extracted from a stock KGPE-D16 BIOS, the generic VGA BIOS provided by SeaBIOS, and loading the dedicated GPU's option rom by either coreboot or SeaBIOS. All have failed to boot an operating system. All of these configurations were working in the coreboot 4.11 version.

miczyg1 commented 2 years ago

Additional coreboot logs for both situations (working and non-working) would be helpful

pietrushnic commented 2 years ago

@miczyg1 what is the easiest way to gather that? Does cbmem work in this case? Maybe some documentation about log gathering, or information in the issue template, would be helpful.

macpijan commented 2 years ago

@mrothfuss Thanks for the report. It is true that the setup we have right now does not have a GPU connected.

Maybe we could extend one of the setups with a GPU. Do you have any suggestions on the device? Which one did you use?

mrothfuss commented 2 years ago

To clarify, these are the hardware configurations I tested

Onboard ASpeed VGA Enabled + ATI RX 550: did not boot
Onboard ASpeed VGA Disabled + ATI RX 550: did not boot
Onboard ASpeed VGA Enabled + No PCIe GPU: did not boot
Onboard ASpeed VGA Disabled + No PCIe GPU: boots (headless. system is more stable than coreboot 4.11 but can still crash)

The Onboard ASpeed VGA was enabled/disabled using mainboard jumpers.

For the "Onboard ASpeed VGA Enabled + No PCIe GPU" scenario, I tried different build VGA settings (native coreboot init / SeaVGABIOS / proprietary VGA BIOS); all failed to produce a successful boot. All crashed with either blank screens or random pixels on the VGA output. Each of these three VGA settings worked in coreboot 4.11. The ATI RX 550 also worked in coreboot-4.11, either having SeaBIOS execute its option ROM and using it natively, or skipping initialization and using the device in a guest VM.

My suggestion/request would be a consumer GPU that is compatible with full PCIe passthrough. The ATI RX 550 works for this. I have used one with the Proprietary KGPE-D16 / Debian-Xen passthrough previously and Coreboot-4.11 KGPE-D16 / Debian-KVM passthrough currently. The only caveat (for both Prop / Coreboot BIOS) is that restarting the device after passthrough is problematic, but this is a device/virtualization issue -- not BIOS related.

There is an incomplete list of such cards here: https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware#Tested_graphics_card

mrothfuss commented 2 years ago

Here are logs for comparison

Stable boot, all GPUs disabled/removed https://github.com/mrothfuss/coreboot-logs/blob/master/166_gpudisabled

Failed boot, Onboard ASpeed VGA Enabled / No PCIe GPU https://github.com/mrothfuss/coreboot-logs/blob/master/171_aspeed_textmode

note: the crash/freeze occurs after the system tries to boot an operating system and does not show up in the log. The first scenario boots into a headless/stable system available by SSH. The second scenario does not.

mrothfuss commented 2 years ago

More information to help track down the bug:

I tried disabling the ASpeed driver and booting with the onboard GPU enabled, the system still failed to boot an operating system.

I've also tried switching the payload from SeaBIOS to GRUB2 and experienced the same problem.

miczyg1 commented 2 years ago

There are a lot of combinations of which GPU to use and which option ROMs can be loaded. So the first thing is to test the settings, and the second is to debug any issues related to the GPU or onboard VGA.

mrothfuss commented 2 years ago

My FreeBSD system seems to crash when the bootloader interacts with video cards in dasharo builds. @miczyg1 might be able to boot a different OS successfully, but with broken video.

Testing the patch in https://github.com/Dasharo/coreboot/pull/123 got video working until I hit the FreeBSD bootloader, then the system crashed as before.

I've collected logs from working coreboot-4.11 builds with different configurations.

Using the stock VGA BIOS from Asus : working display, bootsplash rendered

Using just the AST Text Mode driver : working display, text mode only

Using AST Text mode with SeaVGABIOS : working display, text mode only

Using AST Text mode with SeaVGABIOS, video card disabled by jumpers : headless boot

These logs were collected on the same board/hardware that was used in the previously shared logs.

miczyg1 commented 2 years ago

Interesting is the fact that the memory clock in ASPEED is different between those logs:

ast_driver_load: dram 320571000 0 32 00800000 - textmode SeaVGABIOS and ASUS VGA BIOS
ast_driver_load: dram 552000000 0 32 00800000 - textmode only
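A minimal sketch of how the two values could be pulled out of saved console logs for comparison. It assumes the first number after `dram` is the memory clock referred to above; the other fields are left uninterpreted, and the helper name `ast_dram_clock` is made up for this example.

```python
import re

# Assumption: the first number after "dram" is the memory clock value
# discussed in the comment; the remaining fields are not interpreted.
AST_DRAM_RE = re.compile(r"ast_driver_load: dram (\d+) ")


def ast_dram_clock(line):
    """Return the first numeric field after 'dram', or None if absent."""
    match = AST_DRAM_RE.search(line)
    return int(match.group(1)) if match else None


# The two log lines quoted in the comment above.
vbios_line = "ast_driver_load: dram 320571000 0 32 00800000"
textmode_line = "ast_driver_load: dram 552000000 0 32 00800000"

clocks = {
    "textmode SeaVGABIOS and ASUS VGA BIOS": ast_dram_clock(vbios_line),
    "textmode only": ast_dram_clock(textmode_line),
}

for label, value in clocks.items():
    print(f"{label}: {value}")

if len(set(clocks.values())) > 1:
    print("dram values differ between configurations")
```

Running the same helper over each line of a full boot log (skipping lines where it returns `None`) would give one value per configuration to compare.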

Also, the 2nd CPU being present may make a big difference between our setups (additionally, we have Opteron 6200 series CPUs). We will check FreeBSD in the meantime, while we wait for the fan to be delivered (to be able to install the 2nd CPU).

mrothfuss commented 2 years ago

Not sure if these differences between cb4.11 and dasharo might be problematic

# coreboot-4.11
PCI: 00:18.0 111b8 <- [0x00000a0000 - 0x00000bffff] size 0x00020000 gran 0x00 mem <node 0 link 1>
PCI: 00:18.0 110b0 <- [0x00fce00000 - 0x00fcefffff] size 0x00100000 gran 0x14 prefmem <node 0 link 1>
PCI: 00:18.0 110b8 <- [0x00fc000000 - 0x00fcdfffff] size 0x00e00000 gran 0x14 mem <node 0 link 1>
PCI: 00:18.0 110d8 <- [0x0000001000 - 0x0000005fff] size 0x00005000 gran 0x0c io <node 0 link 1>
# dasharo
PCI: 00:18.0 88 <- [0x4040000000 - 0x40afffffff] size 0x70000000 gran 0x14 prefmem <node 0 link 1>
PCI: 00:18.0 90 <- [0x00d4000000 - 0x00d85fffff] size 0x04600000 gran 0x14 mem <node 0 link 1>
PCI: 00:18.0 98 <- [0x00fec20000 - 0x00fec2ffff] size 0x00001000 gran 0x10 mem <node 0 link 1>
PCI: 00:18.0 c0 <- [0x0000001000 - 0x000000cfff] size 0x0000c000 gran 0x0c io <node 0 link 1>
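For anyone who wants to diff these lines mechanically, here is a rough sketch that parses the resource lines above and reports two things: whether any memory window explicitly covers the legacy VGA range 0xa0000 - 0xbffff, and whether the printed `size` matches the span of the printed range. The regular expression is written against the lines pasted above only; the field names are my own labels, and the script makes no judgement about whether a difference is actually a problem.

```python
import re

# Pattern written against the log lines pasted above; names are my own labels.
RESOURCE_RE = re.compile(
    r"PCI: (?P<dev>\S+) (?P<index>[0-9a-f]+) <- "
    r"\[0x(?P<base>[0-9a-f]+) - 0x(?P<limit>[0-9a-f]+)\] "
    r"size 0x(?P<size>[0-9a-f]+) gran 0x(?P<gran>[0-9a-f]+) (?P<kind>\w+)"
)


def parse_resource(line):
    """Parse one resource line into a dict, or return None if it does not match."""
    m = RESOURCE_RE.search(line)
    if m is None:
        return None
    base = int(m["base"], 16)
    limit = int(m["limit"], 16)
    return {
        "index": m["index"],
        "kind": m["kind"],
        "base": base,
        "limit": limit,
        "size": int(m["size"], 16),
        "window": limit - base + 1,  # span of the printed [base - limit] range
    }


def covers(resources, base, limit):
    """True if any mem/prefmem window fully contains [base, limit]."""
    return any(
        r["kind"] in ("mem", "prefmem") and r["base"] <= base and limit <= r["limit"]
        for r in resources
    )


CB411 = [parse_resource(line) for line in (
    "PCI: 00:18.0 111b8 <- [0x00000a0000 - 0x00000bffff] size 0x00020000 gran 0x00 mem <node 0 link 1>",
    "PCI: 00:18.0 110b0 <- [0x00fce00000 - 0x00fcefffff] size 0x00100000 gran 0x14 prefmem <node 0 link 1>",
    "PCI: 00:18.0 110b8 <- [0x00fc000000 - 0x00fcdfffff] size 0x00e00000 gran 0x14 mem <node 0 link 1>",
    "PCI: 00:18.0 110d8 <- [0x0000001000 - 0x0000005fff] size 0x00005000 gran 0x0c io <node 0 link 1>",
)]
DASHARO = [parse_resource(line) for line in (
    "PCI: 00:18.0 88 <- [0x4040000000 - 0x40afffffff] size 0x70000000 gran 0x14 prefmem <node 0 link 1>",
    "PCI: 00:18.0 90 <- [0x00d4000000 - 0x00d85fffff] size 0x04600000 gran 0x14 mem <node 0 link 1>",
    "PCI: 00:18.0 98 <- [0x00fec20000 - 0x00fec2ffff] size 0x00001000 gran 0x10 mem <node 0 link 1>",
    "PCI: 00:18.0 c0 <- [0x0000001000 - 0x000000cfff] size 0x0000c000 gran 0x0c io <node 0 link 1>",
)]

for name, resources in (("coreboot-4.11", CB411), ("dasharo", DASHARO)):
    print(name, "explicit VGA window:", covers(resources, 0xA0000, 0xBFFFF))
    for r in resources:
        if r["window"] != r["size"]:
            print(f"  index {r['index']}: window {r['window']:#x} != size {r['size']:#x}")
```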

miczyg1 commented 2 years ago

I have checked those multiple times. The VGA region 0xa0000 - 0xbffff doesn't have to be stored in the mapping registers, as shown in the logs. From the BKDG: "If the access matches the VGA-compatible MMIO address space and D18F1xF4[VE]=1 then D18F1xF4 describes how the access is routed and controlled". This register is set to route the specified VGA address range as MMIO, so everything should be well configured in this regard.
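The quoted rule can be pictured with a tiny model. This is only an illustration of the sentence quoted above, not firmware code: it takes the VE bit as a plain boolean, ignores every other field of the register, and the function name is invented for the example.

```python
# Legacy VGA-compatible MMIO range mentioned in the comment above.
VGA_MMIO_BASE = 0xA0000
VGA_MMIO_LIMIT = 0xBFFFF


def routed_by_vga_enable(addr, ve):
    """Model of the quoted rule: with VE set, accesses inside the VGA MMIO
    range are routed as the VGA-enable register describes, so no separate
    base/limit mapping entry is required for that range."""
    return bool(ve) and VGA_MMIO_BASE <= addr <= VGA_MMIO_LIMIT


for addr in (0x9FFFF, 0xA0000, 0xB8000, 0xBFFFF, 0xC0000):
    print(f"{addr:#07x}: VE=1 -> {routed_by_vga_enable(addr, True)}, "
          f"VE=0 -> {routed_by_vga_enable(addr, False)}")
```

Under this model, a missing 0xa0000 - 0xbffff entry in the mapping registers is not by itself a misconfiguration, as long as VE is set.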

krystian-hebel commented 2 years ago

@mrothfuss https://github.com/Dasharo/coreboot/pull/190 should fix it, feel free to test it. The issue was caused by hidden dependencies on order of execution in resource allocator v3.

pietrushnic commented 2 years ago

@krystian-hebel really good news. Thank you.

krystian-hebel commented 2 years ago

Binaries produced from above code: https://cloud.3mdeb.com/index.php/s/tJxzjem6kMaiZkR, config (+/- flash size): coreboot_vga_defconfig.txt. It defaults to external GPU, if installed. Apparently without dGPU SeaBIOS isn't able to print, although it worked in my earlier tests - either some code was not cleaned properly or ASpeed retained some internal state between tests, this needs further investigation.

mrothfuss commented 2 years ago

I tested the 16Mb ROM you shared and experienced the same issue (display was scrambled with random blocks of pixels).

Hardware Setup:

KGPE-D16 rev. 1.03G
2xOpteron 6386
8xSuper Talent W13RB16G4S
Radeon HD 3470
PIKE2008 (IT Mode)
ASpeed disabled via jumper

This setup was tested on my build of coreboot 4.11 and worked fine. I'll grab another board closer to your testing setup to test too.

mrothfuss commented 2 years ago

Another setup:

KGPE-D16 rev. 1.03G
1xOpteron 6282
1xSuper Talent W13RB16G4S
Radeon HD 3470
ASpeed disabled via jumper

Display was broken, mostly black with a large white rectangle in the middle. FreeBSD was able to boot blind though (an improvement). This setup also worked fine in coreboot 4.11.

ybh1998 commented 2 years ago

Hi! I also have a KGPE-D16 previously using libreboot. I have tested the released asus_kgpe-d16_v0.2.0 and the binary in the link above using my platform. The result is a little different. My configuration is as follows:

MB: KGPE-D16 rev 1.04
CPU: 2 x Opteron 6282
GPU: 2 x Radeon 6700xt, ASpeed jumper pin is untouched and enabled by default

With v0.2.0 firmware and ASpeed only: The onboard VGA outputs random blocks. After Linux boots, it shows Linux console correctly.

With v0.2.0 firmware and one/two Radeon 6700xt (ASpeed enabled): The onboard VGA outputs random blocks. After Linux boots, it still shows random blocks. The HDMI on only one 6700xt shows Linux console correctly.

With vga_fix firmware and ASpeed only: The onboard VGA outputs only one line "ASpeed VGA text mode initialized". After Linux boots, it shows Linux console correctly.

With vga_fix firmware and one Radeon 6700xt (ASpeed enabled): The onboard VGA outputs nothing. The HDMI on 6700xt shows SeaBIOS and the following boot process. After Linux boots, the onboard VGA starts. The Linux console works on one of them (distro-related) correctly.

With vga_fix firmware and two Radeon 6700xt (ASpeed enabled): The onboard VGA outputs nothing. The HDMI on only one 6700xt shows SeaBIOS and the following boot process. After Linux boots, the onboard VGA starts. The Linux console works on one of them (distro-related) correctly.

It seems that vga_fix fixes the ASpeed random blocks problem on my platform, but the following boot process is not correctly directed to onboard VGA.

Many thanks to the developer for this promising update! I can help with some further testing and I really hope to see this bugfix release soon!

krystian-hebel commented 2 years ago

@ybh1998 thanks for the report. It is mostly consistent with our tests, though we haven't tested 2x dGPU case.

Whether coreboot and SeaBIOS output is directed to VGA or an external card is configurable; in the attached binaries it is set to pass it to the GPU installed at the highest PCIe bus number.
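As a rough illustration of that selection rule (not the actual coreboot or SeaBIOS code), picking the display device with the highest bus number could look like the sketch below. The `bb:dd.f` addresses are hypothetical, not taken from a KGPE-D16, and the helper names are invented for the example.

```python
def bus_number(bdf):
    """Extract the bus number from a 'bb:dd.f' address string (hex bus)."""
    return int(bdf.split(":")[0], 16)


def pick_primary_display(display_devices):
    """Return the device on the highest bus number, or None if the list is empty."""
    if not display_devices:
        return None
    return max(display_devices, key=bus_number)


# Hypothetical addresses: an onboard VGA device on a low bus number and a
# dGPU on a higher one.
candidates = ["01:04.0", "06:00.0"]
print(pick_primary_display(candidates))
```

With only one display device present the rule falls back to that device, which matches output going to the onboard VGA when no dGPU is installed.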

After Linux boots with the Radeon installed, is anything displayed on VGA? In our case, for some reason Linux decides to move a still copy of the SeaBIOS output there.

Issue with no SeaBIOS output on VGA is still present, although I've seen it work at some point. It is possible that a debug output that was later removed inadvertently introduced a delay where it was needed.

ybh1998 commented 2 years ago

I think the behavior after Linux boot might be distro-related, so I add my OS configuration here. The above tests use Debian 11 live disk debian-live-11.3.0-amd64-standard.iso.

I brought two monitors to verify the tests again. After Linux boot, the onboard VGA port starts up, and the default console is actually on the onboard VGA port. The Radeon HDMI port shows some early-stage Linux kernel output (Updated in the above post).

I also tried Arch Linux install image archlinux-2022.06.01-x86_64.iso. The default console is on Radeon HDMI port with 4K resolution. The onboard VGA port shows some early-stage Linux kernel output.

So, I think both of the ports are working correctly on my platform with the vga_fix firmware.

mrothfuss commented 2 years ago

Good news for the issue. Merging patches from both aspeed_vga and vga_fix into develop seems to fix the issue: both the onboard ASpeed GPU and a PCIe dGPU seem to be working.

Update: I was able to boot using an RX550 as the primary display card, wayland works too.

Everything worked using the Onboard VGA + ASUS VGA BIOS. Dedicated graphics did not display a bootsplash (SeaBIOS + Option ROM), two ATI cards tested.

krystian-hebel commented 2 years ago

@mrothfuss I've pushed some of the missing changes to vga_fix, care to test?

mrothfuss commented 2 years ago

@krystian-hebel Sure. This didn't seem to change anything. Here's the SeaBIOS output for the dGPU failed bootsplash. (still works fine on ASpeed + VGABIOS)

Scan for option roms

Press ESC for boot menu.

get_mode failed.
get_mode failed.
get_mode failed.
(repeated)
get_mode failed.
Unable to find vesa video mode dimensions 640/480
failed to find a videomode with 640x480 0bpp (0=any).

There were some merge conflicts between the aspeed_vga and vga_fix branches

You can see how I merged them here. Nothing dramatic.

krystian-hebel commented 2 years ago

@mrothfuss sorry for the delay, busy couple of weeks. I wasn't precise enough, I meant to use vga_fix only, without aspeed_vga. I've ported just parts of the latter branch that in my opinion should be enough to fix the issue, without touching unrelated stuff. I'd be grateful for another test, but no rush, I probably won't be able to do anything with the results until the end of week.

mrothfuss commented 2 years ago

@krystian-hebel I built vga_fix and tested it.

SeaBIOS with Aspeed + VGABIOS: boots, SeaBIOS shows bootsplash, FreeBSD boots correctly

SeaBIOS with Radeon + Option ROM: boots, display shows random pixels, FreeBSD boots blind (available by ssh)

bcHelix commented 2 years ago

I tested this and got the "random pixels" as well.