geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0
1.52k stars 135 forks

Test GPU (VisionTek Radeon 5450 1GB) #4

Open geerlingguy opened 3 years ago

geerlingguy commented 3 years ago

I want to see if an AMD card works out of the box with the drivers built into Linux, as everyone on the Internet seems to say. For X86 Linux, that definitely seems to be the case, but will it work on ARM32? ARM64?

I settled on the Radeon HD 5450 1GB PCIe 2.1 card, mostly because it's available at a local retailer for $35.

Phoronix did a pretty extensive article on this board, and while it's no screamer... or even that fast... it is a simple, fanless, low-power board, and that might just be perfect for the Pi CM4. Here's that article: ATI Radeon HD 5450 On Linux.

I don't expect it to be fast, or amazing, but I do expect to get it to work. Maybe.

dtischler commented 3 years ago

AMD cards do work on Arm64...but...that is on machines with proper UEFI+ACPI and enough BAR. So, not apples to apples here. Just saying that the drivers are in fact functional.

geerlingguy commented 3 years ago

@dtischler - Very good to know! I'm heading out to pick up the card from Micro Center in a little bit... fingers crossed this experience is a little easier :)

geerlingguy commented 3 years ago

$ sudo lspci -v
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20) (prog-if 00 [Normal decode])
    Flags: fast devsel
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 00000000-00000fff
    Memory behind bridge: f8000000-f80fffff
    Capabilities: [48] Power Management version 3
    Capabilities: [ac] Express Root Port (Slot-), MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [180] Vendor Specific Information: ID=0000 Rev=0 Len=028 <?>
    Capabilities: [240] L1 PM Substates

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series] (prog-if 00 [VGA controller])
    Subsystem: VISIONTEK Cedar [Radeon HD 5000/6000/7350/8350 Series]
    Flags: fast devsel, IRQ 255
    Memory at <unassigned> (64-bit, prefetchable) [disabled]
    Memory at 600000000 (64-bit, non-prefetchable) [disabled] [size=128K]
    I/O ports at <unassigned> [disabled]
    [virtual] Expansion ROM at 600020000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]
    Subsystem: VISIONTEK Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]
    Flags: fast devsel, IRQ 255
    Memory at 600040000 (64-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting

geerlingguy commented 3 years ago

Same BAR address space issue as the Zotac GeForce GT 710:

[    1.047548] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.047591] pci 0000:00:00.0: BAR 9: no space for [mem size 0x10000000 64bit pref]
[    1.047604] pci 0000:00:00.0: BAR 9: failed to assign [mem size 0x10000000 64bit pref]
[    1.047618] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6000fffff]
[    1.047638] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 64bit pref]
[    1.047650] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    1.047664] pci 0000:01:00.0: BAR 2: assigned [mem 0x600000000-0x60001ffff 64bit]
[    1.047701] pci 0000:01:00.0: BAR 6: assigned [mem 0x600020000-0x60003ffff pref]
[    1.047716] pci 0000:01:00.1: BAR 0: assigned [mem 0x600040000-0x600043fff 64bit]
[    1.047749] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0100]
[    1.047760] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]
[    1.047774] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.047793] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6000fffff]
[    1.047897] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

So I'm going to follow the same process for increasing the BAR addressable memory range (see step 1 here: https://gist.github.com/geerlingguy/9f1510ab028e68b712381520308db2af).
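To make the failures in a log like the one above easier to scan, here is a minimal Python sketch that pulls out every "failed to assign" line with its device, BAR number, resource type, and size. The function name `failed_bars` and the regular expression are illustrative assumptions; they only cover the message format shown in this thread.

```python
import re

# Matches kernel lines such as:
#   pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
FAILED_BAR = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): failed to assign "
    r"\[(?P<kind>mem|io)\s+size (?P<size>0x[0-9a-f]+)"
)


def failed_bars(dmesg_text):
    """Return (device, bar, kind, size_in_bytes) for each failed assignment."""
    return [
        (m["dev"], int(m["bar"]), m["kind"], int(m["size"], 16))
        for m in FAILED_BAR.finditer(dmesg_text)
    ]


# Three lines copied from the log above.
SAMPLE = """\
[    1.047604] pci 0000:00:00.0: BAR 9: failed to assign [mem size 0x10000000 64bit pref]
[    1.047650] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 64bit pref]
[    1.047760] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]
"""

for dev, bar, kind, size in failed_bars(SAMPLE):
    print(f"{dev} BAR {bar}: {kind}, {size} bytes")
```

In practice the text would come from `dmesg` output saved to a file. The 0x10000000 request is the card's 256 MiB prefetchable BAR, which is far larger than the 1 MiB bridge window in the log.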

geerlingguy commented 3 years ago

Well that's weird. I extended the address range and am still getting:

[    1.006265] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.006284] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.006343] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x060fffffff -> 0x00e0000000
[    1.006398] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.023282] brcm-pcie fd500000.pcie: link up, 2.5 GT/s x1 (SSC)
[    1.023571] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.023586] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.023603] pci_bus 0000:00: root bus resource [mem 0x600000000-0x60fffffff] (bus address [0xe0000000-0xefffffff])
[    1.023655] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.023874] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.027287] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.027475] pci 0000:01:00.0: [1002:68f9] type 00 class 0x030000
[    1.027563] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[    1.027603] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x0001ffff 64bit]
[    1.027630] pci 0000:01:00.0: reg 0x20: [io  0x0000-0x00ff]
[    1.027673] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[    1.027702] pci 0000:01:00.0: enabling Extended Tags
[    1.027858] pci 0000:01:00.0: supports D1 D2
[    1.027913] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:00.0 (capable of 32.000 Gb/s with 2.5 GT/s x16 link)
[    1.028051] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.028125] pci 0000:01:00.1: [1002:aa68] type 00 class 0x040300
[    1.028209] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[    1.028315] pci 0000:01:00.1: enabling Extended Tags
[    1.028470] pci 0000:01:00.1: supports D1 D2
[    1.031733] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.031776] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.031791] pci 0000:00:00.0: BAR 8: no space for [mem size 0x00100000]
[    1.031803] pci 0000:00:00.0: BAR 8: failed to assign [mem size 0x00100000]
[    1.031823] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.031858] pci 0000:01:00.0: BAR 2: no space for [mem size 0x00020000 64bit]
[    1.031870] pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00020000 64bit]
[    1.031883] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
[    1.031894] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
[    1.031907] pci 0000:01:00.1: BAR 0: no space for [mem size 0x00004000 64bit]
[    1.031918] pci 0000:01:00.1: BAR 0: failed to assign [mem size 0x00004000 64bit]
[    1.031929] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0100]
[    1.031940] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]
[    1.031953] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.031978] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x60fffffff 64bit pref]
[    1.032077] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
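As a side note on the "2.000 Gb/s available PCIe bandwidth" line in that log: the figure follows from the 8b/10b line encoding used at 2.5 GT/s. A small sketch of the arithmetic (the helper name `pcie_bandwidth_gbps` is made up for illustration):

```python
def pcie_bandwidth_gbps(gt_per_s, lanes, encoding=(8, 10)):
    """Usable bandwidth in Gb/s for a link using the given line encoding.

    The default (8, 10) is the 8b/10b encoding used at 2.5 GT/s and 5 GT/s:
    every 10 bits on the wire carry 8 bits of payload.
    """
    payload_bits, line_bits = encoding
    return gt_per_s * lanes * payload_bits / line_bits


print(pcie_bandwidth_gbps(2.5, 1))   # the x1 link on the CM4
print(pcie_bandwidth_gbps(2.5, 16))  # what the log says the card could use at x16
```

This reproduces the two numbers the kernel prints (2.000 Gb/s and 32.000 Gb/s); it says nothing about real-world throughput.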

I'm also seeing a lot of these messages:

[   12.014589] broken atomic modeset userspace detected, disabling atomic
[   12.527461] broken atomic modeset userspace detected, disabling atomic
[   13.463488] broken atomic modeset userspace detected, disabling atomic
[   13.980834] broken atomic modeset userspace detected, disabling atomic

I also have to wonder if there are any issues with the super cheap x16 to x1 adapter cable I'm using :/ — I might have to let 'red shirt Jeff' have a go at hacking out the side of the PCIe x1 slot...

geerlingguy commented 3 years ago

I was also looking into RadeonOpenCompute, but it seems Debian 10 is not a supported OS: https://rocmdocs.amd.com/en/latest/Current_Release_Notes/Current-Release-Notes.html#list-of-supported-operating-systems

geerlingguy commented 3 years ago

I'm going to re-flash 32-bit Pi OS and see if that makes any difference.

dtischler commented 3 years ago

Before taking a saw to the x1 slot, you could try a SATA, Network, or other PCIe card in the x16 slot, and see if that enumerates and functions properly, just to confirm the adapter works. If not, then, reach for the saw.

geerlingguy commented 3 years ago

@dtischler - Good point. Will test another card.

Edit: Other card worked fine, mounted a USB 3.0 drive:

pi@raspberrypi:~ $ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 4: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

geerlingguy commented 3 years ago

Same thing under 32-bit Pi OS, unfortunately:

[    0.955978] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    0.955997] pci 0000:00:00.0: BAR 8: no space for [mem size 0x00100000]
[    0.956013] pci 0000:00:00.0: BAR 8: failed to assign [mem size 0x00100000]
[    0.956039] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    0.956084] pci 0000:01:00.0: BAR 2: no space for [mem size 0x00020000 64bit]
[    0.956100] pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00020000 64bit]
[    0.956118] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
[    0.956134] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
[    0.956151] pci 0000:01:00.1: BAR 0: no space for [mem size 0x00004000 64bit]
[    0.956167] pci 0000:01:00.1: BAR 0: failed to assign [mem size 0x00004000 64bit]
[    0.956183] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0100]
[    0.956200] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]

elFarto commented 3 years ago

It looks like you're running out of BAR space again. BAR 9 gets allocated all the BAR space available, which is why the rest fail (well, apart from the IO one).
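A rough way to check this arithmetic, as a sketch only: add up the BAR sizes from the log, round each bridge window up to the 1 MiB granularity that PCI bridge memory windows use, and compare against the 256 MiB root bus resource. Putting the expansion ROM (BAR 6) in the non-prefetchable window is an assumption made for simplicity, and the simple sum ignores per-BAR alignment.

```python
MIB = 1024 * 1024


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def window_needed(pref_bars, nonpref_bars):
    """Bytes needed behind the bridge, each window rounded up to 1 MiB."""
    return align_up(sum(pref_bars), MIB) + align_up(sum(nonpref_bars), MIB)


# Root bus resource from the log: [mem 0x600000000-0x60fffffff]
available = 0x60FFFFFFF - 0x600000000 + 1

needed = window_needed(
    pref_bars=[0x10000000],                    # 01:00.0 BAR 0 (256 MiB)
    nonpref_bars=[0x20000, 0x20000, 0x4000],   # 01:00.0 BAR 2, BAR 6; 01:00.1 BAR 0
)

print(f"available: {available // MIB} MiB, needed: {needed // MIB} MiB")
```

Under these assumptions the request comes to 257 MiB against 256 MiB available, which is consistent with BAR 8 failing for a 0x00100000 (1 MiB) window in the log.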

geerlingguy commented 3 years ago

@elFarto - I just noted over in the Pi Forums issue that expanding the space to 1 GB seems to have fixed that issue:

ranges = <0x02000000 0x0 0xc0000000 0x6 0x00000000 0x0 0x40000000>;
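For anyone wondering what those seven cells mean, here is a small decoding sketch. It assumes the usual layout for a PCI host bridge node (three cells of PCI address, two cells of CPU address, two cells of size) and the standard flag bits in the first cell; `decode_pcie_range` is a hypothetical helper, not something from the kernel tree.

```python
def decode_pcie_range(cells):
    """Decode one 7-cell ranges entry: 3 PCI cells, 2 CPU cells, 2 size cells."""
    flags, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells
    # Bits 25:24 of the first cell select the address space.
    space = {0: "config", 1: "io", 2: "mem32", 3: "mem64"}[(flags >> 24) & 0x3]
    return {
        "space": space,
        "prefetchable": bool(flags & 0x40000000),  # bit 30
        "pci_addr": (pci_hi << 32) | pci_lo,
        "cpu_addr": (cpu_hi << 32) | cpu_lo,
        "size": (size_hi << 32) | size_lo,
    }


entry = decode_pcie_range(
    [0x02000000, 0x0, 0xC0000000, 0x6, 0x00000000, 0x0, 0x40000000]
)
for key, value in entry.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Read this way, the entry maps 1 GiB (0x40000000) of 32-bit memory space at PCI bus address 0xc0000000 to CPU address 0x600000000.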

However, I'm now realizing that just like the nouveau driver, it seems the amd drm drivers are nowhere in sight on Raspberry Pi OS (find /lib/modules/$(uname -r) -type f -name '*.ko*' | grep amd reveals nothing).

So I'm going to try to rebuild the kernel with those modules, on 32-bit Pi OS (I tried earlier on 64-bit, but must've missed something because I couldn't get nouveau to work). Fingers crossed!
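The module search mentioned above can also be written as a short Python sketch, matching on the module's file name rather than grepping the whole path. `find_modules` is a made-up helper, and the `root` parameter exists only so it can be pointed at another module tree.

```python
import os
from pathlib import Path


def find_modules(names, root=None):
    """Return paths of .ko files (optionally compressed) named after `names`."""
    if root is None:
        # Same directory as /lib/modules/$(uname -r)
        root = Path("/lib/modules") / os.uname().release
    wanted = set(names)
    return sorted(
        p for p in Path(root).rglob("*.ko*")
        if p.is_file() and p.name.split(".ko")[0] in wanted
    )


for path in find_modules(["radeon", "amdgpu", "nouveau"]):
    print(path)
```

An empty result means none of those drivers were built for the running kernel, which is the situation described above on stock Raspberry Pi OS.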

geerlingguy commented 3 years ago

Process for building and configuring the kernel:

# Install dependencies
sudo apt install -y git bc bison flex libssl-dev make

# Clone source
git clone --depth=1 https://github.com/raspberrypi/linux

# Apply default configuration
cd linux
export KERNEL=kernel7l # use kernel8 for 64-bit, or kernel7l for 32-bit
make bcm2711_defconfig

# Customize the .config further with menuconfig
sudo apt install -y libncurses5-dev
make menuconfig
# (search for /radeon (or /amdgpu for newer cards), enable in the proper section, save, then exit)
nano .config
# (edit CONFIG_LOCALVERSION and add a suffix that helps you identify your build)

# Build the kernel and copy everything into place
make -j4 zImage modules dtbs # 'Image' on 64-bit
sudo make modules_install
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/dts/overlays/README /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/$KERNEL.img

Then redo the changes in the Gist (https://gist.github.com/geerlingguy/9f1510ab028e68b712381520308db2af) for the BAR address expansion and reboot.
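Before kicking off an hour-long build, it can help to confirm that menuconfig actually recorded the driver choice. The sketch below reads the text of a `.config` and reports whether a symbol is built in (`y`), a module (`m`), or off (`n`). `CONFIG_DRM_RADEON` and `CONFIG_DRM_AMDGPU` are the kernel's symbols for the two drivers; the helper `kconfig_state` and the sample contents are illustrative assumptions.

```python
def kconfig_state(config_text, symbol):
    """Return 'y', 'm', or 'n' for a symbol in the text of a kernel .config."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
    return "n"  # symbol absent: treated as disabled


# Illustrative sample, not copied from a real build.
SAMPLE = """\
CONFIG_DRM=m
CONFIG_DRM_RADEON=m
# CONFIG_DRM_AMDGPU is not set
"""

for symbol in ("CONFIG_DRM_RADEON", "CONFIG_DRM_AMDGPU"):
    print(symbol, kconfig_state(SAMPLE, symbol))
```

With a real tree, the text would come from reading `.config` in the kernel source directory.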

Also, the compile took almost 1 hour:

real    56m41.742s
user    190m59.161s
sys     26m59.542s

geerlingguy commented 3 years ago

I'm now seeing the amdgpu module:

pi@raspberrypi:~ $ sudo modinfo amdgpu
filename:       /lib/modules/5.4.72-v7l-jjggpu+/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
license:        GPL and additional rights
description:    AMD GPU
author:         AMD linux driver team
...

Then:

$ sudo modprobe amdgpu
$ dmesg | grep amdgpu
[  131.521279] [drm] amdgpu kernel modesetting enabled.
$ lsmod
Module                  Size  Used by
amdgpu               2830336  0
...

Also noting that to enable a module on boot, add it to a new line in /etc/modules.
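A sketch of that edit as a pure text transformation, so that running it twice does not add the module twice. `with_module` is a hypothetical helper; it assumes the usual file format of one module name per line with `#` comments, and it leaves writing the result back to `/etc/modules` (as root) to the caller.

```python
def with_module(modules_text, name):
    """Return modules_text with `name` on its own line, unless already listed."""
    listed = {
        line.split()[0]
        for line in modules_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if name in listed:
        return modules_text
    if modules_text and not modules_text.endswith("\n"):
        modules_text += "\n"
    return modules_text + name + "\n"


current = "# /etc/modules: kernel modules to load at boot time.\ni2c-dev\n"
print(with_module(current, "amdgpu"), end="")
```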

geerlingguy commented 3 years ago

Drat! I think the amdgpu driver only works with newer Radeons ('Southern Islands' and newer).

Might have to re-recompile the kernel with the radeon driver, which according to this, supports the 'Evergreen' generation, which is the family the 5450 belongs to: https://www.x.org/wiki/RadeonFeature/ (more info on the Arch wiki: https://wiki.archlinux.org/index.php/AMDGPU).

Should've probably done that from the start, oops.

dtischler commented 3 years ago

Doh - I could have warned you of that, I was not paying close enough attention. You are correct though, AMDGPU would be for newer cards.

geerlingguy commented 3 years ago

Well, it's no biggie, I was just assuming that "the past 20 years" was modern here, where it's really just the past 5-10 years.

Interestingly, since I had also compiled nouveau (but it's not enabled), I am now finding that with my custom kernel, the whole Pi locks up during boot if I have the Nvidia card plugged in—it gets past the initial boot, the HDMI0 display turns on to a flashing cursor, then the cursor stops flashing and 'poof', it's locked up. No SSH access, no keyboard access. Weird. That doesn't happen with the Radeon plugged in.

geerlingguy commented 3 years ago

Rebuilding the kernel the 4th time today, this time choosing the radeon driver and not amdgpu...

geerlingguy commented 3 years ago

Well this is weird... lspci is showing nothing after recompiling, and I see:

[    0.865707] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    0.865731] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    0.865803] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    0.865875] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.464096] brcm-pcie fd500000.pcie: link down
...
[    5.292508] vc4-drm gpu: HDMI-A-2: EDID is invalid:
[    5.292530]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292544]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292557]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292570]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292583]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292595]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292608]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    5.292621]  [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Rebooting again. I have not yet enabled the radeon kernel module.

geerlingguy commented 3 years ago

I keep getting that [ 1.464096] brcm-pcie fd500000.pcie: link down after every reboot... switching back to a known-good microSD to make sure I haven't done something bad to the hardware.

[Edit: Good. Other microSD shows lspci with Radeon info.]

geerlingguy commented 3 years ago

After re-allocating 1GB of memory to the PCIe bus again, the device is getting recognized again (phew!), so back to recompiling the kernel the 6th time today.

geerlingguy commented 3 years ago

Sooooo close:

$ dmesg | grep radeon
[    0.000000] Linux version 5.4.72-v7l-radeon+ (pi@raspberrypi) (gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)) #1 SMP Fri Oct 23 22:10:38 BST 2020
[    4.737476] [drm] radeon kernel modesetting enabled.
[    4.737774] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x0 -> 0xfffffff
[    4.737793] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x10000000 -> 0x1001ffff
[    4.737852] radeon 0000:01:00.0: enabling device (0140 -> 0142)
[    4.743219] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[    4.896672] radeon 0000:01:00.0: Expecting atombios for evergreen GPU
[    4.896692] radeon 0000:01:00.0: Fatal error during GPU init
[    4.896708] [drm] radeon: finishing device.
[    4.991240] radeon: probe of 0000:01:00.0 failed with error -22

It looks like the radeon driver requires the presence of the I/O BAR. And recall from earlier that it's not being allocated:

[    0.905439] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0100]
[    0.905454] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]

Maybe that's also what's happening with the Nvidia proprietary driver, and it's just not giving a helpful error message about that?

geerlingguy commented 3 years ago

I think we may be out of luck, in this case, as @pelwell mentions that the spec sheet for the BCM2711 says:

Supports accessing external PCIe configuration space and memory space (no support for I/O space).

@trejan on the Pi Forum also said this:

Digging around, it looks like Nvidia have been reusing the same basic design for this for several generations now. That IOBAR is used for the legacy VGA I/O ports and also BIOS configuration. The x86 CPU is still in real mode at that point so can't access the other memory ranges. The Radeon Open Compute documentation also mentions that their cards have an IOBAR and that it is used for the same reasons, but says it isn't needed if the card is operating headless for compute.

Not sure what the Nvidia closed source driver is doing though. Something not initialised properly or is it trying to access an I/O port? The Tegra PCIe controller apparently does support IOBARs according to the source code.

I think we can say conclusively that this Radeon board will not work, though I can't say that conclusively about the Nvidia board. If the I/O BAR is used only for VGA, it would be nice if a driver could ignore its absence, and maybe Nvidia's driver could at some point...

I wonder if there are any video cards that are not massively expensive (and likely to need too many resources to work with the Pi) that do not require an I/O BAR?

geerlingguy commented 3 years ago

Some good reading here: PCI BARs and other means of accessing the GPU

scarburato commented 3 years ago

The PCI I/O BAR error should not be fatal, since it doesn't cause the probe to fail (see https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_device.c#L1427 ); it just prints the error and then continues.

The probe fails with error 22 which is Invalid argument, which seems to be generated here ( https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/evergreen.c#L5186 you can also see the string "expecting atombios for evergreen gpu" being printed). The last error is then printed here ( https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_kms.c#L136 )
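For reference, the mapping from the number in "failed with error -22" to a name can be checked with Python's standard `errno` module. This only translates the code; it does not tell you which part of the driver returned it.

```python
import errno
import os

code = 22  # from "probe of 0000:01:00.0 failed with error -22"

print(errno.errorcode[code])  # symbolic name of the error
print(os.strerror(code))      # the C library's message for it
```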

I'm not exactly sure what this atombios is, whether it's something on the GPU, something on (x86, amd64) CPUs, or something related to the missing I/O BARs... anyway, here ( https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_bios.c#L708 ) is where the driver seems to do its thing to check for the presence of this ATOM BIOS.

Not exactly sure if this is helpful, but those are my 2 cents from a quick look at the kernel code.

geerlingguy commented 3 years ago

Related: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/6

Also, I found out more clues through the grapevine today:

You need to find out if the platform supports a cache coherent PCIe interface. The PCI spec requires it, but not all platforms (especially non-x86 platforms) support that. ARM has some IP you need to include for that to work correctly. Also, most non-x86 platforms seem to have problems with write combining. They may need to patch drm_arch_can_wc_memory() for their platform.

For more background on that last problem, I found https://patchwork.kernel.org/project/dri-devel/patch/20181220145657.304-1-alexander.deucher@amd.com/

geerlingguy commented 3 years ago

Further:

tl;dr - It's not the board, it's the old card.

Looks like the driver is unable to access the PCI ROM which is required to fetch the vbios image. The driver needs the vbios image to get board specific details (display topology, clocks, voltages, etc.). You'd probably have better luck on a newer board supported by amdgpu.

The radeon driver requires that the PCI ROM BAR be accessible to fetch the vbios image, but on the amdgpu driver, we can read back the rom via MMIO registers.

So it seems like #6 might be the way to go, and might offer a more fruitful result. Thanks especially to Djhg2000 for the recommendation on YouTube, looks like he was on to something!

geerlingguy commented 3 years ago

Over in the raspberrypi/linux project, it looks like this commit (https://github.com/raspberrypi/linux/commit/54db4b2fa4d17251c2f6e639f849b27c3b553939) has increased the default BAR allocation to 1 GB. Nice!

geerlingguy commented 3 years ago

unrelenting.technology over on my blog posted:

Very interesting that the BAR space is configurable on the Broadcrap SoC.

You can get around the I/O BAR issue on radeon: in your kernel source, go to drivers/gpu/drm/radeon/radeon_device.c, remove these lines: https://github.com/torvalds/linux/blob/598a597636f8618a0520fd3ccefedaed9e4709b0/drivers/gpu/drm/radeon/radeon_device.c#L1419-L1428 (in radeon_device_init() the /* io port mapping */ section).

This is a silly check left over there, you can see another place in that file says DRM_ERROR("Unable to find PCI I/O BAR; using MMIO for ATOM IIO\n"); so the cards are perfectly fine without the I/O BAR. In the FreeBSD port of the driver, these lines are just ifdef'd out, sooooo yeah :) And of course there is no such mistake in the amdgpu kernel driver so indeed modern Radeons likely would just work.

valpackett commented 3 years ago

^ that would be me :) didn't find the github repo yesterday

Ah I see that this error is already non-fatal as mentioned above.

which seems to be generated here ( https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/evergreen.c#L5186

oh. So, Expecting atombios for evergreen GPU, without Unable to locate a BIOS ROM or BIOS signature incorrect, without failing right before that message (note: ASIC_IS_AVIVO(rdev) == (rdev->family >= CHIP_RS600), CEDAR is far newer than RS600), this means that radeon_get_bios succeeded but is_atom_bios == false.

So either the firmware on the card is corrupted / some weird version, or the PCIe memory reads somehow failed in a way that

I don't know how PCIe reads could fail in that way, maybe it's the former option.

Does the card work on an x86 PC? (Both Windows and Linux tests would be interesting)
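To make the "get_bios succeeded but is_atom_bios == false" case concrete, here is a rough Python sketch of the kind of check involved. The offsets (0x55 0xAA at the start of the ROM, a 16-bit header pointer at 0x48, and the "ATOM" or "MOTA" string four bytes into that header) are written from memory of radeon_bios.c and should be treated as assumptions to verify against the source. `classify_vbios` is a made-up name, and the real driver performs more checks than this.

```python
def classify_vbios(image: bytes) -> str:
    """Classify a VBIOS image roughly the way the radeon driver is believed to."""
    # A PCI option ROM starts with the bytes 0x55 0xAA.
    if len(image) < 0x4A or image[0] != 0x55 or image[1] != 0xAA:
        return "no valid ROM signature"
    # Assumed: little-endian 16-bit pointer to the BIOS header at offset 0x48.
    header = int.from_bytes(image[0x48:0x4A], "little")
    if header == 0:
        return "no BIOS header"
    # Assumed: ATOM BIOS images carry "ATOM" (or "MOTA") at header + 4.
    magic = image[header + 4 : header + 8]
    if magic in (b"ATOM", b"MOTA"):
        return "ATOMBIOS"
    return "not ATOMBIOS"


# Synthetic 256-byte image for illustration only (not a real VBIOS).
rom = bytearray(256)
rom[0:2] = b"\x55\xaa"
rom[0x48:0x4A] = (0x80).to_bytes(2, "little")
rom[0x84:0x88] = b"ATOM"
print(classify_vbios(bytes(rom)))
```

If that reading of the code is right, an all-zero or all-0xFF read would fail the signature check first, so ending up with a valid signature but no "ATOM" string would need reads that are only partly wrong. That would fit the puzzlement above about how PCIe reads could fail in that way.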

scarburato commented 3 years ago

Maybe enabling drm debug could reveal more information as DRM_DEBUG starts printing. No idea on how to enable it before boot, maybe with some options in the bootloader.

P.S. enabling the option on my desktop computer caused a severe performance impact on the desktop and a giant system log
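One way to get DRM debug output from early boot is the `drm.debug` kernel parameter, which takes a bitmask of message categories. The sketch below builds the parameter string; the bit values are the ones documented for the drm module parameter as far as I know, and the helper `drm_debug_param` is hypothetical. Limiting the mask to a couple of categories may also soften the log flood mentioned above.

```python
# Category bits for the drm.debug bitmask (assumed from the kernel's
# description of the parameter; verify against your kernel version).
DRM_DEBUG_BITS = {
    "core": 0x01,
    "driver": 0x02,
    "kms": 0x04,
    "prime": 0x08,
    "atomic": 0x10,
    "vbl": 0x20,
}


def drm_debug_param(*categories):
    """Build a drm.debug=<mask> kernel command line parameter."""
    mask = 0
    for name in categories:
        mask |= DRM_DEBUG_BITS[name]
    return f"drm.debug={mask:#x}"


print(drm_debug_param("driver", "kms"))
```

On Raspberry Pi OS the resulting string would go on the single line in /boot/cmdline.txt; on a running system the same mask can usually be written to /sys/module/drm/parameters/debug.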

ZILtoid1991 commented 3 years ago

There's probably some way to get around the I/O BAR issue by allocating it as normal BAR space in the kernel, but I cannot guarantee that it would work properly, especially without glitches. This would also need a hack in the kernel's source code. Or at least that's how it seems to me after reading up on it on the OSDev wiki.

I/O BARs are for emulating the behavior of memory mapped I/O devices without any loss to legacy functionality.

valpackett commented 3 years ago

@ZILtoid1991 Radeon GPUs do not require I/O BARs, this has already been posted a few times in this thread. The radeon driver will happily use memory BARs, even after logging a "scary" error message.

pppq commented 3 years ago

It might legitimately have a "COM" BIOS (whatever that means, that was the other option in the debug message). There are some images uploaded from GPU-Z on TechPowerUp's site, which have the same PCI ID and refer to ATOM in the identifier strings, eg. one from an MSI card: https://www.techpowerup.com/vgabios/153193/msi-hd5450-1024-120919

valpackett commented 3 years ago

@pppq COMBIOS is something really, really ancient that could never be legitimately found on this generation of GPUs. radeon_get_bios is used on all generations, it's common code. evergreen_init of course expects ATOMBIOS, but e.g. r100_init expects COMBIOS. For some perspective, r100 is from the year 2000.

PixlRainbow commented 3 years ago

Interesting note: someone HAS successfully got hardware accelerated graphics working with a Radeon card on a non-x86 computer on Linux before, but it wasn't an ARM computer -- it was RISC-V. It was also some AMD-era Radeon, not an ATi-era Radeon. They don't really go into much detail though..

Edit: apparently, SiFive's repos indicate that certain Radeon devices have been known to work with non-x86 processors on open source drivers, AND they also show the patches they had to make to the drivers to make them work.
No idea how helpful this is considering the differing architectures though, but I note that from a glance I saw they had to disable the SVGA drivers in mesa because it was an x86-unique feature.

ZILtoid1991 commented 3 years ago

They also could've enabled the I/O BARs somehow.

valpackett commented 3 years ago

@PixlRainbow I do have a Radeon RX 480 (also tested a Radeon HD 7950) running on an ARM computer (SolidRun MACCHIATObin) too. On FreeBSD instead of Linux, even :)

the patches they had to make to the drivers to make them work

This is related to building userspace software, which in our case is already built by the distro. This is irrelevant.

They also could've enabled the I/O BARs somehow

@ZILtoid1991 no need for "somehow", if your PCIe host controller is decent, I/O BARs Just Work™. They DO NOT require having ISA-level I/O ports like x86 inb outb. The controller just maps them into memory, of course.


Again, I/O BARs are not required! Everyone: read https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/4#issuecomment-719649428 carefully.

PixlRainbow commented 3 years ago

Wait, which IO board has he mounted the CM4 onto?

paulwratt commented 3 years ago

Just a note:

It was also some AMD-era Radeon, not an ATi-era Radeon.

The 5450 (HD5000 series) is ATI-era hardware (the last that is 100% ATI), even though it is produced by AMD. The HD7000 series is AMD-era (as it has changes to the ATI-style Radeons); sorry, not sure about the HD6000 series (I just know I did not want one). I know this because I spent time trying to get drivers working for both an HIS HD5450 PCI Silent and an HD7450 PCIe Silent on an old Dell/HP with Mac OS X Leopard/Snow Leopard around 2011/12. My understanding of the hardware is that the HD7000 is the last of the ATI-style Radeons, and that the HD5000 was basically a linear ATI evolution/progression, whereas the cards that came after the HD5000 series were AMD-modified ATI-style Radeons. The HD8000 series is the first non-ATI-style Radeon.

The HD5450 was one of the cards they got working on the RISC-V setup, and since there were no changes between the HD6450 and the HD5450 in getting the cards to work, I am 110% confident that the HD6000 series is also ATI-era hardware.

I would say that a second go at the kernel driver build, with the knowledge presented in this thread (disabling some lines as BSD does) and the use of those patched user-space drivers (MESA, XORG), there is a 100% chance to get this HD5450 working on CM4 with PCIe adapter extension connection, with hardware acceleration.

Yes, you can quote me on that.

I further suggest it would also be possible to get a vulkan driver to build with hardware support too. That might take some fiddling, but not too much work (a day or 2 maybe at most, even down to a couple of hours).

valpackett commented 3 years ago

ATI-style Radeons

The HD7000 series is AMD-era

This marketing mess is confusing, use the architecture names. The first TeraScale GPUs shipped in 2007, the year after AMD bought ATI. The ATI brand lasted until 2010, but the modern architecture GCN shipped in 2012, so these timelines are not synchronized.

You CAN NOT tell the architecture by the "HD7xxx" first digit because of the massive amount of rebrands. 7670 and lower 7xxx numbers are TeraScale 2, just like the 5450 discussed here: https://en.wikipedia.org/wiki/Template:AMD_Radeon_HD_7xxx This even happened into the next naming scheme for a bit, see R5 235/230/225/220.

Very helpful bookmark: https://en.wikipedia.org/wiki/Template:AMD_GPU_features

I further suggest it would also be possible to get a vulkan driver to build with hardware support too

Mesa that comes in the distros already includes Vulkan support, namely the RADV driver. But Vulkan is only supported on GCN-era (amdgpu) GPUs, not on anything this old. TeraScale hardware DOES NOT RUN VULKAN.

with the knowledge presented in this thread (disabling some lines as BSD does) and the use of those patched user-space drivers (MESA, XORG)

Again, it turned out that these lines were NOT THE ISSUE (they just logged a message but did not end the loading process), everyone please stop repeating my mistake.

And NO USERSPACE PATCHES SHOULD BE NECESSARY. aarch64 is not riscv64. aarch64 Just Works with mainline Mesa.

geerlingguy commented 3 years ago

Was going to test this card again today with the expanded BAR space (see https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/2#issuecomment-730025910), but this card is also giving me 'link down' all the time now :-/

geerlingguy commented 3 years ago

Well, now it's finally booting at least, if I don't use external power.

[   50.212050] [drm] radeon kernel modesetting enabled.
[   50.212392] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x700000000 -> 0x70fffffff
[   50.212403] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x600000000 -> 0x60001ffff
[   50.212511] pci 0000:00:00.0: enabling device (0000 -> 0002)
[   50.212536] radeon 0000:01:00.0: enabling device (0000 -> 0002)
[   50.213507] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1545:0x5450 0x00).
[   50.213710] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[   50.329687] radeon 0000:01:00.0: Expecting atombios for evergreen GPU
[   50.329697] radeon 0000:01:00.0: Fatal error during GPU init
[   50.329703] [drm] radeon: finishing device.
[   50.329709] [TTM] Memory type 2 has not been initialized
[   50.337212] radeon: probe of 0000:01:00.0 failed with error -22

geerlingguy commented 3 years ago

The [TTM] Memory type 2 has not been initialized seems different...

PixlRainbow commented 3 years ago

probably should try a graphics card that has its own external power connector next time? So we aren't relying on pulling power through the PCIe slot.

Coreforge commented 3 years ago

Got a CM4 now, so I've started some testing with a similar card. It's a Radeon HD6450 2GB, which shouldn't be too different (should I open a new issue anyway or continue in this one?). I flashed the current 64-bit version of Raspberry Pi OS, compiled the 5.10.y kernel with the radeon and amdgpu drivers, blacklisted the radeon driver and modprobed it. It crashed as expected, but left a trace in dmesg.

[  193.679976] [drm] radeon kernel modesetting enabled.
[  193.680427] pci 0000:00:00.0: enabling device (0000 -> 0002)
[  193.680449] radeon 0000:01:00.0: enabling device (0000 -> 0002)
[  193.680918] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xA004 0x00).
[  193.681196] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[  193.807109] radeon 0000:01:00.0: Expecting atombios for evergreen GPU
[  193.807122] radeon 0000:01:00.0: Fatal error during GPU init
[  193.807129] [drm] radeon: finishing device.
[  193.807137] [TTM] Memory type 2 has not been initialized
[  193.814644] radeon: probe of 0000:01:00.0 failed with error -22
[  193.814653] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000168
[  193.825542] Mem abort info:
[  193.828420]   ESR = 0x96000005
[  193.831516]   EC = 0x25: DABT (current EL), IL = 32 bits
[  193.836934]   SET = 0, FnV = 0
[  193.840045]   EA = 0, S1PTW = 0
[  193.843224] Data abort info:
[  193.846276]   ISV = 0, ISS = 0x00000005
[  193.850187]   CM = 0, WnR = 0
[  193.853204] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000423bc000
[  193.859742] [0000000000000168] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  193.868594] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  193.874243] Modules linked in: radeon i2c_algo_bit ttm fuse sha256_generic cfg80211 rfkill 8021q garp stp llc vc4 cec v3d drm_kms_helper gpu_sched drm bcm2835_v4l2(C) bcm2835_codec(C) bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common snd_soc_core snd_compress drm_panel_orientation_quirks videodev snd_bcm2835(C) snd_pcm_dmaengine raspberrypi_hwmon snd_pcm snd_timer mc vc_sm_cma(C) snd syscopyarea sysfillrect sysimgblt fb_sys_fops rpivid_mem backlight uio_pdrv_genirq uio i2c_dev ip_tables x_tables ipv6
[  193.927068] CPU: 1 PID: 446 Comm: Xorg Tainted: G         C        5.10.2-v8+ #1
[  193.934564] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[  193.941092] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[  193.947249] pc : radeon_driver_open_kms+0x40/0x190 [radeon]
[  193.952929] lr : radeon_driver_open_kms+0x38/0x190 [radeon]
[  193.958573] sp : ffffffc011f7b9e0
[  193.961927] x29: ffffffc011f7b9e0 x28: 0000000000000000 
[  193.967312] x27: 0000000000000000 x26: 0000000000000000 
[  193.972695] x25: ffffff806517b8c8 x24: 000000005cbd5280 
[  193.978079] x23: ffffff806517b800 x22: 0000000000000000 
[  193.983463] x21: ffffff805cb48800 x20: 0000000000000001 
[  193.988847] x19: ffffff806517b800 x18: 0000000000000000 
[  193.994232] x17: 0000000000000000 x16: 0000000000000000 
[  193.999614] x15: 0000000000000000 x14: 00000020632d6c6c 
[  194.004998] x13: 0006001072657474 x12: 0000000000000018 

Message from syslogd@raspberrypi at Dec 26 01:28:51 ...
 kernel:[  193.868594] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  194.010381] x11: 0101010101010101 x10: 00000000000019e0 
[  194.015766] x9 : ffffffc010aa37b8 x8 : ffffff805cb48a00 
[  194.021149] x7 : 0000000000000000 x6 : 0000000000000000 
[  194.026533] x5 : ffffffc011239000 x4 : ffffffc011f7b9b0 
[  194.031916] x3 : ffffffc011239190 x2 : 0000000000000001 
[  194.037300] x1 : ffffff8042ff1e40 x0 : 0000000000000001 
[  194.042684] Call trace:
[  194.045193]  radeon_driver_open_kms+0x40/0x190 [radeon]
[  194.050562]  drm_file_alloc+0x150/0x238 [drm]
[  194.055012]  drm_open+0xe0/0x278 [drm]
[  194.058844]  drm_stub_open+0xb0/0x168 [drm]
[  194.063087]  chrdev_open+0xac/0x1a8
[  194.066619]  do_dentry_open+0x134/0x3a0
[  194.070501]  vfs_open+0x34/0x40
[  194.073680]  path_openat+0x87c/0xdf8
[  194.077298]  do_filp_open+0x80/0x108
[  194.080917]  do_sys_openat2+0x1f0/0x2a0
[  194.084800]  do_sys_open+0x60/0xb0
[  194.088243]  __arm64_sys_openat+0x2c/0x38
[  194.092305]  el0_svc_common.constprop.0+0x84/0x1e8
[  194.097158]  do_el0_svc+0x2c/0x98
[  194.100515]  el0_svc+0x20/0x30
[  194.103605]  el0_sync_handler+0xb0/0xb8
[  194.107489]  el0_sync+0x174/0x180
[  194.110847] Code: f9400c00 95de6f05 2a0003f4 37f807c0 (b9416ac0) 
[  194.117024] ---[ end trace 47e97aeaf44f2e91 ]---
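
The ESR value in the oops above can be unpacked by hand. A minimal sketch, assuming the standard AArch64 ESR layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the two lookup tables are deliberately incomplete and only cover the values seen in this log:

```python
ESR = 0x96000005  # "ESR = 0x96000005" from the oops above

ec = (ESR >> 26) & 0x3F   # exception class, bits [31:26]
il = (ESR >> 25) & 0x1    # 1 means the trapped instruction was 32 bits wide
iss = ESR & 0x1FFFFFF     # instruction-specific syndrome, bits [24:0]

# For a data abort, the ISS carries the details of the failed access.
wnr = (iss >> 6) & 0x1    # 0 = read, 1 = write
dfsc = iss & 0x3F         # data fault status code

# Only the entries needed for this log; the architecture defines many more.
EC_NAMES = {0x25: "data abort, taken at the current exception level"}
DFSC_NAMES = {0x05: "translation fault, level 1"}

print(f"EC   = {ec:#04x}: {EC_NAMES.get(ec, 'not covered by this sketch')}")
print(f"IL   = {il}")
print(f"WnR  = {wnr} ({'write' if wnr else 'read'})")
print(f"DFSC = {dfsc:#04x}: {DFSC_NAMES.get(dfsc, 'not covered by this sketch')}")
```

That agrees with the kernel's own summary (EC = 0x25, IL = 32 bits, WnR = 0): the crash is a read through an unmapped address, not a write.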

The ssh session I probed the module from also got some syslogd messages, so ssh seems to live long enough to get error messages out.

Message from syslogd@raspberrypi at Dec 26 01:28:51 ...
 kernel:[  193.868594] Internal error: Oops: 96000005 [#1] PREEMPT SMP

Message from syslogd@raspberrypi at Dec 26 01:28:51 ...
 kernel:[  194.110847] Code: f9400c00 95de6f05 2a0003f4 37f807c0 (b9416ac0) 
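
The word in parentheses on the "Code:" line is the instruction that faulted. A minimal sketch of decoding it, assuming it is an A64 "LDR Wt, [Xn, #imm]" (32-bit load, unsigned offset), which the opcode check below confirms; the x22 value is copied from the register dump in the trace:

```python
INSN = 0xB9416AC0  # the word in parentheses on the "Code:" line

# A64 "LDR Wt, [Xn, #imm]" (32-bit load, unsigned offset) has
# 0b1011100101 in bits [31:22].
assert (INSN >> 22) == 0b1011100101

imm12 = (INSN >> 10) & 0xFFF  # unsigned offset, in units of 4 bytes
rn = (INSN >> 5) & 0x1F       # base register number
rt = INSN & 0x1F              # destination register number
offset = imm12 * 4

X22 = 0x0                     # x22 as shown in the register dump
fault_address = X22 + offset

print(f"ldr w{rt}, [x{rn}, #{offset:#x}]")
print(f"fault address = {fault_address:#018x}")
```

The result is a load from x22 + 0x168 with x22 equal to zero, which matches the reported NULL pointer dereference at virtual address 0000000000000168. In other words, something read a field at offset 0x168 from a pointer that was never set.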

If I don't blacklist radeon, the Pi begins starting the desktop (the console gets cleared) and then just shows a blinking cursor until I power it down.

Here's also the BAR space allocation, if anyone's interested:

[    1.298057] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.301382] pci 0000:00:00.0: BAR 8: assigned [mem 0x610000000-0x6100fffff]
[    1.304690] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.308011] pci 0000:01:00.0: BAR 2: assigned [mem 0x610000000-0x61001ffff 64bit]
[    1.311310] pci 0000:01:00.0: BAR 6: assigned [mem 0x610020000-0x61003ffff pref]
[    1.314597] pci 0000:01:00.1: BAR 0: assigned [mem 0x610040000-0x610043fff 64bit]
[    1.317882] pci 0000:01:00.0: BAR 4: no space for [io  size 0x0100]
[    1.321130] pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]
[    1.324352] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.327560] pci 0000:00:00.0:   bridge window [mem 0x610000000-0x6100fffff]
[    1.330762] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x60fffffff 64bit pref]
[    1.334156] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
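
To make the address ranges above easier to read, here is a minimal sketch that turns them into sizes. The log lines are copied from the output above (timestamps dropped); `parse_bars` and `human` are hypothetical helper names, and the regular expressions only cover the two line shapes that appear here:

```python
import re

LOG = """\
pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60fffffff 64bit pref]
pci 0000:00:00.0: BAR 8: assigned [mem 0x610000000-0x6100fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
pci 0000:01:00.0: BAR 2: assigned [mem 0x610000000-0x61001ffff 64bit]
pci 0000:01:00.0: BAR 6: assigned [mem 0x610020000-0x61003ffff pref]
pci 0000:01:00.1: BAR 0: assigned [mem 0x610040000-0x610043fff 64bit]
pci 0000:01:00.0: BAR 4: failed to assign [io  size 0x0100]
"""

ASSIGNED = re.compile(
    r"pci (\S+): BAR (\d+): assigned \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)"
)
FAILED = re.compile(
    r"pci (\S+): BAR (\d+): failed to assign \[(\w+)\s+size 0x([0-9a-f]+)\]"
)


def parse_bars(log):
    """Return (assigned, failed) lists of BAR records found in the log text."""
    assigned, failed = [], []
    for line in log.splitlines():
        m = ASSIGNED.search(line)
        if m:
            dev, bar, start, end = m.groups()
            # Ranges are inclusive, so add one to get the size in bytes.
            size = int(end, 16) - int(start, 16) + 1
            assigned.append((dev, int(bar), size))
            continue
        m = FAILED.search(line)
        if m:
            dev, bar, kind, size = m.groups()
            failed.append((dev, int(bar), kind, int(size, 16)))
    return assigned, failed


def human(size):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:g} {unit}"
        size /= 1024


assigned, failed = parse_bars(LOG)
for dev, bar, size in assigned:
    print(f"{dev} BAR {bar}: {human(size)}")
for dev, bar, kind, size in failed:
    print(f"{dev} BAR {bar}: no {kind} space for {human(size)}")
```

The card's BAR 0 comes out at 256 MiB and fits in the prefetchable window. The only resource that fails is the 256-byte I/O BAR, which lines up with the "Unable to find PCI I/O BAR" message in the driver output.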

Coreforge commented 3 years ago

This is interesting. When I connected to the Pi over ssh yesterday, it locked up after loading the radeon module. When connecting over serial, though, I can still use the shell to some extent, so it doesn't lock up completely.

6by9 commented 3 years ago

The line "[ 193.807109] radeon 0000:01:00.0: Expecting atombios for evergreen GPU" is the main issue there. See https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/evergreen.c#L5186

    if (!rdev->is_atom_bios) {
        dev_err(rdev->dev, "Expecting atombios for evergreen GPU\n");
        return -EINVAL;
    }

i.e. it just gives up if the situation doesn't quite fit what the driver expects. The Pi has no BIOS, so I don't know what it's finding there, nor whether it really matters.
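
The -EINVAL returned by that check is the same value that surfaces as "failed with error -22" in the probe message. A quick sketch of the mapping, assuming a Linux host (errno numbering is platform-defined, though EINVAL is 22 on the common ones):

```python
import errno
import os

PROBE_ERROR = -22  # from "radeon: probe of 0000:01:00.0 failed with error -22"

# The kernel reports errors as negative errno values, so flip the sign.
name = errno.errorcode[-PROBE_ERROR]
print(name, "-", os.strerror(-PROBE_ERROR))
```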

One could try removing that failure path and seeing what happens next, but I'd be surprised if it just worked. I think the kernel splat is likely caused by an error-handling path that itself fails.

Wikipedia (https://en.wikipedia.org/wiki/Radeon_HD_5000_series) does seem to have a good set of articles describing the different Radeon cards. Evergreen is the HD5000 series. "Northern Islands" is meant to be the HD6000 range (Caicos for the 6450), so I'm not quite sure why it's trying to load Evergreen for an HD6450.

It's identified correctly as CAICOS too: "[ 193.680918] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xA004 0x00)", which comes from https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_device.c#L1312 . The case for the Evergreen HD5000s is at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_asic.c#L2419 and the case for the "Northern Islands" HD6000s is at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/radeon/radeon_asic.c#L2438 .
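
The modesetting line packs the chip family and the PCI IDs into one string. A minimal sketch of pulling those fields apart, assuming the field order is family, vendor:device, subsystem vendor:subsystem device, revision; `parse_modesetting` is a hypothetical helper name, and the two sample lines are copied from the logs earlier in this thread:

```python
import re

LINES = [
    "[drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1545:0x5450 0x00).",
    "[drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xA004 0x00).",
]

PATTERN = re.compile(
    r"\((\w+) 0x([0-9A-Fa-f]{4}):0x([0-9A-Fa-f]{4}) "
    r"0x([0-9A-Fa-f]{4}):0x([0-9A-Fa-f]{4}) 0x([0-9A-Fa-f]{2})\)"
)


def parse_modesetting(line):
    """Split a radeon modesetting line into its family name and PCI IDs."""
    m = PATTERN.search(line)
    if m is None:
        return None
    family, *ids = m.groups()
    vendor, device, sub_vendor, sub_device, revision = (int(x, 16) for x in ids)
    return {
        "family": family,
        "vendor": vendor,          # 0x1002 is AMD/ATI, as lspci reports
        "device": device,
        "subsystem_vendor": sub_vendor,
        "subsystem_device": sub_device,
        "revision": revision,
    }


for line in LINES:
    info = parse_modesetting(line)
    print(f"{info['family']}: {info['vendor']:04x}:{info['device']:04x}")
```

Both cards share the AMD/ATI vendor ID and differ in device ID, with the HD 5450 reported as CEDAR and the HD6450 as CAICOS.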

geerlingguy commented 3 years ago

Speaking of the HD6450 in particular (I'm fine with that being merged into this issue; it would be nice to pop it into the database site too!): I remember speaking to a few people who seem to be more closely affiliated with AMD, and they suggested that cards of that generation and older would likely not be easy to get working on ARM, due to some design constraints from the older generations. (Heh, 'easy'... I wouldn't qualify any of these GPUs as 'easy'!)

But I am far from qualified to know what I'm talking about with old GPUs—as I've said elsewhere, the last time I ever spent much time (until now) worrying about GPUs and drivers was when the ATi Rage Pro 128 was new and amazing.

6by9 commented 3 years ago

Looking at the table on Wikipedia, I wonder whether the newer cards that use the drm/amdgpu driver instead of drm/radeon are likely to give better results. Time to try and find a cheap Rx300 series card.

I've not worried about x86 GPUs either for a long while - I've never been a heavy gamer, so integrated graphics has normally worked for me. Although I did get the Acer Revo R3610 with added Nvidia ION GPU so it actually made a reasonable media player!