geerlingguy opened 1 year ago:
To use `radeontop`:

```shell
sudo apt install -y libdrm-dev libncurses-dev libxcb-dri2-0-dev
git clone https://github.com/clbr/radeontop.git
cd radeontop
make
./radeontop
```
Since I'm having trouble getting into lightdm / wayfire, it's slightly less useful to me right now though :D
If I use `raspi-config` to boot to CLI instead of desktop, I try running:

```
$ wayfire-pi
II 23-11-23 12:51:00.366 - [src/main.cpp:280] Starting wayfire version 0.7.5
II 23-11-23 12:51:00.366 - [libseat] [libseat/backend/seatd.c:64] Could not connect to socket /run/seatd.sock: No such file or directory
II 23-11-23 12:51:00.366 - [libseat] [libseat/libseat.c:76] Backend 'seatd' failed to open seat, skipping
Bus error
```
And:

```
$ startx
... get logged errors ...
$ cat /home/pi/.local/share/xorg/Xorg.0.log
...
[  1476.387] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card2
[  1476.387] (==) modeset(G0): RGB weight 888
[  1476.387] (==) modeset(G0): Default visual is TrueColor
[  1476.387] (II) Loading sub module "glamoregl"
[  1476.387] (II) LoadModule: "glamoregl"
[  1476.387] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[  1476.387] (II) Module glamoregl: vendor="X.Org Foundation"
[  1476.387] 	compiled for 1.21.1.7, module version = 1.0.1
[  1476.387] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  1476.394] (EE)
[  1476.395] (EE) Backtrace:
[  1476.397] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x5555b82fc668]
[  1476.397] (EE) unw_get_proc_info failed: no unwind info found [-10]
[  1476.397] (EE)
[  1476.398] (EE) Bus error at address 0x7ffec3a78080
[  1476.398] (EE)
Fatal server error:
[  1476.398] (EE) Caught signal 7 (Bus error). Server aborting
[  1476.398] (EE)
[  1476.398] (EE)
```
I grabbed Coreforge's memcpy library:

```shell
wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c
gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo nano /etc/ld.so.preload
# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
```
That got much further with wayfire...
```
II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1553] Found connector 'DVI-D-1'
II 23-11-23 12:57:46.203 - [backend/drm/drm.c:1614] connector HDMI-A-3: Requesting modeset
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:1098] new output: HDMI-A-3
II 23-11-23 12:57:46.203 - [src/core/output-layout.cpp:537] loaded mode auto
II 23-11-23 12:57:46.231 - [backend/drm/drm.c:734] connector HDMI-A-3: Modesetting with 1920x1080 @ 60.000 Hz
(type equals variant: [type: string, value: toplevel] | (type equals variant: [type: string, value: x-or] & focusable equals variant: [type: bool, value: 1]))
type equals variant: [type: string, value: overlay]
false
false
false
app_id equals variant: [type: string, value: Kodi]
(type equals variant: [type: string, value: toplevel] & floating equals variant: [type: bool, value: 1])
II 23-11-23 12:57:46.288 - [backend/drm/drm.c:1502] Scanning DRM connectors on /dev/dri/card1
II 23-11-23 12:57:46.290 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-1'
II 23-11-23 12:57:46.294 - [backend/drm/drm.c:1553] Found connector 'HDMI-A-2'
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "smart-kvm Multifunction USB Device" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "pwr_button" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-0" to output (not found in this cursor)
EE 23-11-23 12:57:46.294 - [types/wlr_cursor.c:875] Cannot map device "vc4-hdmi-1" to output (not found in this cursor)
EE 23-11-23 12:57:46.296 - [render/allocator/gbm.c:147] gbm_bo_create failed
EE 23-11-23 12:57:46.296 - [render/swapchain.c:109] Failed to allocate buffer
```
`startx` also got further... but I'm not sure what's up; it just ends up not rendering a display through the RX 460 at the point I run it (the system is not locked up, however).
To fix that, make sure you enable "Fix up misaligned multi-word loads and stores in user space" when recompiling the kernel:

```
Kernel Features
  -> Kernel support for 32bit EL0
    -> Fix up misaligned multi-word loads and stores in user space
```
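For reference, on recent mainline kernels that menuconfig entry appears to correspond to `CONFIG_COMPAT_ALIGNMENT_FIXUPS` (an assumption worth verifying against your tree before relying on it), so it can also be flipped non-interactively instead of via menuconfig:

```shell
# Run from the kernel source tree. Assumption: the prompt text maps to
# COMPAT_ALIGNMENT_FIXUPS; verify first with:
#   grep -n "Fix up misaligned" arch/arm64/Kconfig
./scripts/config --enable COMPAT
./scripts/config --enable COMPAT_ALIGNMENT_FIXUPS
make olddefconfig
grep COMPAT_ALIGNMENT_FIXUPS .config
```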
On the site now: https://pipci.jeffgeerling.com/cards_gpu/xfx-radeon-rx460-4gb.html
Was there anything in dmesg when running wayfire or x11?
I saw you had some issues compiling in #6. `compat_alignment.c` might only get compiled if Kernel Features -> Kernel support for 32bit EL0 -> Fix up misaligned multi-word loads and stores in user space is enabled (I should probably move the code into a separate file, as that option is disabled by default).
There might be something in newer mesa versions that doesn't get entirely fixed by the memcpy library and is now causing issues with startx as well, as I could get that running before without additional alignment handling in the kernel. Wayfire was triggering the alignment trap a few times though, so it currently won't work without it. If it's still getting stuck somewhere (with the alignment trap), dmesg will likely get spammed full of essentially the same error over and over again. I'd need at least the `Faulting instruction:` line, and ideally the `Load/Store: op0...` line if it's there as well, to add the relevant instruction(s). I'm currently just adding them as I encounter issues, as there are quite a lot of load/store instructions on arm64.
My card has 4gb of vram as well, so that's not an issue.
@Coreforge - indeed, after enabling that flag, I can compile (with a number of warnings), rebooting now...
Running `wayfire-pi`, while the environment initializes, I see:
```
[   40.300504] Alignment fixup
[   40.300510] Faulting instruction: 0xa9001444
[   40.300513] Load/Store
[   40.300515] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[   40.300517] Storing 8 bytes (pair: 1) to 0x7fff5056016c
[   40.309090] Alignment fixup
[   40.309098] Faulting instruction: 0xa9000c22
[   40.309101] Load/Store
[   40.309102] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x3
[   40.309105] Storing 8 bytes (pair: 1) to 0x7fff50568e7c
[   41.159727] Alignment fixup
[   41.159732] Faulting instruction: 0xa9001444
[   41.159735] Load/Store
[   41.159737] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[   41.159739] Storing 8 bytes (pair: 1) to 0x7fff5056056c
[   41.289474] Alignment fixup
[   41.289486] Faulting instruction: 0xa9001444
[   41.289490] Load/Store
[   41.289491] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[   41.289494] Storing 8 bytes (pair: 1) to 0x7fff5056096c
```

When I clicked on the Pi menu, I saw:

```
[  132.284968] Alignment fixup
[  132.284976] Faulting instruction: 0xa9001444
[  132.284980] Load/Store
[  132.284983] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
```
When I opened up Chromium there were maybe 30 or so fixups.
I ran `glmark2` and it was spitting out hundreds (?) of fixups per second, certainly a huge number was running through. It seemed to fail during `[ideas]`, but it got through a bit before doing so... getting over 2,000 fps during a number of tests.
It did give the GPU some work, though!
Not enough to kick in the internal fans it seems... I think they work :P (the fun of testing used hardware...).
The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?
Were there a bunch of fixup messages in dmesg without the `Storing %d bytes` message when glmark failed? (They might get mixed up a bit, but if it's getting stuck on an instruction, there should be a lot of other messages without that one.)
The fans were occasionally spinning up on my card, but not very much, so depending on the fan curve, cooler, and power profile on your card it might just not get warm enough with these loads.
I'm just running minecraft from a technic install where I replaced the native libraries with arm versions and echoed out the actual launch command (since the launcher would otherwise overwrite the libraries again with x86 versions). I think some launchers directly support arm now, but I haven't tried any in a while.
SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.
A few notes:
```
[  564.557038] Faulting instruction: 0xf8226865
[  564.557039] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.557039] Load/Store
[  564.557041] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557043] Alignment fixup
[  564.557044] Faulting instruction: 0xf8226865
[  564.557046] Load/Store
[  564.557047] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557049] Alignment fixup
[  564.557051] Faulting instruction: 0xf8226865
[  564.557052] Load/Store
[  564.557053] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557055] Alignment fixup
[  564.557057] Faulting instruction: 0xf8226865
[  564.557058] Load/Store
[  564.557059] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557062] Alignment fixup
[  564.557063] Faulting instruction: 0xf8226865
[  564.557064] Load/Store
[  564.557065] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557068] Alignment fixup
[  564.557069] Faulting instruction: 0xf8226865
[  564.557071] Load/Store
[  564.557072] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557074] Alignment fixup
[  564.557075] Faulting instruction: 0xf8226865
[  564.557077] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.557078] Load/Store
[  564.557146] Load/Store: op0 0xf op1 0x0 op2 0x0 op3 0x22 op4 0x2
[  564.557254] systemd-journald[2049]: /dev/kmsg buffer overrun, some messages lost.
[  564.560311] Alignment fixup
[  564.560317] Faulting instruction: 0xa9001444
[  564.560320] Load/Store
[  564.560321] Load/Store: op0 0xa op1 0x0 op2 0x2 op3 0x0 op4 0x1
[  564.560324] Storing 8 bytes (pair: 1) to 0x7fff3831296c
[  564.760582] systemd[1]: Started systemd-journald.service - Journal Service.
```
It was stuck on `[ideas]` both times, I think. (The blue wireframey wavey one.)
This thread may get a little more activity, I just posted a video: You can use external GPUs on the Raspberry Pi 5.
> The last time I installed Minecraft on a Pi I just used Pi-Apps — is there a preferred place where you grab it?
@geerlingguy Pi-Apps to install Minecraft (Minecraft Bedrock, Minecraft Java with Prism Launcher, and Minecraft Pi) will work well. All of them are native ARM64. I would love to see Minecraft Java with Prism Launcher running on that setup hopefully with the Simply Optimized modpack or similar.
I need to try this myself at some point (I have an RX 570 8GB, which is still a Polaris card). I guess one of those mining riser cards like this plus the M.2 HAT should do the trick.
Regarding the cursor, according to this YouTube comment I could add the environment variable `WLR_NO_HARDWARE_CURSORS=1` to use the software renderer, and that would hopefully keep it visible for now :)
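If that wlroots variable works the same way here as elsewhere, launching the compositor with it set would look something like this (untested sketch):

```shell
# Force software cursor rendering in the wlroots-based compositor
# (assumption: wayfire-pi respects the standard wlroots variable).
WLR_NO_HARDWARE_CURSORS=1 wayfire-pi
```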
The faulting instruction seems to just be a 64bit store (which I haven't added yet, since I hadn't encountered one before). I'll hopefully get it added in the next few days.
If PCIe on the Pi 5 is anything like on the LX2160A, the GPU might fall off the bus if the PCIe link rate changes as a power-savings feature. One way to work around that is to set `amdgpu.pcie_gen_cap` to `0x10001` for gen1, `0x20002` for gen2, or `0x40004` for gen3. While you're at it, you might try `amdgpu.aspm=0`.
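On Raspberry Pi OS those module parameters would go on the single kernel command line; a sketch, assuming the Bookworm path `/boot/firmware/cmdline.txt` (adjust for your setup, and note the file must stay one line):

```shell
# Hypothetical example: pin the link to gen2 and disable ASPM.
CMDLINE=/boot/firmware/cmdline.txt
EXTRA="amdgpu.pcie_gen_cap=0x20002 amdgpu.aspm=0"
echo "Would append: $EXTRA to $CMDLINE"
# To actually apply it, append to the first (only) line and reboot:
# sudo sed -i "1 s|\$| $EXTRA|" "$CMDLINE" && sudo reboot
```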
There's also a double-negative `amdgpu.noretry=0` to enable retries of... something, I don't know exactly what.
I'm also curious if amdgpu's HDMI audio sounds correct on the Pi. On the LX2160A, the audio comes out crackly and garbled.
> SuperTuxKart on max settings would probably be a good benchmark for these cards, as it's arm64 native, OpenGL/GLES based, and in the Raspbian repos.
Another good thing to try is the game Veloren. Its launcher is available as a flatpak: `net.veloren.airshipper`. It's a game with support for Vulkan, DX12, and Metal, and native ARM64 versions for Linux and macOS.
If you create a world with a fixed non-zero seed, and then Spectate World, I think you should end up with the same map viewed from about the same place, possibly with different weather at a given moment.
Seems like AMD is also working on it: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.7-rc3&id=ba0fb4b48c19a2d2380fc16ca4af236a0871d279. Did you talk about it on the amd-gfx mailing list?
I got the uPCity today (Thanks to the Pineberry people again!), but haven't had too much time to do benchmarks yet.
`glmark2` is now also crashing for me on the ideas test with the same instruction, so I guess some old library that did something differently just got updated when I installed wayfire. Minecraft has a similar issue, so until I get that added, I only have this partial run of glmark2. Since I didn't do another run with the old adapter on the same libraries, I don't think the numbers can be compared directly. I saw a GPU utilization of 95% through a good part of the run though, so it doesn't seem like it's getting CPU limited (and toning down the logging should also help with that).
```
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 342 FrameTime: 2.929 ms
[build] use-vbo=true: FPS: 1799 FrameTime: 0.556 ms
[texture] texture-filter=nearest: FPS: 1824 FrameTime: 0.548 ms
[texture] texture-filter=linear: FPS: 1875 FrameTime: 0.533 ms
[texture] texture-filter=mipmap: FPS: 1866 FrameTime: 0.536 ms
[shading] shading=gouraud: FPS: 1750 FrameTime: 0.572 ms
[shading] shading=blinn-phong-inf: FPS: 1770 FrameTime: 0.565 ms
[shading] shading=phong: FPS: 1741 FrameTime: 0.574 ms
[shading] shading=cel: FPS: 1734 FrameTime: 0.577 ms
[bump] bump-render=high-poly: FPS: 1755 FrameTime: 0.570 ms
[bump] bump-render=normals: FPS: 1810 FrameTime: 0.553 ms
[bump] bump-render=height: FPS: 1821 FrameTime: 0.549 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 508 FrameTime: 1.972 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1434 FrameTime: 0.698 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 523 FrameTime: 1.913 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 123 FrameTime: 8.158 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.383 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 158 FrameTime: 6.364 ms
```
I'll try SuperTuxKart as well once I get the things I had running, running again.
Hello, one benchmark that I would suggest is GravityMark. It has support for modern features such as ray tracing and I think that it has a native AArch64 build for all major desktop operating systems - certainly for Linux (I have tried it myself on an Ampere eMAG machine).
Geekbench's compute benchmark is another option, covering a different GPU use case.
> Geekbench's compute benchmark is another option, covering a different GPU use case.
@volyrique Geekbench compute (at least the Vulkan backend) isn't in the Linux arm64 Geekbench 5/6 builds; it's only in the x86_64 builds. I've been asking jfpoole to add it for a year now, but they won't for some reason. You can run the Geekbench 5/6 x86_64 compute benchmark through box64, though, and it works fully with (probably) minimal overhead. That's what I did to get the Geekbench Vulkan compute benchmark results here -> https://forums.raspberrypi.com/viewtopic.php?p=2144650#p2140061 on the Pi 4 and Pi 5.
That still leaves OpenCL as an option, doesn't it?
glmark2 on mostly updated packages at gen3 speeds:
```
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 369 FrameTime: 2.717 ms
[build] use-vbo=true: FPS: 2086 FrameTime: 0.480 ms
[texture] texture-filter=nearest: FPS: 2082 FrameTime: 0.480 ms
[texture] texture-filter=linear: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2069 FrameTime: 0.484 ms
[shading] shading=gouraud: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1861 FrameTime: 0.537 ms
[shading] shading=cel: FPS: 1862 FrameTime: 0.537 ms
[bump] bump-render=high-poly: FPS: 1868 FrameTime: 0.535 ms
[bump] bump-render=normals: FPS: 2105 FrameTime: 0.475 ms
[bump] bump-render=height: FPS: 2095 FrameTime: 0.477 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1411 FrameTime: 0.709 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 511 FrameTime: 1.959 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1501 FrameTime: 0.666 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 521 FrameTime: 1.923 ms
[desktop] effect=shadow:windows=4: FPS: 904 FrameTime: 1.107 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 130 FrameTime: 7.693 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 139 FrameTime: 7.212 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 165 FrameTime: 6.080 ms
[ideas] speed=duration: FPS: 1698 FrameTime: 0.589 ms
[jellyfish] <default>: FPS: 1015 FrameTime: 0.986 ms
[terrain] <default>: FPS: 126 FrameTime: 7.938 ms
[shadow] <default>: FPS: 1551 FrameTime: 0.645 ms
[refract] <default>: FPS: 165 FrameTime: 6.093 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2032 FrameTime: 0.492 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2034 FrameTime: 0.492 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2040 FrameTime: 0.490 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2041 FrameTime: 0.490 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2028 FrameTime: 0.493 ms
=======================================================
                                  glmark2 Score: 1463
=======================================================
```
Most of the benchmarks were running with the GPU at 100%, but some (especially buffer) were heavily CPU bound, with the GPU sitting at only around 20%. That's likely due to the alignment trap getting triggered a lot (it's getting triggered a lot on the other tests too, but likely not nearly as much). Optimizing the trap would likely improve it a bit, but the better option would be finding which library/function is causing those issues and fixing that (since it didn't happen before I updated a bunch of stuff, it should be some system library, probably some part of mesa).
And at gen1 speeds:
```
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 460 Graphics (polaris11, LLVM 15.0.6, DRM 3.49, 6.1.61-v8_16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 23.2.1-1~bpo12+rpt2
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 287 FrameTime: 3.485 ms
[build] use-vbo=true: FPS: 2078 FrameTime: 0.481 ms
[texture] texture-filter=nearest: FPS: 2080 FrameTime: 0.481 ms
[texture] texture-filter=linear: FPS: 2081 FrameTime: 0.481 ms
[texture] texture-filter=mipmap: FPS: 2072 FrameTime: 0.483 ms
[shading] shading=gouraud: FPS: 1863 FrameTime: 0.537 ms
[shading] shading=blinn-phong-inf: FPS: 1862 FrameTime: 0.537 ms
[shading] shading=phong: FPS: 1857 FrameTime: 0.539 ms
[shading] shading=cel: FPS: 1858 FrameTime: 0.538 ms
[bump] bump-render=high-poly: FPS: 1864 FrameTime: 0.537 ms
[bump] bump-render=normals: FPS: 2097 FrameTime: 0.477 ms
[bump] bump-render=height: FPS: 2086 FrameTime: 0.479 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 1401 FrameTime: 0.714 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 509 FrameTime: 1.967 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1506 FrameTime: 0.664 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 514 FrameTime: 1.949 ms
[desktop] effect=shadow:windows=4: FPS: 884 FrameTime: 1.131 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 81 FrameTime: 12.470 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 136 FrameTime: 7.406 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 100 FrameTime: 10.057 ms
[ideas] speed=duration: FPS: 1535 FrameTime: 0.652 ms
[jellyfish] <default>: FPS: 1012 FrameTime: 0.989 ms
[terrain] <default>: FPS: 126 FrameTime: 7.964 ms
[shadow] <default>: FPS: 1549 FrameTime: 0.646 ms
[refract] <default>: FPS: 165 FrameTime: 6.097 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2040 FrameTime: 0.490 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2037 FrameTime: 0.491 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2038 FrameTime: 0.491 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2037 FrameTime: 0.491 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2038 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2035 FrameTime: 0.491 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2039 FrameTime: 0.491 ms
=======================================================
                                  glmark2 Score: 1450
=======================================================
```
The GPU utilization was at 100% through most of the run as well. Most of the scores are the same, but some benchmarks (mainly mapped buffer, which transfers a lot of data) were affected quite a bit.
SuperTuxKart was CPU limited, but the GPU was at about 80% rendering at 3840x2160. Since I removed most of the log output, I don't know how much the alignment trap contributed to the CPU load, but it probably had a bit of an impact.
```
[verbose  ] profile: Number of frames: 8420 time 70.956001, Average FPS: 118.665085
[verbose  ] profile: Average # drawn nodes 0.000000 k
[verbose  ] profile: Average # culled nodes: 0.000000 k
[verbose  ] profile: Average # solid nodes: 0.000000 k
[verbose  ] profile: Average # transparent nodes: 0.000000
[verbose  ] profile: Average # transp. effect nodes: 0.000000
[verbose  ] profile: name start_position end_position time average_speed top_speed skid_time rescue_time rescue_count brake_count explosion_time explosion_count bonus_count banana_count small_nitro_count large_nitro_count bubblegum_count
[verbose  ] profile: gavroche Skidding 1 3 61.3454 16.1741 20 0 0 0 315 0 0 1 0 2 0 0 358
[verbose  ] profile: puffy Skidding 2 1 58.9141 16.8416 21.3464 0 0 0 482 0 0 6 0 1 0 0 320
[verbose  ] profile: konqi Skidding 3 4 65.1309 15.234 15.8408 0 0 0 486 0 0 3 0 4 0 0 361
[verbose  ] profile: tux Skidding 4 2 60.2146 16.4778 25.8022 0 0 0 399 0 0 3 0 1 0 0 574
[verbose  ] profile: min 58.914093 max 65.130867 av 61.401222
[verbose  ] profile:
[verbose  ] profile: name Strt End  Time   AvSp  Top  Skid  Resc Rsc Brake Expl Exp Itm Ban SNitLNit Bub Off Energy
[verbose  ] profile: Skidding 1   3   61.35 16.17 20.00  0.00  0.00   0   315 0.00   0   1   0   2   0   0 358   0.00
[verbose  ] profile: Skidding 2   1   58.91 16.84 21.35  0.00  0.00   0   482 0.00   0   6   0   1   0   0 320   1.00
[verbose  ] profile: Skidding 3   4   65.13 15.23 15.84  0.00  0.00   0   486 0.00   0   3   0   4   0   0 361   4.00
[verbose  ] profile: Skidding 4   2   60.21 16.48 25.80  0.00  0.00   0   399 0.00   0   3   0   1   0   0 574   1.00
[verbose  ] profile: ---------------------------------------------------------------------------------------------------
[verbose  ] profile: Skidding +0  61.4012  0.00  0.00   0  1682 0.00   0  13   0   8   0   0 1613   6.00
```
OpenCL-based applications are unfortunately likely rather difficult to run with this card, at least in my experience. It worked fine in my desktop, but only with the proprietary OpenCL driver (and I think ROCm dropped support for Polaris?).
There are some more instructions left that I know cause issues with some unity games, but a lot of things should work again now.
@Coreforge Awesome, thanks! Could you also try running it windowed inside the `wayfire-pi` environment too? (If you get a chance.)
I can try, though I had some missing libs last time I tried 3D stuff inside wayfire.
Just for a frame of reference, on my RX 570 in an x86_64 desktop I get ~3200 points at the same resolution. So I think you're getting the full performance in that benchmark, since the RX 460 is supposed to be a bit less than half as powerful as the RX 570. Do you have a reference test of that RX 460 on a desktop x86_64 computer?
Also, just a suggestion: I imagine you'll have trouble using the Raspberry Pi patched-and-built Chromium with anything other than the Pi 4/Pi 5 VideoCore GPUs. You'll probably have better luck with vanilla Chromium, either built from source, via the Chromium flatpak or snap (which are vanilla Chromium without any notable patches), or via my Chromium debs, which you can find here -> https://github.com/theofficialgman/testing/releases/tag/gmans-releases (latest version chromium-browser-stable_119.0.6045.199-1_arm64.deb). (Note: I don't suggest using my debs all the time, since I've specifically patched out libvpx support to use ffmpeg for VP9 hardware decoding on NVIDIA Tegra systems; I use these debs to repackage Chromium for Switchroot Nintendo Switch Linux distros.)
I didn't do benchmarks on x86 with the card, but since it was showing 100% utilization in most tests, that should be the full performance in those. I haven't run any browsers so far, but I'll keep that in mind if I do run into any issues.
Fixing SIMD instructions seems to be more complicated, as the SIMD registers don't get saved as part of the `pt_regs` struct. I'm currently using `fpsimd_save_state()` to save them to a temporary location and just assuming that nothing between the exception and that part of the handler changes those registers (which should be the case), but even though Unity games somewhat work now, they segfault sometimes, and not always at the same point.
The issues always occur in the `UnityGfxDeviceW` thread, which isn't a big surprise, but I'm not exactly sure what's causing them; I need to do more debugging. It does look like I'm not properly fixing some instructions though, as there are graphical glitches in the menu of Getting Over It, and it's even worse in-game. I have it narrowed down to a 128-bit unsigned-immediate `str`, but other than maybe not getting the correct data to write to memory (if `fpsimd_save_state()` doesn't work properly for this), I'm not really sure what's causing these problems.
What I was doing before was definitely not working.
I changed it to read the vector registers from `current->thread.uw.fpsimd_state.vregs`, calling `kernel_neon_begin()` before reading the data to ensure it's saved (and `kernel_neon_end()` afterwards), and it's certainly looking better (at least the main menu of Getting Over It is), but Getting Over It still crashes in the same place. Looking around with GDB, I couldn't find much obvious either. It is related to the SIMD instruction fixes, as I saw a null pointer dereference when I changed the handler to just write 0 for all SIMD fixups, but I'm not sure where exactly I'm doing something wrong. I pushed my current code again though.
I tried a few more games that I could run as well: DOOM 2016 with OpenGL can at least get to the menu (I haven't tried further), but the performance isn't too great (CPU-bound, and it might also be affected by the logging going on). Both DOOM 2016 and DOOM Eternal don't launch with Vulkan; they complain about being unable to initialize Vulkan (although I know both work with this card, and Vulkan works fine on the Pi too). The Talos Principle runs, although performance isn't too great here either. It's also CPU-bound, and it wasn't spamming dmesg (though it could still be triggering the alignment trap a lot). This game uses Vulkan.
I also did two runs of GravityMark, one on Vulkan and one on OpenGL. The Vulkan one ran and looked just fine; the OpenGL one was triggering the alignment trap a lot again with SIMD instructions, and the earth was flickering or not there at all for parts of the run.
I saw that there were (likely) two instructions causing problems:

```
[ 6964.423429] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68438038
[ 6964.457330] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842529c
[ 6964.491163] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68401f9c
[ 6964.491173] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68401fb8
[ 6964.526794] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6840cc1c
[ 6964.526804] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6840cc38
[ 6964.560762] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f6841981c
[ 6964.560773] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f68419838
[ 6964.594807] SIMD: storing 0x40a00000409 40800000407 (128 bits) at 0x0000007f6842869c
[ 6964.628703] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f684353f4
[ 6964.664459] SIMD: storing 0x40300000402 40100000000 (64 bits) at 0x0000007f68411f9c
```
I've been mostly focused on the 128bit stuff for now (as one of the instructions I saw causing a lot of issues was a 128bit str), so I suspect the 64bit one might not get handled properly.
I also gave monado a try (I wasn't expecting anything too special, as it mainly just needs vulkan, which works fine), and it works fine, though I haven't been able to get anything other than hello_xr to run.
Apparently I forgot to read the second data register for SIMD instructions, so instructions like `stp qX, qX, [xX]` only stored the first register, not the second. I also added some code to keep track of which instructions have already been handled at some point and which haven't, to make it easier to find potentially bad ones.
I added a bunch of instruction tests to my GPU memory access tests (all instructions I've seen get handled when launching Getting Over It or The Long Dark), which helped me find the `stp` issue. All instructions being tested are now handled correctly. However, Unity games are still segfaulting, and I don't know why. I might try rendering them with llvmpipe instead to make sure it's actually related to the GPU, although I think it is.
Most load and store instructions should work now (at least I haven't come across any that didn't seem to so far), except for loads with sign extension, for which the sign extension is currently being silently ignored (if it's not causing other decoding issues).
Apart from removing debug code, there's still a good bit of optimization needed. DOOM Eternal was running at 15-40 fps at 1920x1080, mostly around 20 fps, and while that's just about playable, GPU utilization dropped as low as 40% at times (though it also went up to 90%) while the CPU was basically pinned at 100% on all cores. This card didn't run the game particularly well in my PC either, but it was better there.
I still need to figure out why some Unity games are crashing (not all of them; some work), but I'd also be interested if someone has other things they'd like tested (I still need to try Geekbench), as I've mostly just been trying to run games so far.
One way to speed things up may be to let the CPU handle the unaligned accesses automatically, although I haven't done this so far as it comes with a few issues. Memory mapped as device memory (which it is when it's mapped as uncached in the amdgpu driver) doesn't allow unaligned accesses and causes a bus error (this is how I've been doing it so far). If the memory is mapped as write-combined, unaligned accesses work, but they may get split up, which doesn't work for the kernel driver.
What has kinda worked is keeping BOs created by the kernel as uncached device memory and mapping BOs created by userspace as write-combined. The desktop works fine that way, but not everything does.
Steam seems a bit more buggy (though it's hard to tell if that's related), but DOOM Eternal just runs slower. There's less CPU load, but the GPU shows as 100% utilized (both in DOOM and in radeontop). My guess is that something synchronization-related isn't getting updated as quickly, wasting time. Finer control over which buffers get mapped which way might help, but that'd have to be done in Mesa (maybe in the Gallium driver, or maybe in RADV; I'm not sure).
The Talos Principle didn't work well at all anymore (my monitor was just showing an Out Of Range error, even after closing the game), so this is definitely not stable at the moment. It might be quite helpful in a few places though (if it's possible to do properly without causing too many issues).
@Coreforge - Just popping into this thread as I'm going to take another stab with the RX 460 after seeing Pineboards running 4K gaming demos on theirs last month.
Did you or anyone else work on rebasing your branch on `rpi-6.6.y`, which is the current stable branch for Pi OS? (cc @mikegapinski)
I haven't tried it, but if not too much changed in the relevant parts of the amdgpu driver, it shouldn't be too complicated to do.
I'm rebasing `coreforge/rpi-6.1.y-gpu-pi5` on top of `rpi-6.6.y`; there are some conflicts, so I'll step through them and see if I can get it working.
I can probably also give it a try tomorrow if you run into any snags.
I also recently got an orange pi 5 and should get an M.2 adapter for it in a few weeks, so I can then see if there's any relevant differences there (I'm mostly just expecting some improvements in regards to CPU performance, and having double the RAM should help too).
There were... a lot of conflicts. Changing gears to running on the latest 6.1.y, which allows an easier rebase. Here's the patch:
6.1.y-gpu-pi5.patch — rebased here, recompiling now: https://github.com/geerlingguy/linux/pull/7
Looks like I moved everything to 6.7.y at some point and just didn't upload it (last time I touched it was for some raspberry pimax shenanigans). I didn't do a proper rebase though, so I'll check if I made any relevant changes (mostly in the alignment trap) and then do a proper rebase to 6.6.y.
@Coreforge - Thanks! I'm trying to get a demo set up of SuperTuxKart at 4K for a livestream I'm working on tomorrow (the point of the livestream is mostly to talk about various PCIe things, not just GPU), but I figured it would be good to update people on the state of the patches, and how little is required at this point to get basic AMD graphics working.
Got it properly rebased to 6.6.y now.
Make sure to enable Kernel Features -> Kernel support for 32-bit EL0 -> Fix up misaligned multi-word loads and stores in user space, as otherwise the alignment trap doesn't get compiled, and since I haven't added the necessary preprocessor macros yet, the linker will then fail.
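For reference, in mainline arm64 that menuconfig entry corresponds to the Kconfig symbol below (assuming the fork keeps the mainline symbol name; check the tree's arch/arm64/Kconfig if it differs), so the equivalent .config fragment would be:

```
CONFIG_COMPAT_ALIGNMENT_FIXUPS=y
```

Setting it via `./scripts/config --enable COMPAT_ALIGNMENT_FIXUPS` before building avoids a trip through menuconfig.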
@Coreforge - Do I still need to patch in the memcpy library before recompiling?
I fetched `coreforge/rpi-6.6.y-gpu` and generated a new patch against `rpi-6.6.y` (attached below). Applied and recompiling now...
Just to recap, here's what I did:
- compiled the patched `rpi-6.6.y` kernel
- added `export WLR_NO_HARDWARE_CURSORS=1` to my Pi user's `~/.profile`
And it seems that everything's working stably. Doom 3 and SuperTuxKart aren't really hammering the GPU though; I wonder if there's some bottleneck?
lspci is showing Gen 3 speeds:
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
I still have it preloaded on my Pi. It might not be absolutely necessary (though I haven't done any testing in quite a while), but if I remember correctly, there was an issue with Xorg without it.
It should also be a lot more performant than relying on the alignment trap.
@Coreforge Do you remember what kind of FPS you were getting with SuperTuxKart (if you tested that)? It seems like I'm still stuck at 17 fps windowed, which is the same as on the Pi's internal GPU. Running `glmark2-es2`, I'm seeing `radeontop` hit 50-80% GPU utilization; SuperTuxKart wasn't doing that (more like 10-20%).
Judging by Pineboards' speed, my run isn't using the GPU. My display output is through the HDMI port on the card.
Update: `glmark2-es2` with default settings gave me a score of 1761 (I re-ran it on the Pi 5 without the GPU, and got 1812)... I just re-checked, and it's running on the Broadcom V3D!
Just tested it again; I got 30-60 fps, full screen at 4K. I didn't add any special options though. Can you check the log output to see if it's using the right GPU? And if it is, maybe check the CPU utilization too.
Weird - glmark2-es2 was also running on the internal GPU, not the AMD GPU, even with the display connected through the AMD card.
Here's `dmesg`:
pi@pi5-pcie:~ $ dmesg | grep amdgpu
[ 5.097704] [drm] amdgpu kernel modesetting enabled.
[ 5.097858] amdgpu 0000:01:00.0: enabling device (0000 -> 0002)
[ 5.339556] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 5.339567] amdgpu: ATOM BIOS: 113-BAFFIN_PRO_160513_D5_4GB_MIC_0_W82
[ 5.339612] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 5.339616] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[ 5.823739] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0x1810000000-0x18101fffff 64bit pref]
[ 5.823745] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0x1800000000-0x180fffffff 64bit pref]
[ 5.823776] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0x1800000000-0x18ffffffff 64bit pref]
[ 5.823784] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0x1900000000-0x19001fffff 64bit pref]
[ 5.823802] amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 5.823805] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 5.823891] [drm] amdgpu: 4096M of VRAM memory ready
[ 5.823894] [drm] amdgpu: 4026M of GTT memory ready.
[ 5.850612] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 6.364917] amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 14
[ 6.368499] amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm
[ 6.368800] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:01:00.0 on minor 2
[ 6.391209] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
I'm just using the default Pi 5 OS install though, with Wayland. Is there anything I need to do to force it to use the AMD GPU?
I've installed `mesa-utils`, and here's `glxinfo`:
pi@pi5-pcie:~ $ DISPLAY=:0 glxinfo
name of display: :0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
...
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
...
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Broadcom (0x14e4)
Device: V3D 7.1 (0xffffffff)
Version: 23.2.1
Accelerated: yes
Video memory: 8052MB
Unified memory: yes
Preferred profile: core (0x1)
Max core profile version: 3.1
Max compat profile version: 3.1
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1
OpenGL vendor string: Broadcom
OpenGL renderer string: V3D 7.1
OpenGL core profile version string: 3.1 Mesa 23.2.1-1~bpo12+rpt3
Hmm... how do I get OpenGL to use amdgpu?
I'm on Xorg; could that have something to do with it (though I tried Wayland a while ago too)?
Check if there are multiple entries in `/dev/dri`. `xrandr --listproviders` should also list multiple cards. If both cards are available, `DRI_PRIME=1` should select the second one.
Oddly enough though, on my system, the integrated GPU isn't available. That probably has something to do with
[ 0.348449] bcm2708_fb soc:fb: Unable to determine number of FBs. Disabling driver.
[ 0.356149] bcm2708_fb: probe of soc:fb failed with error -2
Just to confirm, checking the SuperTuxKart log file in `~/.config/supertuxkart/config-0.10`, I see:
[info ] [IrrDriver Logger]: ..:: Antarctica Rendering Engine 2.0 ::..
[info ] [IrrDriver Logger]: SDL Version 2.26.3
[info ] [IrrDriver Logger]: Using renderer: OpenGL ES 3.1 Mesa 23.2.1-1~bpo12+rpt3
[info ] [IrrDriver Logger]: Broadcom
...
[info ] IrrDriver: OpenGL version: 3.1
[info ] IrrDriver: OpenGL vendor: Broadcom
[info ] IrrDriver: OpenGL renderer: V3D 7.1
[info ] IrrDriver: OpenGL version string: OpenGL ES 3.1 Mesa 23.2.1-1~bpo12+rpt3
[info ] GLDriver: Explicit Attrib Location Present
[info ] GLDriver: ARB Uniform Buffer Object Present
Looking at `/dev/dri`, I see:
$ ls /dev/dri
by-path card0 card1 card2 renderD128 renderD129
xrandr is not listing any providers:
# Note: I ran this on the display/keyboard itself too, not just over SSH, and I get "0" there too.
$ DISPLAY=:0 xrandr --listproviders
Providers: number : 0
Edit: Going to reboot with X11 instead of Wayland and see if that behaves any different.
Edit 2: Now... I get no video output on the AMD card's HDMI output.
Edit 3: Tried `labwc` instead (the third option in `raspi-config`), and I get display output... but still no OpenGL via AMDGPU (still not showing more than 0 providers).
Try `DRI_PRIME=1` and `DRI_PRIME=2` then. Having `card0`, `card1`, and `card2` there is a good sign, as is `renderD129`; it looks like both GPUs are working and the default one, for some reason, is the internal one.
@Coreforge - I've tried `DRI_PRIME=1 supertuxkart` in the terminal, but is that enough to force it? (Also tried `DRI_PRIME=2`, but same result - still using Broadcom.) Do I need to set that up as an environment variable, or system-wide?
pi@pi5-pcie:~ $ DRI_PRIME=1 DISPLAY=:0 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: V3D 7.1
pi@pi5-pcie:~ $ DRI_PRIME=2 DISPLAY=:0 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: V3D 7.1
Completely aside: it's very neat that `nvtop` works with AMD graphics cards; a nicer display than `radeontop`!
The RX 460 is a Polaris era AMD GPU. @Coreforge did a good amount of work getting one running, documented in #6.
We broke out this separate issue since the original RX 550 issue is already a bit long, and we are both testing on a Raspberry Pi 5 now, where this card may have more opportunity to shine.
Using Coreforge's 6.1.x kernel fork, if you recompile the kernel, you'll end up with a working HDMI output, with working console output:
I have not been able to get wayfire/lightdm working (it sits there on a blinking cursor screen, and the `wireplumber` process seems to get stuck on something under the `lightdm` user). Coreforge was running with X11 and seemed to be able to run `glmark2`, Minecraft, Portal 1 and 2, and some other games, but currently over only a PCIe x1 Gen 1 connection.