geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test GPU (ASRock Rack M2_VGA, based on SM750) #62

Closed: geerlingguy closed this issue 2 years ago

geerlingguy commented 3 years ago

This is a fun one—it's definitely not the kind of GPU where people would ask "will it run Crysis", especially since a headline feature is 2D graphics acceleration (not even 3D yet!).

But the ASRock Rack M2_VGA is an M.2 form factor graphics card that sports a lone VGA port and 16MB (yeah, MB, not GB) of DDR graphics memory.

[Photo: ASRock Rack M2_VGA]

I doubt it will even be as fast as the built-in graphics on the Pi, but it would be interesting to see if it works. It uses the SiliconMotion SM750 graphics chip, which actually supports up to two DVI/HDMI/VGA displays, as well as two video inputs which can be overlaid on those outputs.

The chip is mostly known for being helpful in embedded or server graphics situations, and is not a 'powerhouse' by any means. It's just a little utilitarian chip that draws less than 2W even at maximum (making it suitable for lower-power scenarios where you still need a display or two but don't run gaming or ML/AI applications on it).

There seems to have been a mainline driver (SM750) for a few years now, and it would be interesting to see if it 'just works' (compared to the other cards). The chip itself seems to use a BIOS (and was designed in 2012), which gives me a little pause.

But the chip is simple enough and well-documented enough that I wonder if we could bring it up manually if the driver starts barfing on memory allocations, like it has on all the other GPUs I've tested (AMD and Nvidia).

geerlingguy commented 2 years ago

@TobleMiner - Wow, thanks for taking up the mantle on this one! The write-combining fix is something I didn't even think to look for (most of the things I've been testing relate to switching to safer memcpy routines, e.g. what @Coreforge is doing with the Radeon driver in https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/4). Interesting... I need to pull out this card again now!

Coreforge commented 2 years ago

If you can find out something via JTAG, I'd be interested in that too. From what I've seen when debugging over JTAG with a second Pi as a JTAG adapter, the CPU somehow goes into 32-bit mode and I can't connect to it anymore. I haven't tried seeing exactly what happens when it crashes, though. Maybe there are parts of the SoC other than the CPU that can be accessed over JTAG too?

TobleMiner commented 2 years ago

I've just done some more tests with an Intel e1000e network card and a simple test driver that does nothing but map a single memory BAR from that card and then perform writes of various sizes to that BAR. The behaviour is much the same as with the SM750: everything up to and including 256 bytes works fine, but above that (e.g. at 512 bytes) the Pi freezes.
Could this be an issue with either the bus interface of the Broadcom PCIe host controller (or its configuration) or the PCIe host controller itself?
For completeness' sake, I repeated those tests on the latest rpi-firmware release with Linux 5.15.24, with the same result.

paulwratt commented 2 years ago

IIRC "write_combined" is a no-no on RPi PCIe (i.e. every time it has been found in use, it has also turned out to be a problem).

Wasn't one of the workarounds in the earlier #4 experiments a function that broke data down into transactions of at most 256 (4?) bytes?

As for the CPU moving to 32-bit mode: is that a result of the driver (controller configuration), or a compiler issue emitting the wrong opcode (from when ARM was still only 32-bit)?

Maybe unrelated: haven't various (Intel?) drivers contained "kludge" workarounds for 32-bit x86?

Coreforge commented 2 years ago

The 256-byte limit might exist because the Pi just doesn't write-combine below 256 bytes, but it could also be something else overflowing. Any transactions over 4 bytes don't work reliably, though, even when they don't necessarily lock up the Pi (reading 8 bytes just repeats the first 4 bytes, for example). I don't know why the CPU does what it does, and it might not actually have been a switch to 32-bit mode; I'm not sure anymore (it's been a while since I've used JTAG on the Pi). I couldn't connect to it through OpenOCD anymore, though, and I think it said something about Armv7. I didn't try single-stepping up to a lockup (that might be easier with the test driver).
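To illustrate the kind of workaround this implies for reads, here is a minimal sketch in plain C (an ordinary array stands in for a mapped BAR) of splitting one 8-byte read into two aligned 4-byte reads. The helper name `read64_as_two_32` is hypothetical, and it assumes the device keeps the low 32 bits at the lower address; in a kernel driver the two loads would be `readl()` calls, which is what the kernel's `lo_hi_readq()` helper does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a 64-bit device register as two aligned
 * 32-bit loads instead of one 8-byte access. Assumes the low 32 bits
 * live at the lower address. In a kernel driver these would be two
 * readl() calls, which is what lo_hi_readq() does.
 */
static uint64_t read64_as_two_32(const volatile uint32_t *reg)
{
    /* Each 32-bit access must be naturally aligned. */
    assert(((uintptr_t)reg & 3u) == 0);

    uint64_t lo = reg[0]; /* first aligned 4-byte read */
    uint64_t hi = reg[1]; /* second aligned 4-byte read */
    return (hi << 32) | lo;
}
```

The two halves are read separately, so a register that changes between the two loads can tear; that is the usual trade-off with `lo_hi_readq()`-style accessors.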

paulwratt commented 2 years ago

(I wish I was in a position to help out with some of this diagnosis)

TobleMiner commented 2 years ago

I opened an issue detailing this behaviour on raspberrypi/linux and got a response from an engineer formerly at Broadcom, now with the Raspberry Pi Foundation: https://github.com/raspberrypi/linux/issues/4928#issuecomment-1059839682

It seems to boil down to the fact that the PCIe root complex on the BCM2711 only supports aligned accesses of up to 32 bits. So remapping PCIe BARs as normal memory will never work properly, because no one expects normal memory to have any alignment requirements. Thus a whole bunch of PCI drivers, as well as userspace software using those drivers, will simply not work on the Pi 4 unless they are rewritten to do only aligned accesses. That would be a huge amount of work though and not useful beyond just "making it work" on the Pi.
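A rough sketch of what "only do aligned accesses" means in practice: a copy into a BAR that never issues anything wider than an aligned 32-bit store, falling back to single-byte stores for an unaligned head and tail. `aligned_copy_toio` is a hypothetical helper, not a kernel function; it is written as userspace C against a plain buffer so it can be compiled and tested, and in a driver the volatile stores would be `writeb()`/`writel()` on an `__iomem` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical replacement for memcpy_toio(): copy len bytes from normal
 * RAM into a device region using only naturally aligned 8-bit and 32-bit
 * stores, which is what the BCM2711 root complex is said to support.
 */
static void aligned_copy_toio(volatile void *dst, const void *src, size_t len)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Head: single-byte stores until the destination is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Body: one aligned 32-bit store per word. The source is ordinary
     * RAM, so memcpy() safely reads a possibly unaligned word from it. */
    while (len >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof(word));
        assert(((uintptr_t)d & 3u) == 0);
        *(volatile uint32_t *)d = word;
        d += 4;
        s += 4;
        len -= 4;
    }

    /* Tail: the remaining 0-3 bytes as single-byte stores. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}
```

The unaligned head and tail cost a few extra byte stores, but every access that reaches the root complex is naturally aligned and at most 32 bits wide.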

pelwell commented 2 years ago

> the BCM2711 can only support aligned 32bit access.

It supports aligned accesses up to 32 bits. The linked comment has been amended.

TobleMiner commented 2 years ago

Ah, sorry for the confusion. I've now edited the comment to reflect that, thanks for bringing it up!

geerlingguy commented 2 years ago

> That would be a huge amount of work though and not useful beyond just "making it work" on the Pi.

I don't think anyone expects this work to be mainlined at any point, but there are some use cases where getting a particular driver or device working on the Pi is useful (for example, people running storage controllers for disk storage on the Pi using old HBA cards), so it's nice to know all the corner cases where just a few lines of modified code will fix it.

I think for simpler graphics cards like SM750-based ones, it might be feasible to maintain a patch (especially considering the driver hasn't changed in years) that gives full or close to full functionality, for the few crazy people who want to use it (e.g. for adding more displays or using its built-in 2D rendering). Heck, maybe some casino startup wants to start building Pi slot machines :D

geerlingguy commented 2 years ago

All right, so I managed to apply @TobleMiner's patch in this comment to the latest kernel source, and compile the kernel with CONFIG_FB_SM750=m (under Device Drivers -> Staging drivers -> Silicon Motion SM750 framebuffer support in menuconfig).

I tried booting an image with the full Pi OS and window manager, but when it initialized (with one HDMI display on HDMI0 and VGA connected to the M2_VGA card), it seemed to lock up. It got further than usual, though. I'm going to try just a console version.

Coreforge commented 2 years ago

I don't think a window manager is fully working on any GPU yet. Instead of reflashing, just booting up without the GPU and disabling graphical boot in raspi-config should be faster and have the same effect.

geerlingguy commented 2 years ago

@Coreforge - Heh, too late ;)

I was also working on my build script a tiny bit.

Before:

pi@m2:~ $ uname -a
Linux m2 5.10.92-v8+ #1514 SMP PREEMPT Mon Jan 17 17:39:38 GMT 2022 aarch64 GNU/Linux

After:

pi@m2:~ $ uname -a
Linux m2 5.15.28-v8+ #1 SMP PREEMPT Wed Mar 16 21:43:17 UTC 2022 aarch64 GNU/Linux

pi@m2:~ $ cat /sys/class/graphics/fb0/virtual_size
1024,768

[Photo: the VGA display connected to the M2_VGA card]

Pardon the crusty display. The other one I have with VGA input is in use but will soon be rotated out of that position!

geerlingguy commented 2 years ago

I noticed the SM750 driver has a number of memset_io calls that would need to be adjusted to work with the Pi SoC just like the other cards @Coreforge was working on (e.g. https://github.com/geerlingguy/linux/pull/1/files).
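Along the same lines, a `memset_io()` call can be swapped for a loop that only issues aligned 8-bit and 32-bit stores. This is a hypothetical sketch (`aligned_memset_io` is an illustrative name, not the code from the linked PR), again using a plain buffer in place of the mapped framebuffer; the fill byte is replicated into all four lanes of a 32-bit word so the result does not depend on endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical replacement for memset_io(): fill len bytes of a device
 * region with the byte value c using only naturally aligned 8-bit and
 * 32-bit stores (no wide or unaligned accesses).
 */
static void aligned_memset_io(volatile void *dst, int c, size_t len)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t byte = (uint8_t)c;
    /* Replicate the byte into all four lanes of a 32-bit word. */
    const uint32_t word = (uint32_t)byte * 0x01010101u;

    /* Head: byte stores up to the next 4-byte boundary. */
    while (len > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = byte;
        len--;
    }

    /* Body: aligned 32-bit stores. */
    while (len >= 4) {
        assert(((uintptr_t)d & 3u) == 0);
        *(volatile uint32_t *)d = word;
        d += 4;
        len -= 4;
    }

    /* Tail: the remaining 0-3 bytes. */
    while (len > 0) {
        *d++ = byte;
        len--;
    }
}
```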

geerlingguy commented 2 years ago

Working on a patch here: https://github.com/geerlingguy/linux/pull/2 (so far it just has the fb console working as @TobleMiner had earlier).

geerlingguy commented 2 years ago

With just the patch from @TobleMiner I was getting a weird artifact after the blinking cursor in the console over VGA. I updated my patch (see link above) with a few more memset swaps, and it seems like those artifacts are gone. Going to try on X to see if I can get a desktop.

geerlingguy commented 2 years ago

Rebooting with X and the driver loaded results in the screen hanging at some point (before any errors could show up), so I blacklisted the module with:

echo "blacklist sm750fb" | sudo tee /etc/modprobe.d/blacklist-sm750fb.conf

Then after boot, I ran:

sudo modprobe sm750fb

dmesg shows:

[  142.774551] sm750fb: module is from the staging directory, the quality is unknown, you have been warned.
[  142.776466] no options.
[  142.777017] pci 0000:00:00.0: enabling device (0000 -> 0002)
[  142.777073] sm750fb 0000:01:00.0: enabling device (0000 -> 0002)
[  142.777112] sm750fb 0000:01:00.0: no specific g_option.
[  142.777129] mmio phyAddr = 604000000
[  142.777191] mmio virtual addr = 0000000054575e1d
[  142.777216] video memory phyAddr = 600000000, size = 16777216 bytes
[  142.777247] video memory vaddr = 000000009bddbcb5
[  143.369500] use simul primary mode
[  143.369519] crtc->cursor.mmio = 00000000f44c08d1
[  143.369565] ret = 5,fb_find_mode failed,with driver prepared modes
[  143.369575] success! use specified mode:1024x768-32@60 in kernel prepared default modedb
[  143.369582] Member of info->var is :
               xres=1024
               yres=768
               xres_virtual=1024
               yres_virtual=768
               xoffset=0
               yoffset=0
               bits_per_pixel=32
                ...
[  143.369591] fix->smem_start = 600000000
[  143.369597] fix->smem_len = 1000000
[  143.369603] fix->mmio_start = 604000000
[  143.369609] fix->mmio_len = 200000
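Assuming the `fix->` values are printed in hexadecimal (as the matching `phyAddr` lines suggest), the two lengths agree with the 16777216-byte video memory size above and with the 2M BAR (Region 1) in the lspci output below; BAR 0 itself is 64M, of which only the first 16 MiB is used as video memory:

```latex
\begin{aligned}
\texttt{0x1000000} &= 16^{6} = 2^{24} = 16\,777\,216 \text{ bytes} = 16 \text{ MiB} \\
\texttt{0x200000}  &= 2 \cdot 16^{5} = 2^{21} = 2\,097\,152 \text{ bytes} = 2 \text{ MiB}
\end{aligned}
```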

However, the VGA input on my display shows nothing. Via SSH, I see:

pi@m2x:~ $ sudo xrandr -q
Can't open display 

(Note: Within X, I can see xrandr -q outputting the HDMI-1 and HDMI-2 info, but not VGA.)

The module seems to be loaded (lspci output):

01:00.0 VGA compatible controller: Silicon Motion, Inc. SM750 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Silicon Motion, Inc. SM750
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 63
    Region 0: Memory at 600000000 (32-bit, prefetchable) [size=64M]
    Region 1: Memory at 604000000 (32-bit, non-prefetchable) [size=2M]
    Expansion ROM at 604200000 [virtual] [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
        Vector table: BAR=5 offset=00000000
        PBA: BAR=5 offset=00000000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Kernel driver in use: sm750fb
    Kernel modules: sm750fb

geerlingguy commented 2 years ago

I switched 'Boot' to "To CLI" in Raspberry Pi Configuration and rebooted.

At the CLI, xrandr -q still gives "Can't open display", and I can still modprobe the driver, though of course the display isn't active.

After I did that, I tried startx and got this interesting feedback in the Xorg log file:

[   337.295] (--) PCI:*(1@0:0:0) 126f:0750:126f:0750 rev 161, Mem @ 0x600000000/67108864, 0x604000000/2097152, BIOS @ 0x????????/65536
[   337.295] (II) LoadModule: "glx"
[   337.295] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[   337.297] (II) Module glx: vendor="X.Org Foundation"
[   337.297]    compiled for 1.20.11, module version = 1.0.0
[   337.297]    ABI class: X.Org Server Extension, version 10.0
[   337.297] (==) Matched siliconmotion as autoconfigured driver 0
[   337.297] (==) Matched modesetting as autoconfigured driver 1
[   337.297] (==) Matched fbdev as autoconfigured driver 2
[   337.298] (==) Assigned the driver to the xf86ConfigLayout
[   337.298] (II) LoadModule: "siliconmotion"
[   337.298] (WW) Warning, couldn't open module siliconmotion
[   337.298] (EE) Failed to load module "siliconmotion" (module does not exist, 0)
[   337.298] (II) LoadModule: "modesetting"
[   337.298] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[   337.299] (II) Module modesetting: vendor="X.Org Foundation"
[   337.299]    compiled for 1.20.11, module version = 1.20.11
[   337.299]    Module class: X.Org Video Driver
[   337.299]    ABI class: X.Org Video Driver, version 24.1
[   337.299] (II) LoadModule: "fbdev"
[   337.299] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[   337.299] (II) Module fbdev: vendor="X.Org Foundation"
[   337.299]    compiled for 1.20.0, module version = 0.5.0
[   337.299]    Module class: X.Org Video Driver
[   337.299]    ABI class: X.Org Video Driver, version 24.0
[   337.299] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[   337.299] (II) FBDEV: driver for framebuffer: fbdev
[   337.299] (WW) Falling back to old probe method for modesetting
[   337.300] (II) Loading sub module "fbdevhw"
[   337.300] (II) LoadModule: "fbdevhw"
[   337.300] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[   337.300] (II) Module fbdevhw: vendor="X.Org Foundation"
[   337.300]    compiled for 1.20.11, module version = 0.0.2
[   337.300]    ABI class: X.Org Video Driver, version 24.1
[   337.300] (**) FBDEV(1): claimed PCI slot 1@0:0:0
[   337.300] (II) FBDEV(1): using default device
[   337.300] (II) modeset(G0): using drv /dev/dri/card1
[   337.300] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[   337.300] (EE) Screen 0 deleted because of no matching config section.
[   337.300] (II) UnloadModule: "modesetting"
[   337.301] (II) FBDEV(0): Creating default Display subsection in Screen section
    "Default Screen Section" for depth/fbbpp 24/32
[   337.301] (==) FBDEV(0): Depth 24, (==) framebuffer bpp 32
[   337.301] (==) FBDEV(0): RGB weight 888
[   337.301] (==) FBDEV(0): Default visual is TrueColor
[   337.301] (==) FBDEV(0): Using gamma correction (1.0, 1.0, 1.0)
[   337.301] (II) FBDEV(0): hardware: sm750_fb1 (video memory: 16384kB)
[   337.301] (EE) 
[   337.301] (EE) Backtrace:
[   337.304] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x5588eef1a8]
[   337.304] (EE) unw_get_proc_info failed: no unwind info found [-10]
[   337.304] (EE) 
[   337.305] (EE) Segmentation fault at address 0x124
[   337.305] (EE) 
Fatal server error:
[   337.305] (EE) Caught signal 11 (Segmentation fault). Server aborting
[   337.305] (EE) 
[   337.305] (EE) 

geerlingguy commented 2 years ago

lol someone else with a similar issue gave up trying to get an SM501 working and swapped over to an Nvidia GT710.

After reading this post on the SM710, I also tried sudo startx, but got the same problem.

Someone else had a similar error when running TigerVNC and had to add an LD_PRELOAD path to their VNC server command (https://github.com/TigerVNC/tigervnc/issues/800#issuecomment-565669421). I tried that, but it made no difference: still a segmentation fault at address 0x124, probably from some memory copy/access that's still broken somewhere in the code.

geerlingguy commented 2 years ago

Huh... same error but on a different GPU, the core inside the Rockchip RK3399, here: https://forum.armbian.com/topic/12985-potential-opp-issue-with-nanopi-m4v2/#comment-95405

Update: X11 fails to start, consistently segfaulting at OsLookupColor+0x188.

PixlRainbow commented 2 years ago

I encountered a similar issue on the rk356x, crashing in the same place. It seems to be a mismatch where one library expects 24-bit but another expects 32-bit, plus memory alignment issues introduced because the code was not being tested on RISC platforms.

https://gitlab.freedesktop.org/mesa/mesa/-/issues/6142

geerlingguy commented 2 years ago

@PixlRainbow - That would make sense :( — though that issue seems more aligned with issue #4 on the Radeon 5450 (though related possibly here).

geerlingguy commented 2 years ago

Just as an FYI, I also tested with https://gist.github.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359?permalink_comment_id=4134159#gistcomment-4134159 (a memcpy.so override that helped with Xorg on the Radeon 5450), but that had no effect. I still get the segfault at address 0x124, with the backtrace pointing at OsLookupColor+0x188.

geerlingguy commented 2 years ago

Going to mark this as closed/complete, as the card has gotten about as far as I think we can expect without SiliconMotion getting involved, ideally by writing a DRM driver for it to replace the ancient FB code that's currently in the kernel.

supercomputer7 commented 1 year ago

Is there a place to buy this card or a variant of it? It seems like a neat solution for lean graphics, and not only for the Raspberry Pi use case this issue solved.