geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test GPU (Zotac Nvidia GeForce GT 710) #2

Closed by geerlingguy 3 years ago

geerlingguy commented 4 years ago

I'm currently testing this GPU (Zotac Nvidia GeForce GT 710), so until that's done, I wanted an issue open to track where that work is going on.


Currently I have the driver installed, and everything sorta works, but I've had to do a lot to get it to that state. Right now I'm installing the CUDA tools to see if I can get some programming done on the GPU. The X window system did not seem to like the driver for some reason.

Anyways, some relevant links to follow:

geerlingguy commented 4 years ago

Can't get simple CUDA examples to work either, as they trigger dmesg failure notices—see https://gist.github.com/geerlingguy/9f1510ab028e68b712381520308db2af#gistcomment-3500278

geerlingguy commented 4 years ago

Just posting as a point of reference, 96Boards mentioned getting an Nvidia 710 working with one of their own boards: https://www.96boards.org/blog/oxalis-gpu/

The reason behind choosing the Nvidia GT710 was that it was the only GPU I had ever seen running on Arm on the 96Boards DeveloperBox. Part of the reason it works well is that the DeveloperBox has an SBBA/SBSA compliant UEFI BIOS that supports ACPI, and runs qemu for emulating x86 and running the driver for the Nvidia GPU.

Looking at the DeveloperBox specs, though, it is a vastly different architecture than the current Pi:

PCIe: Two of 1 Lane per slot, one 16 Lanes slot: 1 for PCIe Graphics card, 1 for PCIe IO Extension, 2x USB3.0/2.0 + 2x SATA

And here's the time (15:14) in their Oxalis Enterprise Demo where Sahaj revealed the GT 710 working on their board: https://youtu.be/rPHahahbheo?t=914

geerlingguy commented 4 years ago

One other thing I'd like to try at some point: I just received a 1x to 16x adapter with external power, so I can supply power to the board without leeching off the IO Board's 12V 2A power supply. Might work better, probably won't :P

geerlingguy commented 4 years ago

It looks like some users of the ROCKPRO64 (using the RK3399, which has x4 PCIe) also attempted and failed to use a GPU on that flavor of ARM SoC; see "Considering buying and question on the PCIe slot".

geerlingguy commented 4 years ago

Some good reading here: "PCI BARs and other means of accessing the GPU"

NostalgiaRunner commented 4 years ago

Greetings Jeff, I recently saw your GPU on CM4 video and blog post and had an idea for a way to get either dGPU working. I don't know if this would be worth the rabbit trail, but if you're willing to try something a little less than on-metal, my idea is to use QEMU. QEMU can use the Linux kernel's KVM hypervisor for guests that match the host architecture, and it can also emulate x86 and x86_64 operating systems on ARM64 hardware (in software, without KVM acceleration). In my previous ventures with Proxmox (VMware ESXi's open-source, QEMU-based competition), I was able to pass through a cheap GT 430 GPU on my very old BIOS-age Xeon server. If you wish to try it, I think it'd be possible. At least on paper it is.

Run QEMU on a headless minimal Pi image, create an x86 VM with your choice of x86 image, and configure the QEMU VM to pass through the GPU device. Here's a link to the guide I used for Proxmox, though I have my doubts that it'll be 100% the same process or success level: https://pve.proxmox.com/wiki/Pci_passthrough#GPU_Passthrough

While this wouldn't be running the GPU on-metal, I think it would still be an incredible feat if the VM is able to utilize the card, as this would make the idea of custom ARM-based desktop motherboards for x86 work a little more appealing.

Some final notes: there's a potential software dilemma. QEMU is and has been able to run x86 VMs on the RPi. However, it is not able to address all 8GB of RAM on the 8GB model for use across all VMs. ESXi ARM Fling, on the other hand, is a free(?) beta testing product at this time, and is able to address all 8GB on the host. However, it is my recollection that ESXi ARM Fling is unable to run x86 VMs, and potentially unable to pass through PCIe devices.

These are my thoughts and findings, and I hope they may be of use to you. Cheers!

geerlingguy commented 4 years ago

Over in the raspberrypi/linux project, it looks like this commit (https://github.com/raspberrypi/linux/commit/54db4b2fa4d17251c2f6e639f849b27c3b553939) has increased the default BAR allocation to 1GB. Nice!

geerlingguy commented 3 years ago

Before I throw in the towel on this thing, I'm going to try to use the external powered PCIe switch to see if it makes any difference. I'll first try with Nouveau, by cross-compiling the kernel from my Mac.

geerlingguy commented 3 years ago

I'm not even seeing the card when it's plugged into the switch:

$ lspci
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20)
01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)
02:01.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)
02:02.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)

No errors in dmesg.
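
A quick way to double-check a listing like the one above is to filter out the bridges and see what is left. The sketch below does that in Python on the pasted text. The helper names (`parse_lspci`, `endpoints`) are made up for illustration, and it assumes the default `lspci` line format of address, class, then device name.

```python
import re

# Text copied from the lspci listing above.
LSPCI_OUTPUT = """\
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20)
01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)
02:01.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)
02:02.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)
"""

# [domain:]bus:device.function, then "class: device name".
LINE_RE = re.compile(
    r"^(?P<addr>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>[^:]+): (?P<name>.+)$"
)


def parse_lspci(text):
    """Return a list of (address, class, name) tuples, one per device line."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append((match["addr"], match["cls"], match["name"]))
    return devices


def endpoints(devices):
    """Keep only devices that are not PCI bridges (e.g. a VGA controller)."""
    return [dev for dev in devices if dev[1] != "PCI bridge"]


devices = parse_lspci(LSPCI_OUTPUT)
print(len(devices))        # prints 4
print(endpoints(devices))  # prints []
```

Every entry here is a PCI bridge (the Pi's root port plus the switch's ports), so nothing behind the switch enumerated at all.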

geerlingguy commented 3 years ago

And if I plug it straight in, I get pcie link down each time.

geerlingguy commented 3 years ago

And using my other powered PCIe adapter I get pcie link down as well. Darn. No dice with this card currently.

geerlingguy commented 3 years ago

All right, spinning this card up again once more using the expanded BAR space suggested in this comment: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/6#issuecomment-728206583

And to make it so I can monitor what happens, I created /etc/modprobe.d/blacklist-nouveau.conf with the contents blacklist nouveau. That way I can power on the Pi, wait for successful boot, then run sudo modprobe nouveau.

geerlingguy commented 3 years ago

No matter what I do, pcie link down is always the result. Maybe I borked the card?

arrowmaster commented 3 years ago

ASUS has a similar board to this with twice the memory and four HDMI ports in exchange for only using PCIe Gen2. That may have alternative use cases for those that need the PCIe 1x physical size.

https://www.asus.com/Motherboards-Components/Graphics-Cards/ASUS/GT710-4H-SL-2GD5/

6by9 commented 3 years ago

@arrowmaster I have an Asus GT710-4H-SL-2GD5 in front of me. The nouveau driver blows up with it during initialisation. More details in https://www.raspberrypi.org/forums/viewtopic.php?f=98&t=288902

I had some advice from the dri-nouveau mailing list, but time isn't allowing me to play with it at present.

geerlingguy commented 3 years ago

Closing issues where testing is at least mostly complete, to keep the issue queue tidy.

(If someone else can get the GT 710 to light up, I'd be happy to have a new thread on it!)

geerlingguy commented 1 day ago

Just wanted to confirm on Pi 5 I'm still getting kernel errors with this card and the nouveau drivers:

[    5.916841] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
[    5.916850] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    5.916856] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    5.916859] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    5.916863] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[    5.916870] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[    5.916873] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022f10 00000000
[    5.916877] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[    5.916880] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[    5.916883] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[    5.919269] nouveau 0000:01:00.0: DRM: failed to initialise sync subsystem, -22
[    5.919943] Unable to handle kernel NULL pointer dereference at virtual address 0000000000003fc0
[    5.928802] Mem abort info:
[    5.931628]   ESR = 0x0000000096000145
[    5.935392]   EC = 0x25: DABT (current EL), IL = 32 bits
[    5.940727]   SET = 0, FnV = 0
[    5.943784]   EA = 0, S1PTW = 0
[    5.946947]   FSC = 0x05: level 1 translation fault
[    5.951844] Data abort info:
[    5.954729]   ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[    5.960234]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[    5.965311]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    5.970660] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000107a44000
[    5.977217] [0000000000003fc0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[    5.985967] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[    5.992256] Modules linked in: algif_hash algif_skcipher af_alg bnep spidev hci_uart btbcm bluetooth aes_ce_blk aes_ce_cipher ghash_ce gf128mul sha2_ce ecdh_generic sha256_arm64 ecc sha1_ce libaes raspberrypi_hwmon brcmfmac_wcc hid_apple nouveau(+) rpivid_hevc(C) pisp_be v4l2_mem2mem videobuf2_dma_contig i2c_brcmstb brcmfmac videobuf2_memops vc4 drm_exec spi_bcm2835 videobuf2_v4l2 brcmutil snd_soc_hdmi_codec cec videodev drm_dma_helper i2c_algo_bit cfg80211 drm_display_helper snd_soc_core gpio_keys snd_compress drm_ttm_helper snd_pcm_dmaengine ttm snd_pcm drm_kms_helper videobuf2_common mc rfkill snd_timer pwm_fan v3d snd gpu_sched drm_shmem_helper hid_multitouch joydev rp1_adc raspberrypi_gpiomem binfmt_misc nvmem_rmem uio_pdrv_genirq uio drm i2c_dev drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6
[    6.065732] CPU: 2 PID: 331 Comm: (udev-worker) Tainted: G        WC         6.6.58-v8-16k+ #12
[    6.074467] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[    6.080318] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    6.087305] pc : dcache_inval_poc+0x28/0x58
[    6.091505] lr : arch_sync_dma_for_cpu+0x34/0x50
[    6.096136] sp : ffffc000808eb720
[    6.099456] x29: ffffc000808eb720 x28: ffffd00044c99a78 x27: 0000000000000001
[    6.106618] x26: ffffc000808ebce0 x25: ffffc000808ebc90 x24: ffffffffffffffff
[    6.113781] x23: 0000000000000000 x22: ffff8001009998c0 x21: 0000000000004000
[    6.120943] x20: 0000000000000000 x19: ffffffffffffffff x18: ffffffffffffffff
[    6.128105] x17: 202c6d6574737973 x16: ffffd0008413de98 x15: 0000000000000100
[    6.135268] x14: ffff80010a3ab000 x13: 02dc8759a673a8ce x12: 0000000000000001
[    6.142430] x11: ffffd00044d04a28 x10: 0000000000000001 x9 : ffffd0008413e04c
[    6.149592] x8 : 0000000000000000 x7 : ffff800105526aa0 x6 : ffffffefffffffff
[    6.156754] x5 : ffff800105526aa0 x4 : 0000000000000000 x3 : 000000000000003f
[    6.163919] x2 : 0000000000000040 x1 : 0000000000003fc0 x0 : ffffffffffffffff
[    6.171090] Call trace:
[    6.173551]  dcache_inval_poc+0x28/0x58
[    6.177413]  dma_unmap_page_attrs+0x1b4/0x1d0
[    6.181789]  nvkm_fb_dtor+0xc0/0x108 [nouveau]
[    6.186380]  nvkm_subdev_del+0x78/0x128 [nouveau]
[    6.191209]  nvkm_device_del+0x80/0x130 [nouveau]
[    6.196037]  nouveau_drm_probe+0x154/0x230 [nouveau]
[    6.201127]  local_pci_probe+0x48/0xb8
[    6.204886]  pci_device_probe+0xac/0x1c8
[    6.208818]  really_probe+0x150/0x2c0
[    6.212489]  __driver_probe_device+0x80/0x140
[    6.216857]  driver_probe_device+0x44/0x170
[    6.221050]  __driver_attach+0x9c/0x1b0
[    6.224894]  bus_for_each_dev+0x80/0xe8
[    6.228740]  driver_attach+0x2c/0x40
[    6.232323]  bus_add_driver+0xec/0x218
[    6.236079]  driver_register+0x68/0x138
[    6.239924]  __pci_register_driver+0x54/0x68
[    6.244205]  nouveau_drm_init+0x214/0x3ff8 [nouveau]
[    6.249300]  do_one_initcall+0x60/0x2c0
[    6.253146]  do_init_module+0x60/0x218
[    6.256905]  load_module+0x1de0/0x2090
[    6.260663]  __do_sys_init_module+0x19c/0x1e0
[    6.265031]  __arm64_sys_init_module+0x24/0x38
[    6.269486]  invoke_syscall+0x50/0x128
[    6.273246]  el0_svc_common.constprop.0+0xc8/0xf0
[    6.277965]  do_el0_svc+0x24/0x38
[    6.281287]  el0_svc+0x40/0xe8
[    6.284347]  el0t_64_sync_handler+0x100/0x130
[    6.288716]  el0t_64_sync+0x190/0x198
[    6.292387] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21) 
[    6.298501] ---[ end trace 0000000000000000 ]---
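
For anyone trying to read the oops above, here is a rough sketch of how the `ESR` value splits into the fields the kernel prints underneath it. The bit positions are the standard AArch64 ESR_ELx layout for data aborts; the function name `decode_esr` is just a placeholder for this example and not something from the kernel.

```python
# Sketch: split an AArch64 ESR_ELx value into a few of its fields.
# decode_esr is a made-up helper name, used only for illustration.

def decode_esr(esr: int) -> dict:
    iss = esr & 0x1FFFFFF          # bits [24:0], instruction specific syndrome
    return {
        "EC": (esr >> 26) & 0x3F,  # bits [31:26], exception class
        "IL": (esr >> 25) & 0x1,   # bit 25, set for a 32-bit instruction
        "ISS": iss,
        "FSC": iss & 0x3F,         # ISS bits [5:0], fault status code
        "WnR": (iss >> 6) & 0x1,   # ISS bit 6, write-not-read
        "CM": (iss >> 8) & 0x1,    # ISS bit 8, cache maintenance
    }


# Value taken from the "ESR = 0x0000000096000145" line in the log above.
fields = decode_esr(0x96000145)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

The decoded values line up with what the kernel reported: EC = 0x25 (data abort at the current EL), FSC = 0x05 (level 1 translation fault), and CM = 1, which fits a fault raised by a cache maintenance instruction with `pc` in `dcache_inval_poc`.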