RHSResearchLLC / NiteFury-and-LiteFury

Public repository for Litefury & Nitefury

Failed to detect XDMA config BAR #11

Closed · spaceotter closed this issue 2 years ago

spaceotter commented 3 years ago

I'm having some trouble with my NiteFury card and top-of-master XDMA. It works when plugged into a PCIe switch card (https://www.amazon.com/gp/product/B08L8J3MBT/), with some occasional reliability issues. But when it's plugged into the motherboard with an M.2 adapter, the driver doesn't work.

lspci -d 10ee: -vvv
```
4b:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
```
Result of removing the card in /sys and rescanning:
```
# echo "1" > /sys/bus/pci/devices/0000\:4b\:00.0/remove
# echo "1" > /sys/bus/pci/rescan
[Feb15 12:40] pci 0000:4b:00.0: Removing from iommu group 62
[Feb15 12:41] pcieport 0000:04:08.0: bridge window [mem 0x00100000-0x000fffff] to [bus 05] add_size 200000 add_align 100000
[  +0.000004] pcieport 0000:03:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 04-05] add_size 200000 add_align 100000
[  +0.000012] pcieport 0000:03:00.0: BAR 14: no space for [mem size 0x00300000]
[  +0.000001] pcieport 0000:03:00.0: BAR 14: failed to assign [mem size 0x00300000]
[  +0.000002] pcieport 0000:03:00.0: BAR 14: no space for [mem size 0x00100000]
[  +0.000001] pcieport 0000:03:00.0: BAR 14: failed to assign [mem size 0x00100000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: no space for [mem size 0x00200000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: failed to assign [mem size 0x00200000]
[  +0.000002] pcieport 0000:04:08.0: BAR 14: no space for [mem size 0x00200000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: failed to assign [mem size 0x00200000]
[  +0.018879] pci 0000:4b:00.0: [10ee:7024] type 00 class 0x070001
[  +0.000023] pci 0000:4b:00.0: reg 0x10: [mem 0xd0300000-0xd03fffff]
[  +0.000010] pci 0000:4b:00.0: reg 0x14: [mem 0xd0400000-0xd040ffff]
[  +0.000105] pci 0000:4b:00.0: PME# supported from D0 D1 D2 D3hot
[  +0.000627] pci 0000:4b:00.0: Adding to iommu group 62
[  +0.000115] pci 0000:4b:00.0: BAR 0: assigned [mem 0xd0300000-0xd03fffff]
[  +0.000004] pci 0000:4b:00.0: BAR 1: assigned [mem 0xd0400000-0xd040ffff]
[Feb15 12:44] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.1.8
[  +0.000002] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
[  +0.000071] xdma:xdma_device_open: xdma device 0000:4b:00.0, 0x0000000045520a28.
[  +0.000001] xdma:alloc_dev_instance: xdev = 0x0000000030361898
[  +0.000003] xdma:xdev_list_add: dev 0000:4b:00.0, xdev 0x0000000030361898, xdma idx 0.
[  +0.000130] xdma:request_regions: pci_request_regions()
[  +0.000005] xdma:map_single_bar: BAR0: 1048576 bytes to be mapped.
[  +0.000025] xdma:map_single_bar: BAR0 at 0xd0300000 mapped at 0x00000000519eaa09, length=1048576(/1048576)
[  +0.000004] xdma:is_config_bar: BAR 0 is NOT the XDMA config BAR: 0xffffffff, 0xffffffff.
[  +0.000001] xdma:map_single_bar: BAR1: 65536 bytes to be mapped.
[  +0.000010] xdma:map_single_bar: BAR1 at 0xd0400000 mapped at 0x0000000071b088e5, length=65536(/65536)
[  +0.000002] xdma:is_config_bar: BAR 1 is NOT the XDMA config BAR: 0xffffffff, 0xffffffff.
[  +0.000001] xdma:map_bars: Failed to detect XDMA config BAR
[  +0.000034] pcieport 0000:40:01.3: DPC: containment event, status:0x1f01 source:0x0000
[  +0.000002] pcieport 0000:40:01.3: DPC: unmasked uncorrectable error detected
[  +0.025532] xdma:probe_one: pdev 0x0000000045520a28, err -22.
[  +0.000003] xdma:xpdev_free: xpdev 0x00000000b9ed515b, destroy_interfaces, xdev 0x0000000000000000.
[  +0.000001] xdma:xpdev_free: xpdev 0x00000000b9ed515b, xdev 0x0000000000000000 xdma_device_close.
[  +0.000001] xdma:xdma_device_close: pdev 0x0000000045520a28, xdev 0x0000000000000000.
[  +0.000006] xdma: probe of 0000:4b:00.0 failed with error -22
[  +0.135983] pcieport 0000:40:01.3: AER: Device recovery failed
```
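For anyone reproducing this: a minimal user-space sketch like the one below (my own quick check, not part of XDMA; it mmaps the BAR through the sysfs resource file, the same approach the pcimem tool uses) would show whether reads come back as all-ones outside the driver too. The 0000:4b:00.0 path is specific to my machine; run it as root.

```c
/* Minimal sketch: mmap BAR0 via sysfs and read its first dword.
 * On a dropped PCIe link every MMIO read returns 0xffffffff,
 * matching the two values in the xdma log above. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *bar0 = "/sys/bus/pci/devices/0000:4b:00.0/resource0";
    int fd = open(bar0, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map one page of the BAR; offset 0 is enough for this check. */
    volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* 0xffffffff here means the read never reached the device. */
    printf("BAR0[0] = 0x%08x\n", (unsigned)regs[0]);

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```

On a healthy link this prints whatever register the design maps at offset 0; after the link drops it prints 0xffffffff regardless of offset.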

I thought maybe the debugger was holding it in reset. So I built the sample project in 2018.3, which went as expected, then removed the PCIe device, flashed the FPGA, and added it back. The result is exactly the same.

RHSResearchLLC commented 3 years ago

Eric,

I'm not 100% sure I understand what you are saying: does it work properly with 2018.3, or do both versions have issues?

I have one motherboard where I get a flaky connection if I build using Vivado 2020. So the bitstream that ships with the board is built using 2018 (I believe) and I've never had any issues with it. When you say M.2 adapter, which adapter are you using? I've tried several passive adapters and they all seem to work fine.

spaceotter commented 3 years ago

My previous post was only about 2018.3, but I tested 2020.2 as well. With the PCIe switch card, 2018.3 worked and 2020.2 didn't. Without the switch card, using an adapter (not sure of the part number) like this one, https://www.amazon.com/gp/product/B00MYCQP38, neither version of Vivado works. I'm pretty sure the failure mode is always as above. I could try testing the adapters with an SSD later.

spaceotter commented 3 years ago

The motherboard is an MSI Creator TRX40. I might have time on Wednesday to test combinations of adapter and maybe another computer. Does the trace above give any hint what might be going wrong?

RHSResearchLLC commented 3 years ago

Yes, it looks like the config space is returning all 0xFFFF. I've seen this happen when the link drops. Even if the link recovers, it seems the config space doesn't.
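For reference, the check that's failing is is_config_bar() in the XDMA driver's libxdma.c. Roughly (paraphrased, not verbatim; the block-ID constants are as I recall them from the driver source), it reads the IRQ-block and config-block identifier registers through each BAR and compares the high 16 bits against fixed IDs; the two hex values in your dmesg line are those two reads.

```c
#include <stdint.h>
#include <stdio.h>

/* Rough paraphrase of the is_config_bar() logic from libxdma.c. */
#define IRQ_BLOCK_ID    0x1fc20000u  /* expected IRQ block identifier */
#define CONFIG_BLOCK_ID 0x1fc30000u  /* expected config block identifier */

static int looks_like_config_bar(uint32_t irq_id, uint32_t cfg_id)
{
    const uint32_t mask = 0xffff0000u; /* compare block IDs, ignore version */
    return (irq_id & mask) == IRQ_BLOCK_ID &&
           (cfg_id & mask) == CONFIG_BLOCK_ID;
}

int main(void)
{
    /* Both identifier reads came back as all-ones in the log above,
     * so the check fails on every BAR and the probe exits with
     * "Failed to detect XDMA config BAR" / err -22. */
    printf("config BAR match: %d\n",
           looks_like_config_bar(0xffffffffu, 0xffffffffu)); /* prints 0 */
    return 0;
}
```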

It would be interesting to try the adapter with an SSD. I've never had a problem linking up, with an adapter or without, when built with 2018 or 2019. I haven't had a chance yet to figure out what the deal is when using 2020. I've tried a LiteFury built with 2020 in about 8 different motherboards, and all of them worked fine except for one.

spaceotter commented 3 years ago

Is the config space in the programmable logic? Why doesn't it get reset by JTAG?