Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

XDMA: driver is reading the user's BAR... #250

Open alexisfrjp opened 9 months ago

alexisfrjp commented 9 months ago

Why does the driver read the user BAR at addresses 0x2000 and 0x3000?! It shouldn't even try to access the user BAR at all. It breaks the whole Xilinx/AMD AXI interconnect, since these addresses aren't mapped in my design and the reads come back with rresp=3 (DECERR).

Is this driver developed by interns?

FilipVaverka commented 8 months ago

That may actually solve a huge headache I had with designs "randomly" not working for certain memory mappings. Looking at the code (now that I know what to look for), maybe it's caused by the mechanism that automatically identifies the XDMA config BAR in map_bars(...)? When XDMA_CONFIG_BAR_NUM is not defined, it goes through all BARs, potentially touching those two addresses in is_config_bar. Setting "config_bar_num" when calling "make" seems to define the macro and skip all that.
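The probing described above can be sketched in user-space C. This is only an illustration of the idea, not the driver's code: `fake_bar`, `bar_read32`, `looks_like_config_bar` and `find_config_bar` are hypothetical names, and the identifier and mask values are assumptions recalled from libxdma.h that should be checked against the source. Only the offsets 0x2000 and 0x3000 come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets named in the thread; both are read during detection. */
#define IRQ_BLOCK_OFFSET    0x2000u
#define CONFIG_BLOCK_OFFSET 0x3000u

/* Assumed identifier values and mask; verify against libxdma.h. */
#define IRQ_BLOCK_ID    0x1fc20000u
#define CONFIG_BLOCK_ID 0x1fc30000u
#define BLOCK_ID_MASK   0xffff0000u

/* Hypothetical stand-in for a mapped BAR: register words plus a read counter. */
struct fake_bar {
    const uint32_t *words; /* simulated register space */
    size_t          len;   /* size of the BAR in bytes */
    unsigned        reads; /* number of probe reads that hit this BAR */
};

/* Simulated 32-bit register read; counts every access to the BAR. */
static uint32_t bar_read32(struct fake_bar *bar, uint32_t offset)
{
    assert(offset + sizeof(uint32_t) <= bar->len);
    bar->reads++;
    return bar->words[offset / sizeof(uint32_t)];
}

/* Reads both identifier registers and compares their upper 16 bits. */
static int looks_like_config_bar(struct fake_bar *bar)
{
    uint32_t irq_id = bar_read32(bar, IRQ_BLOCK_OFFSET);
    uint32_t cfg_id = bar_read32(bar, CONFIG_BLOCK_OFFSET);

    return (irq_id & BLOCK_ID_MASK) == IRQ_BLOCK_ID &&
           (cfg_id & BLOCK_ID_MASK) == CONFIG_BLOCK_ID;
}

/* forced_idx >= 0 models a build with config_bar_num set: nothing is probed. */
static int find_config_bar(struct fake_bar *bars, int count, int forced_idx)
{
    if (forced_idx >= 0)
        return forced_idx;
    for (int i = 0; i < count; i++)
        if (looks_like_config_bar(&bars[i]))
            return i;
    return -1;
}
```

In this model, a user BAR that sits before the DMA BAR receives two reads during auto-detection, which is the access pattern reported in the original post. Passing a non-negative `forced_idx` leaves every BAR untouched.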

MischaBaars commented 8 months ago

> Is this driver developed by interns?

Does this mean that you got the drivers compiled? On my system, they won't compile to begin with.

Best regards, Mischa Baars.

alexisfrjp commented 8 months ago

> > Is this driver developed by interns?
>
> Does this mean that you got the drivers compiled? On my system, they won't compile to begin with.
>
> Best regards, Mischa Baars.

Yes, it isn't that hard to fix it, 2-3 functions to change. Check the pull requests, lots of people like working for free for big companies.

MischaBaars commented 8 months ago

I've been looking through them, and I understand what you're saying, but I recently bought a brand new Opal Kelly XEM8310 with PCIe breakout board and I'd like to get things up and running.

I'm new to FPGA programming, so I do not know what QDMA, XDMA and XVSEC are, but I do know it's harder to fix than you imply. Support for the mm_segment_t struct has been deprecated as of kernel 5.17.15, as you can see in Changelog-5.18. I am currently using kernel 6.6.8, but I'm still interested in the XVSEC code, as you might understand.

The QDMA and XDMA compilation was indeed easier to fix, but that doesn't mean that they work. They compile.

Are merge requests even accepted at all? The drivers ./README.md states that issues are disabled in github for these drivers.

FilipVaverka commented 8 months ago

I haven't used QDMA and XVSEC, but XDMA should work with these fixes on kernel 6.6.8. At least for the simple things I use it for: writing user registers, DMA to/from the device, and legacy user device->host IRQ.

MischaBaars commented 8 months ago

I've compiled and booted a 5.17.15 kernel today. Both the QDMA and the XDMA drivers compile without warning or error. I do get the same error on both kernels, 5.17.15 and 6.6.8 (see attachments).

Tomorrow I'm going to try a 5.9.16 kernel, which still has support for get_fs()/set_fs() in arch/x86/include/asm/uaccess.h. It compiles but does not boot for now. I should be able to compile XVSEC once it boots.

2023011500 - xdma_mm_linux-5.17.15.log 2023122800 - xdma_mm_linux-6.6.8.log

FilipVaverka commented 8 months ago

It's probably also worth checking and posting the output of "dmesg", including the kernel module's log messages, to see whether it properly detects the FPGA board and the design loaded on it.

alexisfrjp commented 8 months ago

> I've compiled and booted a 5.17.15 kernel today. Both the QDMA and the XDMA drivers compile without warning or error. I do get the same error on both kernels, 5.17.15 and 6.6.8 (see attachments).
>
> Tomorrow I'm going to try a 5.9.16 kernel, which still has support for get_fs()/set_fs() in arch/x86/include/asm/uaccess.h. It compiles but does not boot for now. I should be able to compile XVSEC once it boots.
>
> 2023011500 - xdma_mm_linux-5.17.15.log 2023122800 - xdma_mm_linux-6.6.8.log

Everything is wrong with this repo; the kernel version is even hardcoded in the Makefile:

install: all
    @rm -f /lib/modules/5.15.0-67-generic/extra/xdma.ko
    @echo "installing kernel modules to /lib/modules/$(shell uname -r)/xdma ..."
    @mkdir -p -m 755 /lib/modules/$(shell uname -r)/xdma
    @install -v -m 644 *.ko /lib/modules/$(shell uname -r)/xdma
    @depmod -a || true

this driver is managed by interns, nobody checks anything

MischaBaars commented 8 months ago

My birthday :) karenx-xilinx - Sep 27, 2023

hinxx commented 8 months ago

FWIW, I ran xdma_mm.sh on my machine and it finished without issues: test-6.7.0-rc1.txt. It is a dev box with a 6.7.0-rc1 kernel.

MischaBaars commented 8 months ago

Oh... that's interesting! Thank you for your contribution!

Could it be you used different settings in Vivado? These are mine: (XEM8310 User's Manual - Section 4.1.1 and Section 4.1.2)?


Here I am, with a somewhat messed up screen resolution, but nonetheless:

[mjbaars1977@tp02 ~]$ uname -r
5.9.16

If no one knows what XVSEC is, I'm now ready to give the compilation a try.

MischaBaars commented 8 months ago

Ok.

I got it to compile on the 5.1.20 kernel (the last kernel with a 'policy' in 'struct genl_ops' in include/net/genetlink.h), but with the 'DMA/Bridge Subsystem for PCI Express' bitfile loaded, this is my output:

output of ./xvsecctl -b 0E -F 00 -l
output of ./xvsecctl -b 0E -F 00 -l -v

Perhaps I have to dig a little deeper in time: PG194, is there anyone else interested in the XVSEC part of the dma_ip_drivers?

Then let me now test xdma_mm.sh against the 6.7.0-rc1 kernel.

MischaBaars commented 8 months ago

> FWIW, I ran xdma_mm.sh on my machine and it finished without issues: test-6.7.0-rc1.txt. It is a dev box with a 6.7.0-rc1 kernel.

Still the same result: 2024011600 - xdma_mm_linux-6.7.0-rc1.log

Could it have something to do with the Vivado settings?

hinxx commented 8 months ago

> Could it have something to do with the Vivado settings?

Could be, but I'm not the FPGA guy around here.. If you want to make screenshots of the Vivado DMA IP core dialogs with your settings I can share them with our FPGA guy here and maybe there is something that sticks out..

BTW, no idea what XVSEC is, never looked at it.

MischaBaars commented 8 months ago

I'm starting to get a little worried, because the Opal Kelly FrontPanel API is written by a very inexperienced programmer, with a copyright notice from 2004 and everything closed source, and now their hardware might have shortcomings as well. What development board are you using?

Ok. Exactly as prescribed by XEM8310 User's Manual - Section 4.1.1 and Section 4.1.2, here are the screenshots:

[Six screenshots, captured 2024-01-17 between 08:53 and 09:24]

2024011700 - xdma_mm_linux-6.7.0-rc1.log

Thanks for looking into it. Apparently I can use some help :(

hinxx commented 8 months ago

> What development board are you using?

I'm working with this board https://innovation.desy.de/technologies/microtca/boards/damc_fmc2zup/index_eng.html.

My FPGA colleague took a quick peek at your config and commented that these 8 kB transfers might be failing because of the amount of BRAM that is connected to the DMA in the firmware. If, for example, a 4 kB BRAM were used, then an 8 kB transfer might overwrite its contents and the data integrity check would fail. I guess this hypothesis is fairly easy to check: connect a larger or smaller BRAM and see if the test fails at a different value.
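The hypothesis can be illustrated with a small user-space model. It assumes, purely for illustration, that the BRAM's address decoder ignores the upper address bits, so accesses beyond its size alias back to the start; the real design may behave differently. `bram_model`, `bram_write`, `bram_read` and `round_trip_ok` are hypothetical names, and the 4 kB size is taken from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BRAM_BYTES 4096u /* assumed BRAM size, from the 4 kB example */

/* Hypothetical BRAM model: addresses past the end wrap to the start. */
struct bram_model {
    uint8_t cells[BRAM_BYTES];
};

static void bram_write(struct bram_model *m, size_t addr,
                       const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        m->cells[(addr + i) % BRAM_BYTES] = src[i];
}

static void bram_read(const struct bram_model *m, size_t addr,
                      uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = m->cells[(addr + i) % BRAM_BYTES];
}

/* Writes a pattern, reads it back, returns 1 if the data survived intact. */
static int round_trip_ok(size_t len)
{
    static uint8_t out[16384];
    static uint8_t in[16384];
    struct bram_model m;

    assert(len <= sizeof out);
    memset(&m, 0, sizeof m);
    /* The pattern differs between offsets i and i + 4096, so aliasing shows. */
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)(i * 7u + (i >> 8));

    bram_write(&m, 0, out, len);
    bram_read(&m, 0, in, len);
    return memcmp(out, in, len) == 0;
}
```

Under this model a 4096-byte round trip succeeds and an 8192-byte one fails, because the second half of the write lands on top of the first half. If the real test starts failing at a different size after the BRAM is resized, that would support the hypothesis.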

Another thing that you could do is to hexdump the data sent/received when "data integrity FAILED!." is printed at https://github.com/Xilinx/dma_ip_drivers/blob/a93d4a4870e41d152b33aebb3f869eefb11aa691/XDMA/linux-kernel/tests/scripts_mm/io.sh#L123. That way you could see the received data pattern which might suggest a reason for failure.

In our case, the DMA talks to a DDR memory controller with 2 GB of memory behind it, so there are no issues with large DMA IOs.

Good luck!

MischaBaars commented 8 months ago

> I guess this hypothesis is fairly easy to check: connect a larger or smaller BRAM and see if the test fails at a different value.

I'm as inexperienced with hardware, as they are with software and I'm also not really their firmware programmer, but I'll try to be happy with this fairly complicated piece of equipment to begin with. Still, 2004 is twenty years ago.

Thanks for the hints. I can do the second one myself.

MischaBaars commented 8 months ago

> My FPGA colleague took a quick peek at your config and commented that these transfers of 8kB might be failing due to the amount of BRAM that is connected to DMA (in the firmware).

Re-customizing the IP in the DMA/Bridge subsystem for PCI Express IP Example Design by double clicking blk_mem_gen_1 in the IP Sources tab and changing the Port A Options, Port A Width and Port A Depth, changes the behavior of the xdma_mm.sh script. I now have an output file of different size.

Thanks!

In the meantime: please test the following pre-merge-request for compilation errors (XDMA, QDMA and XVSEC) and run at least the xdma_mm.sh script to see if everything still works the way it should. Only XVSEC should still give compilation errors, because it depends on deprecated functionality from kernel 5.1.20. If people want this code salvaged, now would be a good time to let me know.

Thanks again!

hinxx commented 8 months ago

> changes the behavior of the xdma_mm.sh script. I now have an output file of different size.

Seems I need to buy that guy a beer ;).

> Please test the following pre-merge-request for compilation errors

FWIW, this compiles for me, and the xdma_mm.sh test executes without errors.

ShlomiOJungo commented 8 months ago

> Why does the driver read the user BAR at addresses 0x2000 and 0x3000?! It shouldn't even try to access the user BAR at all. It breaks the whole Xilinx/AMD AXI interconnect, since these addresses aren't mapped in my design and the reads come back with rresp=3 (DECERR).

Hi OP, if you're having issues with driver development, you may want to check out WinDriver: https://jungo.com/windriver/#download. It's our driver development toolkit; it also has enhanced support for certain chipsets, such as Xilinx, and many debugging tools.

march1993 commented 2 weeks ago

As a temporary workaround, try defining XDMA_CONFIG_BAR_NUM as the index of your DMA BAR in xdma/libxdma.h.

For example, if you have both AXI-Lite and DMA enabled, set XDMA_CONFIG_BAR_NUM to 1.

Prandr commented 2 weeks ago

If I try to set config_bar_num (as suggested by @FilipVaverka) or fix the define (per @march1993) xdma0_user doesn't show up in /dev

march1993 commented 2 weeks ago

> If I try to set config_bar_num (as suggested by @FilipVaverka) or fix the define (per @march1993) xdma0_user doesn't show up in /dev

I ran into the same problem. After reading the driver source code carefully, I found that if I shrink the AXI4-Lite address space to a small size, for example 8k, the unmodified driver works smoothly.

Prandr commented 2 weeks ago

@march1993 Why is that? I wonder if it is possible to fix this then, because I would definitely need more than that. And also if this problem may have anything to do with failing lseek calls:

https://adaptivesupport.amd.com/s/question/0D52E00006iHjV9SAK/illegal-seek-for-xdmauser-access?language=en_US

https://adaptivesupport.amd.com/s/question/0D54U00008iVVZNSA4/xdma-cant-set-axi-address-because-lseek-always-fails?language=en_US

march1993 commented 2 weeks ago

> @march1993
>
> Why is that? I wonder if it is possible to fix this then, because I would definitely need more than that.
>
> And also if this problem may have anything to do with failing lseek calls:
>
> https://adaptivesupport.amd.com/s/question/0D52E00006iHjV9SAK/illegal-seek-for-xdmauser-access?language=en_US
>
> https://adaptivesupport.amd.com/s/question/0D54U00008iVVZNSA4/xdma-cant-set-axi-address-because-lseek-always-fails?language=en_US

Search for the macro XDMA_BAR_SIZE in the driver. It is 0x8000. The driver tries to identify BARs by their sizes.
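One way to read that remark is as a size gate in front of the probe: a BAR is only examined as a possible config BAR if it is at least XDMA_BAR_SIZE long. The sketch below models that reading in user-space C. The comparison is an assumption about the driver's behavior and should be verified against map_bars() in libxdma.c; `is_probe_candidate` and `probed_bar_mask` are hypothetical helpers, and the model ignores the fact that the real search may stop once a config BAR is found.

```c
#include <assert.h>
#include <stdint.h>

#define XDMA_BAR_SIZE 0x8000u /* value quoted in the comment above */

/* Assumed rule: only BARs of at least XDMA_BAR_SIZE bytes get probed. */
static int is_probe_candidate(uint64_t bar_len)
{
    return bar_len >= XDMA_BAR_SIZE;
}

/* Returns a bitmask with bit i set if BAR i would be probed. */
static unsigned probed_bar_mask(const uint64_t *bar_lens, int count)
{
    unsigned mask = 0;

    assert(count >= 0 && count <= 6); /* PCI exposes at most six BARs */
    for (int i = 0; i < count; i++)
        if (is_probe_candidate(bar_lens[i]))
            mask |= 1u << i;
    return mask;
}
```

Under this reading, an 8k (0x2000) AXI4-Lite BAR falls below the gate and is never read during detection, which would be consistent with the unmodified driver working for small user address spaces. A larger user BAR passes the gate and gets probed.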

FilipVaverka commented 2 weeks ago

> If I try to set config_bar_num (as suggested by @FilipVaverka) or fix the define (per @march1993) xdma0_user doesn't show up in /dev

It seems that when "config_bar_num" is defined, the driver entirely skips the search for the BAR used for "xdma0_user" and assumes it doesn't exist: https://github.com/Xilinx/dma_ip_drivers/blob/d66f224c7c49a12e89bbddebbedd614ff49e8046/XDMA/linux-kernel/xdma/libxdma.c#L1658

It seems to work if you also force the BAR number (0) here: https://github.com/Xilinx/dma_ip_drivers/blob/d66f224c7c49a12e89bbddebbedd614ff49e8046/XDMA/linux-kernel/xdma/libxdma.c#L4149

I don't know if it breaks something else, though...
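The effect can be sketched as follows. The model assumes the BAR ordering in which the user (AXI-Lite) BAR, if present, comes directly before the config BAR and the bypass BAR, if present, directly after it; that ordering and all names here (`bar_roles`, `assign_roles`, `assign_roles_forced`) are illustrative assumptions, not the driver's actual code.

```c
#include <assert.h>

/* Role assignment for the mapped BARs; -1 means "not present". */
struct bar_roles {
    int user;
    int config;
    int bypass;
};

/* Auto-detect path: neighbors of the config BAR get the other roles. */
static struct bar_roles assign_roles(const int *mapped, int count,
                                     int config_pos)
{
    struct bar_roles r = { -1, -1, -1 };

    assert(config_pos >= 0 && config_pos < count);
    r.config = mapped[config_pos];
    if (config_pos > 0)
        r.user = mapped[config_pos - 1];
    if (config_pos + 1 < count)
        r.bypass = mapped[config_pos + 1];
    return r;
}

/* Forced path: only the config BAR is recorded, neighbors are never examined. */
static struct bar_roles assign_roles_forced(int forced_config_bar)
{
    struct bar_roles r = { -1, forced_config_bar, -1 };
    return r;
}
```

With BARs 0 and 1 mapped and the config BAR at position 1, the auto-detect path reports BAR 0 as the user BAR. The forced path leaves `user` at -1, which in this model corresponds to xdma0_user not appearing. Forcing the user BAR number as well, as described above, would fill in that missing role.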