bperez77 / xilinx_axidma

A zero-copy Linux driver and a userspace interface library for Xilinx's AXI DMA and VDMA IP blocks. These serve as bridges for communication between the processing system and FPGA programmable logic fabric, through one of the DMA ports on the Zynq processing system. Distributed under the MIT License.

Issues with the Benchmark #68

Open brianvg opened 6 years ago

brianvg commented 6 years ago

Hi!

As I understand it, the benchmark example program should also work for a VDMA engine in loopback, correct? I have a design that contains only a VDMA engine with TX and RX looped back. The module loads OK, and both channels appear to be detected:

[ 7.697740] axidma: axidma_dma.c: axidma_dma_init: 718: DMA: Found 0 transmit channels and 0 receive channels.
[ 7.707679] axidma: axidma_dma.c: axidma_dma_init: 720: VDMA: Found 1 transmit channels and 1 receive channels.

When I run the benchmark code as below:

sudo ./axidma_benchmark -v -t 0 -r 1 -f 10x10x3 -g 10x10x3 -n 3

I get a segmentation fault.

In dmesg I see: [ 170.988718] axidma: axidma_dma.c: axidma_start_transfer: 301: VDMA receive transaction timed out.

Any suggestions as to where I can look for the problem?

Thanks!!

brianvg commented 6 years ago

On reviewing some of the other issues, is it possible this is related to not having the number of buffers set to 1 in the IP core? I rebuilt the design with a single buffer and still observed the same behavior.

brianvg commented 6 years ago

I am guessing, like you @bperez77, I just have the wrong settings in my IP core.

@Westwood68 Do you happen to remember/have the settings for the AXI VDMA IP you used to get the VDMA test to work in the kernel? I'm trying to get it up and running now, but I can't seem to find the right configuration.

@Westwood68, could you let us know what settings you used to get the VDMA working correctly? Thanks!

bperez77 commented 6 years ago

Yeah, this one was tricky; I still haven't figured out the proper settings on the IP core to get it to work. It may also be an issue inside the driver with how multiple VDMA buffers are set up, but I would still expect it to work for the single-buffer case.

IIRC from that original thread, he didn't remember what the original settings were.

brianvg commented 6 years ago

Hi @bperez77, the program does indeed crash when the single test transfer is performed (single_transfer_test). Since making some changes to the IP core, I am now getting a null pointer exception, so I guess it might actually not be related to the driver at all.

It might also be that there is some significance to the fact that I am using a 64-bit platform (ZynqMP) instead of the 7-series Zynq.

brianvg commented 6 years ago

I guess this is the same problem as described in issue #52. I am not sure if I will get further than they did, but if I do, I will post my findings here.

brianvg commented 6 years ago

I have added some printk statements to the module to see exactly where in the DMA transfer things go sideways. See below; each message also notes the source line and file where the print statement was added:

[ 124.105520] Running axidma_rw_transfer.464, axidma_chrdev.c
[ 124.105523] Getting tx channel.[n473, axidma_dma.c]
[ 124.105526] Getting rx channel.[n482, axidma_dma.c]
[ 124.105528] Setting up SG table, tx.[n493, axidma_dma.c]
[ 124.105532] Setting up SG table, rx.[n502, axidma_dma.c]
[ 124.105534] Adding SG frame info, tx.[n524, axidma_dma.c]
[ 124.105537] Adding SG frame info, rx.[n542, axidma_dma.c]
[ 124.105539] Prepping transfer, tx.[n550, axidma_dma.c]
[ 124.105550] Prepping transfer, rx.[n557, axidma_dma.c]
[ 124.105556] Starting transfer, tx:[n566, axidma_dma.c]
[ 124.105559] Flush all pending xfers:[n294, axidma_dma.c]
[ 124.105567] All pending xfers cmplt:[n298, axidma_dma.c]
[ 124.105570] Starting transfer, rx:[n573, axidma_dma.c]
[ 124.105572] Flush all pending xfers:[n294, axidma_dma.c]
[ 124.105579] All pending xfers cmplt:[n298, axidma_dma.c]
[ 124.105637] Transfers started:[n580, axidma_dma.c]
[ 124.105641] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 124.113663] pgd = ffffffc878423000
[ 124.117036] [00000000] *pgd=00000008785b6003
[ 124.121110] , *pud=00000008785b6003
[ 124.121114] , *pmd=0000000000000000
[ 124.121121] Internal error: Oops: 86000006 [#1] SMP
[ 124.125965] Modules linked in: xilinx_axidma(O)
[ 124.130479] CPU: 2 PID: 3707 Comm: axidma_benchmar Tainted: G B O 4.9.0-xilinx-v2017.4 #1
[ 124.139594] Hardware name: Mercury XU5 (DT)
[ 124.143759] task: ffffffc87a91ce80 task.stack: ffffffc877f3c000
[ 124.149664] PC is at 0x0
[ 124.152177] LR is at 0x0
[ 124.154695] pc : [<0000000000000000>] lr : [<0000000000000000>] pstate: 40000145
[ 124.162075] sp : ffffffc877f3fd60
[ 124.165371] x29: 000000000000012c x28: ffffffc877f3c000
[ 124.170665] x27: ffffff8008962000 x26: 000000000000001d
[ 124.175959] x25: 0000000000000123 x24: 0000000000000015
[ 124.181254] x23: 0000007fe08b1b50 x22: 0000007fe08b1b50
[ 124.186549] x21: ffffffc87aaab600 x20: ffffff8000963630
[ 124.191844] x19: ffffffc877f3fdb8 x18: 0000000000000001
[ 124.197139] x17: 0000007fae4c5290 x16: ffffff800819e2c0
[ 124.202439] x15: 0000000000010000 x14: 0000000000000000
[ 124.207733] x13: 0000000000000004 x12: 0000000000000195
[ 124.213028] x11: 0000000000000002 x10: 0000000000000195
[ 124.218323] x9 : 0000000000000001 x8 : ffffff8008d55648
[ 124.223618] x7 : 0000000000000000 x6 : 0000000055891196
[ 124.228913] x5 : ffffffc87ff9eba8 x4 : 0000000000000001
[ 124.234207] x3 : 0000000000000007 x2 : 0000000000000006
[ 124.239502] x1 : 0000000000000007 x0 : 0000000000000000
[ 124.246275] Process axidma_benchmar (pid: 3707, stack limit = 0xffffffc877f3c020)
[ 124.253748] Stack: (0xffffffc877f3fd60 to 0xffffffc877f40000)
[ 124.259472] fd60: ffffffc877f3fe00 ffffff800819dc14 ffffffc87a4f4200 ffffffc87a5a0558
[ 124.267290] fd80: ffffffc87a4f4200 0000007fe08b1b50 0000007fe08b1b50 0000000000000015
[ 124.275102] fda0: ffffffc877f3fe10 ffffff8008081298 000000009200004f 0000000000000001
[ 124.282914] fdc0: 0000007fae3fe000 0000000000007530 0000006400000064 0000000100000003
[ 124.290726] fde0: 0000007fae3f6000 0000000000007530 0000006400000064 0000000000000003
[ 124.298538] fe00: ffffffc877f3fe80 ffffff800819e304 ffffffc87a4f4200 0000000000000003
[ 124.306350] fe20: ffffffc87a4f4200 0000000080485706 0000007fe08b1b50 00000000004018a8
[ 124.314162] fe40: ffffffc877f3fe80 ffffff800818c68c ffffffc87a4bbb00 ffffffc87a4bbb00
[ 124.321975] fe60: 00000000131ad010 0000000000000030 ffffffc877f3fe80 ffffff800819e2e4
[ 124.329786] fe80: 0000000000000000 ffffff8008082ef0 0000000000000000 0000000000000000
[ 124.337599] fea0: ffffffffffffffff 0000007fae4c529c 0000000080000000 0000000000000000
[ 124.345411] fec0: 0000000000000003 0000000080485706 0000007fe08b1b50 0000000000000003
[ 124.353223] fee0: 0000007fe08b1c98 0000000000000001 0000007fe08b1b94 0000000000007530
[ 124.361035] ff00: 000000000000001d ffffff80ffffffc8 0101010101010101 0000007fe08b1c00
[ 124.368847] ff20: 0000000000000000 0000000000000000 0000000000000000 0000007fae58d000
[ 124.376659] ff40: 0000007fae560090 0000007fae4c5290 0000000000000a03 0000000000402768
[ 124.384471] ff60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 124.392283] ff80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 124.400095] ffa0: 0000000000000000 0000007fe08b1af0 0000007fae54ef1c 0000007fe08b1af0
[ 124.407907] ffc0: 0000007fae4c529c 0000000080000000 0000000000000003 000000000000001d
[ 124.415719] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 124.423529] Call trace:
[ 124.425954] Exception stack(0xffffffc877f3fb90 to 0xffffffc877f3fcc0)
[ 124.432379] fb80: ffffffc877f3fdb8 0000008000000000
[ 124.440197] fba0: ffffffc877f3fd60 0000000000000000 ffffffc877f3fbc0 00000000ffffffc8
[ 124.448009] fbc0: ffffffc87aaab600 0000000000000244 ffffff8000964180 0000000000000002
[ 124.455821] fbe0: 0000000000000000 0000000000000002 0000000000000000 ffffffc87ff9ec00
[ 124.463633] fc00: 000000000000012c 0000000000000000 ffffffc877f3fdb8 ffffff8000963630
[ 124.471445] fc20: ffffffc87aaab600 0000007fe08b1b50 0000000000000000 0000000000000007
[ 124.479257] fc40: 0000000000000006 0000000000000007 0000000000000001 ffffffc87ff9eba8
[ 124.487069] fc60: 0000000055891196 0000000000000000 ffffff8008d55648 0000000000000001
[ 124.494882] fc80: 0000000000000195 0000000000000002 0000000000000195 0000000000000004
[ 124.502694] fca0: 0000000000000000 0000000000010000 ffffff800819e2c0 0000007fae4c5290
[ 124.510504] [< (null)>] (null)
[ 124.515187] Code: bad PC value
[ 124.518247] ---[ end trace 725126ff463380d9 ]---

I sort of expected to find something like a failure to use the "copy to/from user" function or similar, but this is not the case. In fact it seems like the kernel actually completes the full DMA transfer function before getting the null pointer exception at the line where we return 0.
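One extra data point can be read from the "Oops:" line itself: on arm64 the number printed there is the exception syndrome (ESR) value. The sketch below decodes it, assuming the standard ARMv8-A layout (exception class in bits 31:26, fault status code in the low six bits). The helper name `decode_esr` and the two small lookup tables are made up for illustration and cover only a handful of cases.

```python
# Hypothetical helper: decode the ESR value printed after "Oops:" on arm64.
# Assumes the ARMv8-A ESR_ELx layout: EC = bits [31:26], ISS = bits [24:0].

EXCEPTION_CLASSES = {
    0x20: "instruction abort from a lower exception level",
    0x21: "instruction abort from the current exception level",
    0x24: "data abort from a lower exception level",
    0x25: "data abort from the current exception level",
}

# Upper four bits of the 6-bit fault status code; the low two bits are the
# translation table level for these fault types.
FAULT_TYPES = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Return the exception class and fault type encoded in an ESR value."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    fsc = iss & 0x3F
    result = {
        "ec": ec,
        "class": EXCEPTION_CLASSES.get(ec, "other (not in this table)"),
        "fault": FAULT_TYPES.get(fsc >> 2, "other (not in this table)"),
        "level": fsc & 0x3,
    }
    # The WnR bit (bit 6) is only meaningful for data aborts.
    if ec in (0x24, 0x25):
        result["write"] = bool(iss & (1 << 6))
    return result


first = decode_esr(0x86000006)
print(first["class"], "/", first["fault"], "at level", first["level"])
```

If this decoding is right, 86000006 is an instruction abort rather than a data abort. Together with "PC is at 0x0", that would point to a branch through a NULL function pointer (or a corrupted return address) rather than a plain read of a NULL data pointer.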

Any ideas where I could keep looking for the problem? I have tried a lot of combinations of settings for the VDMA engine, nothing seems to be helping much, aside from the fact that now I get a null pointer exception instead of a timeout condition...

Thanks!

EKjeldsen commented 6 years ago

I have been rigorously pursuing the solution to this issue in parallel and have come to the same conclusion. The DMA transfer function completes correctly, and the null pointer exception occurs at the very point where the axidma_rw_transfer(dev, &inout_trans) call returns in the ioctl case statement for AXIDMA_DMA_READWRITE in axidma_chrdev.c.

I speculate that a conflict is occurring with the asynchronous signal being issued for transmit DMA completion. Because the axidma-benchmark application is blocking on the receive completion anyway, this signal is not needed for correct operation.

I did a quick test today on the Zedboard by commenting out the "send_sig_info( ... )" line in the axidma_dma_callback function and confirmed that the axidma-benchmark for VDMA still worked fine.

I will have access again beginning on 9/24 to the ZynqMP ZCU102 board to test this out. If you can see if this makes a difference before then, please advise. If this turns out to be the problem, the fix should be implemented elsewhere in the driver so as not to impact cases where the signal is necessary.

EKjeldsen commented 6 years ago

Looking deeper into the axidma_callback( ... ) signal handler for the VDMA Tx transaction done signal, there is an assert( ... ) function being called. An assertion is not considered async-signal-safe. This could be the root of the problem, but I won't know for sure until I test on the ZynqMP.

brianvg commented 6 years ago

Hi @EKjeldsen, I am glad I am not the only one having this problem. I tried your suggestion of commenting out the send_sig_info function call in the dma callback. I added some printk statements to follow the timing. Interestingly, it does indeed seem to be crashing during the second time we enter the callback (presumably for the RX thread), since I see the entrance printk, and not the exit printk. Below are my added printk statements from a run of the transfer test so you can observe the sequencing:

[ 1315.592026] Getting tx channel.[n475, axidma_rw_transfer]
[ 1315.597390] Getting rx channel.[n484, axidma_rw_transfer]
[ 1315.602771] Setting up SG table, tx.[n495, axidma_rw_transfer]
[ 1315.608588] Setting up SG table, rx.[n504, axidma_rw_transfer]
[ 1315.614402] Adding SG frame info, tx.[n526, axidma_rw_transfer]
[ 1315.620305] Adding SG frame info, rx.[n544, axidma_rw_transfer]
[ 1315.626210] Prepping transfer, tx.[n552, axidma_rw_transfer]
[ 1315.631858] Prepping transfer, rx.[n559, axidma_rw_transfer]
[ 1315.637496] Starting transfer, tx:[n568, axidma_rw_transfer]
[ 1315.643143] Flush all pending xfers:[n296, axidma_start_transfer]
[ 1315.649215] All pending xfers cmplt:[n300, axidma_start_transfer]
[ 1315.655286] Starting transfer, rx:[n575, axidma_rw_transfer]
[ 1315.660927] Flush all pending xfers:[n296, axidma_start_transfer]
[ 1315.667008] All pending xfers cmplt:[n300, axidma_start_transfer]
[ 1315.673116] Entered axi-dma callback.:[n151, axidma_dma_callback]
[ 1315.679160] NOT performing the send_sig_info...:[n160, axidma_dma_callback]
[ 1315.686278] Entered axi-dma callback.:[n151, axidma_dma_callback]
[ 1315.692362] Transfers started:[n582, axidma_rw_transfer]
[ 1315.697651] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 1315.705726] pgd = ffffffc87a7cd000
[ 1315.709095] [00000000] *pgd=0000000877d3d003

I will let you know if I discover anything else today while I am working on it.

brianvg commented 6 years ago

Follow up: I was running very small transfers to test the engine. Out of curiosity I ran some normal, frame sized transfers using the following command:

sudo ./axidma_benchmark -v -f 1080x1920x3 -g 1080x1920x3 -n 3

Behavior here was quite a bit different. The program seems to freeze after a different kernel error, before the DMA transfers are even set up...

[ 66.925888] Entered axidma_ioctl fctn.
[ 66.925893] Getting device pointer from the file.
[ 66.925897] Entered axidma_ioctl fctn.
[ 66.925898] Getting device pointer from the file.
[ 66.925911] Entered axidma_ioctl fctn.
[ 66.925912] Getting device pointer from the file.
[ 66.926301] Unable to handle kernel paging request at virtual address ffffffc000100000
[ 66.934158] pgd = ffffffc87b981000
[ 66.937542] [ffffffc000100000] *pgd=0000000000000000
[ 66.942297] , *pud=0000000000000000
[ 66.942307] Internal error: Oops: 9600004f [#1] SMP
[ 66.947151] Modules linked in: xilinx_axidma(O)
[ 66.951666] CPU: 3 PID: 3370 Comm: axidma_benchmar Tainted: G B O 4.9.0-xilinx-v2017.4 #1
[ 66.960781] Hardware name: Mercury XU5 (DT)
[ 66.964946] task: ffffffc87875a700 task.stack: ffffffc87afc4000
[ 66.970855] PC is at memset+0x1ac/0x1d0
[ 66.974844] LR is at dma_alloc+0xf8/0x2a0
[ 66.979007] pc : [] lr : [] pstate: 40000145
[ 66.986385] sp : ffffffc87afc7c30
[ 66.989682] x29: ffffffc87afc7c30 x28: 0000000000000008
[ 66.994975] x27: 0000007f7d0d2000 x26: 00000000005ef000
[ 67.000270] x25: ffffff8008cc34e0 x24: ffffff8008d457f8
[ 67.005565] x23: 0000000000000000 x22: ffffffc87b0aa010
[ 67.010860] x21: ffffffc878619a98 x20: ffffffc000100000
[ 67.016155] x19: 00000000005ef000 x18: 0000000000000a03
[ 67.021449] x17: 0000007f7d81b060 x16: ffffff8008087d10
[ 67.026750] x15: 0000000000000040 x14: 0000000000000001
[ 67.032044] x13: 0000000000000038 x12: ffffff8008d3e890
[ 67.037339] x11: ffffff8008d3e890 x10: ffffffbf00018820
[ 67.042634] x9 : 0000000000000000 x8 : ffffffc000100000
[ 67.047929] x7 : 0000000000000000 x6 : 000000000000003f
[ 67.053223] x5 : 0000000000000040 x4 : 0000000000000000
[ 67.058518] x3 : 0000000000000004 x2 : 00000000005eefc0
[ 67.063813] x1 : 0000000000000000 x0 : ffffffc000100000
[ 67.070586] Process axidma_benchmar (pid: 3370, stack limit = 0xffffffc87afc4020)
[ 67.078059] Stack: (0xffffffc87afc7c30 to 0xffffffc87afc8000)
[ 67.083782] 7c20: ffffffc87afc7c80 ffffff8000960220
[ 67.091601] 7c40: ffffffc878619a80 ffffffc87a5f4d10 ffffffc87b0aa010 ffffffc87ace5200
[ 67.099413] 7c60: ffffffc878619a98 ffffff8008d457f8 ffffff8008cc34e0 0000000100000000
[ 67.107225] 7c80: ffffffc87afc7cf0 ffffff8008164104 ffffffc878610a80 ffffffc87aef9500
[ 67.115037] 7ca0: 0000000000000000 ffffffc87a1ae000 ffffffc87a5f4d10 00000000000000fb
[ 67.122849] 7cc0: 00000000000005ef ffffffc87a4d19a0 0000000000000000 ffffffc87a1ae000
[ 67.130661] 7ce0: 0000000000000000 00000000005ef000 ffffffc87afc7d70 ffffff8008164580
[ 67.138473] 7d00: 00000000005ef000 0000007f7d0d2000 0000000000000003 0000000000000001
[ 67.146285] 7d20: ffffffc87a1ae000 ffffffc87aef9500 00000000000000fb 00000000000005ef
[ 67.154097] 7d40: ffffff8008962000 ffffffc87afc4000 ffffffc878610a70 0000000000000000
[ 67.161910] 7d60: 00000000005ef000 fffffffffffffff4 ffffffc87afc7de0 ffffff800814d244
[ 67.169721] 7d80: ffffffc87aef9568 ffffffc87a1ae000 0000000000000000 00000000005eec00
[ 67.177534] 7da0: 0000000000000003 0000000000000001 0000000000000000 ffffff8008983000
[ 67.185346] 7dc0: ffffffc87afc7de0 ffffff800814d1f8 ffffffc87afc7e38 ffffffc87afc7e38
[ 67.193158] 7de0: ffffffc87afc7e40 ffffff80081621e8 00000000005eec00 ffffffc87a1ae000
[ 67.200970] 7e00: 0000000000000000 0000000000000003 0000000000000000 0000000000000001
[ 67.208782] 7e20: 00000000005eec00 0000000000005702 ffffffc87afc7e40 0000000000000000
[ 67.216594] 7e40: ffffffc87afc7eb0 ffffff8008087d28 0000000000000000 0000000000000000
[ 67.224406] 7e60: ffffffffffffffff 0000007f7d783cdc 0000000080000000 0000000000000015
[ 67.232218] 7e80: 0000000000000123 00000000000000de 0000000000000000 0000000000000000
[ 67.240030] 7ea0: ffffffffffffffff 0000007f7d78029c 0000000000000000 ffffff8008082ef0
[ 67.247842] 7ec0: 0000000000000000 00000000005eec00 0000000000000003 0000000000000001
[ 67.255654] 7ee0: 0000000000000003 0000000000000000 0000007fe8156218 0000000000000000
[ 67.263466] 7f00: 00000000000000de 0000000000000000 0101010101010101 0000000000000000
[ 67.271278] 7f20: 0000000000000000 0000000000000000 0000000000000000 0000007f7d848cc0
[ 67.279091] 7f40: 0000007f7d783cc8 0000007f7d81b060 0000000000000a03 0000000000402768
[ 67.286902] 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 67.294715] 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 67.302527] 7fa0: 0000000000000000 0000007fe8156370 0000007f7d8099f0 0000007fe8156370
[ 67.310339] 7fc0: 0000007f7d783cdc 0000000080000000 0000000000000000 00000000000000de
[ 67.318151] 7fe0: 0000000000000000 0000000000000000 fffffffff7ffffff fffefbfffd9bffff
[ 67.325960] Call trace:
[ 67.328386] Exception stack(0xffffffc87afc7a60 to 0xffffffc87afc7b90)
[ 67.334811] 7a60: 00000000005ef000 0000008000000000 ffffffc87afc7c30 ffffff80083de32c
[ 67.342629] 7a80: 0000000000000100 00000000000006ef 0000000000000800 ffffff8008188210
[ 67.350441] 7aa0: ffffffc87afc4000 0000000000000800 0000000000000100 ffffffbf00003820
[ 67.358253] 7ac0: ffffffc87afc7ba0 ffffff8008188440 0000000000000100 ffffff8008d658f8
[ 67.366065] 7ae0: 0000000000000100 0000000000010000 00000000000005ef ffffff8008cd0bb0
[ 67.373877] 7b00: ffffffc000100000 0000000000000000 00000000005eefc0 0000000000000004
[ 67.381689] 7b20: 0000000000000000 0000000000000040 000000000000003f 0000000000000000
[ 67.389501] 7b40: ffffffc000100000 0000000000000000 ffffffbf00018820 ffffff8008d3e890
[ 67.397313] 7b60: ffffff8008d3e890 0000000000000038 0000000000000001 0000000000000040
[ 67.405124] 7b80: ffffff8008087d10 0000007f7d81b060
[ 67.409981] [] __memset+0x1ac/0x1d0
[ 67.415026] [] axidma_mmap+0xf8/0x388 [xilinx_axidma]
[ 67.421613] [] mmap_region+0x38c/0x5a8
[ 67.426906] [] do_mmap+0x260/0x398
[ 67.431855] [] vm_mmap_pgoff+0x94/0xb8
[ 67.437150] [] SyS_mmap_pgoff+0xb0/0x228
[ 67.442619] [] sys_mmap+0x18/0x28
[ 67.447478] [] el0_svc_naked+0x24/0x28
[ 67.452774] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[ 67.458889] ---[ end trace 7b9aad28c11933be ]---
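A quick sanity check on the numbers in this dump: the sizes in the registers and on the stack line up with the frame size requested on the command line. The sketch below redoes the arithmetic, assuming 4 KiB pages and one byte per colour component; `frame_bytes` and `page_align` are illustrative helpers, not functions from the driver or library.

```python
PAGE_SIZE = 4096  # assumption: the kernel is built with 4 KiB pages


def frame_bytes(dim_a, dim_b, depth):
    """Size in bytes of one frame given as AxBxdepth on the command line."""
    return dim_a * dim_b * depth


def page_align(nbytes, page_size=PAGE_SIZE):
    """Round a byte count up to the next multiple of the page size."""
    return (nbytes + page_size - 1) // page_size * page_size


size = frame_bytes(1080, 1920, 3)   # the -f 1080x1920x3 argument
mapped = page_align(size)

print(hex(size))    # compare with 00000000005eec00 in the stack dump
print(hex(mapped))  # compare with x19 / x26 = 00000000005ef000
```

Under these assumptions, the 0x5ef000 in x19 is the page-aligned size of a single 1080x1920x3 frame. The call trace then suggests that the fault happens while the freshly allocated buffer for the mmap call is being zeroed, before any DMA transfer is set up.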

brianvg commented 6 years ago

By the way, @EKjeldsen, does the "vdmatest" kernel module from Xilinx work in your system? That kernel module also does not work in my system. Typical Xilinx.

I made some changes to my hardware design (address all 36 bits from dma engine) and now the built in VDMA test passes. Now "only" the original issue with the benchmark remains...

brianvg commented 6 years ago

@bperez77 : I have noticed in the character device driver, you have case statements for dma read, vdma read, dma write, vdma write, but then only a dma read and write (bidirectional). Are the additional frame housekeeping tasks performed in the respective vdma functions not necessary for the bidirectional case?

EKjeldsen commented 6 years ago

@brianvg - thank you for updating your findings. We will be looking further into this today and will post updates as well.

EKjeldsen commented 6 years ago

Found something interesting late today. To rule out multiple buffer issues, I verified operation using a single frame buffer. On the Zedboard, Brandon's axidma_benchmark and Xilinx's vdmatest ran fine. On the ZCU102, vdmatest ran fine but axidma_benchmark encountered the NULL pointer exception.

However, I noticed that the value of "xlnx,num-fstores" for the VDMA IP core in the pl.dtsi file generated for the ZCU102 was double what it should be. In my original 3 frame buffer case, "xlnx,num-fstores" equals 0x6 vs. 0x3. In the 1 frame buffer case, "xlnx,num-fstores" equals 0x2 vs. 0x1.

In the case of the Zedboard, there was no discrepancy between the setting in the VDMA IP and the Petalinux-generated value for "xlnx,num-fstores".

Tomorrow we will test a version on the ZCU102 that corrects the "xlnx,num-fstores" to equal 0x1. Seems to be yet another bug in Petalinux (we are using 2017.4).

brianvg commented 6 years ago

Hi @EKjeldsen, I just ran a test with the corrected entry for the frame buffer. I do not observe any changes in behavior, still seeing the segmentation fault. Nice catch though! Not sure how Petalinux is getting that wrong...

brianvg commented 6 years ago

Hi @EKjeldsen ! Any new progress on the issue? I was exploring the possibility of leveraging V4L2, which is what many ZynqMP users end up doing to get video streams from user space working, but unfortunately, the framework does not seem to allow sending video into the PL for any use other than sending it to a display port/hdmi interface. This seems to rule out the very plausible use case of using the framework for a video coprocessor (sending the data down to the PL, and then retrieving it after processing). @bperez77's work really does seem to be the only show in town for creating a data path for this sort of a use case. It is typical of Xilinx's startling lack of imagination that they would not prioritize this as a typical use case and offer some measure of support. (please excuse the rant...)

EKjeldsen commented 6 years ago

We are still pursuing this effort, but haven’t posted anything lately because of dead ends. Our use case is the opposite direction, PL -> PS, for video source processing by the PS, which would also seem to be a typical application. This is a very perplexing problem! One architectural difference with respect to the Zynq-7000 Zedboard is the presence of an SMMU. I think I’ve finally convinced myself that the SMMU can be bypassed, as this driver provides the necessary address translation, but I would appreciate your take on that. Yesterday I discovered that the switch that sets the size of dma_addr_t to 64 bits was not being configured in Petalinux. I added a user_xx.cfg file containing the following entry: CONFIG_ARCH_DMA_ADDR_T_64BIT=y. I then verified that sizeof(dma_addr_t) was 8 bytes. Unfortunately, the NULL pointer exception remained.

Today I’m going to look further into the possibility of a pointer arithmetic error. Another point of divergence to consider is that this driver uses an interleaved DMA call for VDMA vs. a scatter-gather call for regular DMA. Maybe something is being set incorrectly in that call when 64-bit addressing is used?


brianvg commented 6 years ago

Hi @EKjeldsen, I may be mistaken, but my read of the SMMU is that it is disabled by default and, even when enabled, only has an effect when the relevant PL masters are set up correctly to use it. I have tried enabling it in the device tree, but I don't see any effect. Enabling the SMMU would in theory allow us to use virtual addresses for mapping memory locations from user space into the DMA engine. I guess one would need to map the VDMA to a VIO device type. It all looks very cool, and would save the memory copy that is otherwise necessary. See here and here. Also, I guess you need to add the SMMU use statements to the device tree, after you figure out the stream ID of your AXI DMA interfaces. It is all explained on the wiki, but it sounds a bit iffy to be honest. It is not clear to me how you can guarantee contiguous memory address allocation for large buffers from user space.

bperez77 commented 6 years ago

Sorry @brianvg @EKjeldsen, I've been pretty busy lately and haven't been active. So from the above thread it seems like there are two separate issues:

  1. The callback function is causing a NULL pointer exception with the AXI DMA RW transfer.
  2. The driver is not correctly handling transfers that involve more than one frame buffer.

I don't think that the SMMU should have any influence on the correct operation of the driver, since the driver is able to perform all the translations to physical addresses required by the IP.

I'll have some time to work on this this weekend. Can one of you guys send me the configuration for your VDMA IP (a screenshot should be sufficient)? I still haven't gotten to the phase where I get a successful VDMA transfer. For both the VDMA test driver and my benchmark program, the transfer times out. I'm also working off a Zybo board, so I won't be able to directly replicate what you guys have on the ZynqMP board.

P.S. Yeah @brianvg I understand your frustration. It's three years on from when I created this driver and Xilinx still doesn't have a driver that's usable directly from userspace programs. I imagine Xilinx has done it this way because they want to pigeonhole everyone into their SDSoC framework.

brianvg commented 6 years ago

@bperez77 I agree that it is all about moving people to SDSoC. Very annoying, because they should not care what path people use to develop. Xilinx is ostensibly a hardware company. There are so many ways to do design input! Not everyone wants to pretend C++ is a good method to design programmable logic.

I am not at all certain that my configurations are correct, but I have attached screenshots.

image

image

Best Regards,

Brian

brianvg commented 6 years ago

BTW, this is just my latest configuration. I have played around a lot over time...

EKjeldsen commented 6 years ago

Here are our current VDMA setup screenshots. Thank you for looking at this further Brandon. Since you don't have any ZynqMP platform available, you could make suggestions for @brianvg and us to try on our hardware to further isolate the problem.

I have noticed that you declare a static pointer to the character device - "axidma_dev" to store this pointer on the stack. Is it possible that it is getting corrupted? The other critical argument in the ioctl is the "axidma_inout_transaction" structure, which is also stored on the stack. Could this somehow be going out of scope?

vdma setup 1

vdma setup 2

bperez77 commented 6 years ago

To come back to some of your original posts: when the two-way (read/write) transfer occurs, the callback will be invoked twice. For the TX/write transfer, the callback is invoked with wait equal to false, so it will try to send a signal. However, if no signal is registered, then it shouldn't trigger a signal. For the RX/read transfer, the callback function should be invoked with wait equal to true. In this case, it will instead complete the completion structure, which should unblock the kernel thread waiting on the RX transfer.

@EKjeldsen those are valid points about both of those structures. For the loopback test, even though those values are stored on the stack, they are guaranteed to remain in scope because the main kernel thread that invokes the DMA transfer function will wait for the callback function to indicate that it can continue. So, the values will remain valid because the thread will be put to sleep while waiting. It's not clear from the code, but this is what should be happening.

If I had to guess, it seems that the macro VALID_NOTIFY_SIGNAL isn't behaving correctly, and when send_sig_info is invoked, the cb_data->process field isn't properly initialized, leading to the segmentation fault. Since commenting out send_sig_info seems to fix the issue, this should be the culprit.

I'm investigating more now.

bperez77 commented 6 years ago

I'm still not able to get the VDMA IP working correctly with a loopback test. I tried both of your configurations against Xilinx's VDMA test driver, but both still time out for me.

Since I can't make more progress on this route, can you guys take your existing stack backtraces and determine the source line numbers as described in this StackOverflow post?
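For anyone following along, here is a rough sketch of how the `symbol+offset/size` frames from the traces above could be turned into gdb lookups. It assumes the module and kernel were built with debug info and that the object files are named `xilinx_axidma.ko` and `vmlinux`; `parse_frames` and `gdb_command` are hypothetical helper names, and only gdb's `list *(symbol+offset)` form is taken as given.

```python
import re

# Matches frames such as "axidma_mmap+0xf8/0x388 [xilinx_axidma]".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Extract symbol, offset, size and optional module name from an oops."""
    frames = []
    for match in FRAME_RE.finditer(text):
        frames.append({
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
            "module": match.group("module"),
        })
    return frames


def gdb_command(frame):
    """Build a gdb invocation that lists the source around a frame."""
    # Assumption: module frames live in <module>.ko, the rest in vmlinux.
    target = frame["module"] + ".ko" if frame["module"] else "vmlinux"
    return "gdb -batch -ex 'list *({}+{:#x})' {}".format(
        frame["symbol"], frame["offset"], target
    )


trace = """
[ 67.409981] [] __memset+0x1ac/0x1d0
[ 67.415026] [] axidma_mmap+0xf8/0x388 [xilinx_axidma]
[ 67.421613] [] mmap_region+0x38c/0x5a8
"""

frames = parse_frames(trace)
for frame in frames:
    print(gdb_command(frame))
```

The frame inside the driver (`axidma_mmap+0xf8` here) is usually the one worth looking up first, since the frames above it are generic kernel helpers.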

maikonadams commented 5 years ago

When I run the benchmark on PetaLinux, I get: ./axidma_benchmark: error while loading shared libraries: libgcc_s.so.1: cannot open shared object file: No such file or directory

This is probably more of a PetaLinux issue.

honorpeter commented 5 years ago

@maikonadams How do you add the driver's files to PetaLinux? I get a compilation problem when using it with PetaLinux 2018.2.

I copied the driver files axi_dma.c, axidma.h, axidma_chrdev.c, axidma_dma.c, and axidma_of.c to /project-spec/meta-user/recipes-modules/axidma/files and changed driver.mk into a Makefile, but I can't add the include file and the lib file, and a compile error occurs.