bperez77 / xilinx_axidma

A zero-copy Linux driver and a userspace interface library for Xilinx's AXI DMA and VDMA IP blocks. These serve as bridges for communication between the processing system and FPGA programmable logic fabric, through one of the DMA ports on the Zynq processing system. Distributed under the MIT License.

Can VDMA loop work? #31

Closed tangxu00 closed 6 years ago

tangxu00 commented 6 years ago

With the DMA IP, the driver and examples work well, but with VDMA they don't:

    Z-turn# ./axidma_benchmark -v
    AXI DMA Benchmark Parameters:
        Transmit Buffer Size: 7.91 Mb
        Receive Buffer Size: 7.91 Mb
        Number of DMA Transfers: 1000 transfers

    Using transmit channel 0 and receive channel 1.
    axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
    Failed to perform the AXI DMA read-write transfer: Timer expired

How can I use the VDMA driver?

bperez77 commented 6 years ago

The answer to that is maybe. The driver has support for VDMA, but I was never able to get it to work even in a simple loopback mode. As discussed in #15, I believe this may actually be because of a lack of support in the backend Xilinx driver for DMA. However, that may have changed since then. In issue #25, it seems that someone may have gotten AXI VDMA to work. Perhaps @yaobaishen can comment on this?

If you want some help debugging, can you send me your device tree entries for AXI VDMA? Also, can you send the output of dmesg immediately after you run the benchmark?

tangxu00 commented 6 years ago

Thanks for your help. I have emailed yaobaishen, but got no reply. I am using a Zynq 7010 and the latest linux-4.9. This is my pl.dtsi:

    / {
        axidma_chrdev: axidma_chrdev@0 {
            compatible = "xlnx,axidma-chrdev";
            dmas = <&axi_vdma_0 0 &axi_vdma_0 1>;
            dma-names = "tx_channel", "rx_channel";
        };
        amba_pl: amba_pl {
            #address-cells = <1>;
            #size-cells = <1>;
            compatible = "simple-bus";
            ranges ;
            axi_vdma_0: dma@43000000 {
                #dma-cells = <1>;
                clock-names = "s_axi_lite_aclk", "m_axi_mm2s_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk", "m_axi_s2mm_aclk";
                clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>;
                compatible = "xlnx,axi-vdma-1.00.a";
                interrupt-parent = <&intc>;
                interrupts = <0 29 4 0 30 4>;
                reg = <0x43000000 0x10000>;
                xlnx,addrwidth = <0x20>;
                xlnx,flush-fsync = <0x1>;
                xlnx,num-fstores = <0x3>;
                dma-channel@43000000 {
                    compatible = "xlnx,axi-vdma-mm2s-channel";
                    interrupts = <0 29 4>;
                    xlnx,datawidth = <0x18>;
                    xlnx,device-id = <0x0>;
                };
                dma-channel@43000030 {
                    compatible = "xlnx,axi-vdma-s2mm-channel";
                    interrupts = <0 30 4>;
                    xlnx,datawidth = <0x18>;
                    xlnx,device-id = <0x1>;
                };
            };
        };
    };

And this is the dmesg output after running the benchmark:

    Z-turn# ./axidma_benchmark -v
    AXI DMA Benchmark Parameters:
        Transmit Buffer Size: 7.91 Mb
        Receive Buffer Size: 7.91 Mb
        Number of DMA Transfers: 1000 transfers

    Using transmit channel 0 and receive channel 1.
    axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
    Failed to perform the AXI DMA read-write transfer: Timer expired
    Z-turn# dmesg
    Booting Linux on physical CPU 0x0
    Linux version 4.9.0-xilinx (osrc@osrc-virtual-machine) (gcc version 4.6.1 (Sourcery CodeBench Lite 2011.09-50) ) #1 SMP PREEMPT Sun Nov 19 19:34:06 CST 2017
    CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
    CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
    OF: fdt:Machine model: xlnx,zynq-7000
    cma: Reserved 28 MiB at 0x3e400000
    Memory policy: Data cache writealloc
    On node 0 totalpages: 262144
    free_area_init_node: node 0, pgdat c0a31500, node_mem_map ef7f8000
    Normal zone: 1536 pages used for memmap
    Normal zone: 0 pages reserved
    Normal zone: 196608 pages, LIFO batch:31
    HighMem zone: 65536 pages, LIFO batch:15
    percpu: Embedded 14 pages/cpu @ef7d3000 s25984 r8192 d23168 u57344
    pcpu-alloc: s25984 r8192 d23168 u57344 alloc=14*4096
    pcpu-alloc: [0] 0 [0] 1
    Built 1 zonelists in Zone order, mobility grouping on. Total pages: 260608
    Kernel command line: console=ttyPS0,115200 root=/dev/ram rw earlyprintk cma=25M
    PID hash table entries: 4096 (order: 2, 16384 bytes)
    Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
    Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    Memory: 994776K/1048576K available (6144K kernel code, 200K rwdata, 1468K rodata, 1024K init, 230K bss, 25128K reserved, 28672K cma-reserved, 233472K highmem)
    Virtual kernel memory layout:
        vector : 0xffff0000 - 0xffff1000 ( 4 kB)
        fixmap : 0xffc00000 - 0xfff00000 (3072 kB)
        vmalloc : 0xf0800000 - 0xff800000 ( 240 MB)
        lowmem : 0xc0000000 - 0xf0000000 ( 768 MB)
        pkmap : 0xbfe0random: fast init done 0000 - 0xc0000000 ( 2 MB)
        modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
        .text : 0xc0008000 - 0xc0700000 (7136 kB)
        .init : 0xc0900000 - 0xc0a00000 (1024 kB)
        .data : 0xc0a00000 - 0xc0a32100 ( 201 kB)
        .bss : 0xc0a32100 - 0xc0a6bb1c ( 231 kB)
    Preemptible hierarchical RCU implementation.
    Build-time adjustment of leaf fanout to 32.
    RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
    RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
    NR_IRQS:16 nr_irqs:16 16
    efuse mapped to f0800000
    slcr mapped to f0802000
    L2C: platform modifies aux control register: 0x72360000 -> 0x72760000
    L2C: DT/platform modifies aux control register: 0x72360000 -> 0x72760000
    L2C-310 erratum 769419 enabled
    L2C-310 enabling early BRESP for Cortex-A9
    L2C-310 full line of zeros enabled for Cortex-A9
    L2C-310 ID prefetch enabled, offset 1 lines
    L2C-310 dynamic clock gating enabled, standby mode enabled
    L2C-310 cache controller enabled, 8 ways, 512 kB
    L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
    zynq_clock_init: clkc starts at f0802100
    Zynq clock init
    sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
    clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
    Switching to timer-based delay loop, resolution 3ns
    clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
    timer #0 at f080a000, irq=17
    Console: colour dummy device 80x30
    Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
    pid_max: default: 32768 minimum: 301
    Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
    Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
    CPU: Testing write buffer coherency: ok
    CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
    Setting up static identity map for 0x100000 - 0x100058
    CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
    Brought up 2 CPUs
    SMP: Total of 2 processors activated (1333.33 BogoMIPS).
    CPU: All CPU(s) started in SVC mode.
    devtmpfs: initialized
    VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
    clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    pinctrl core: initialized pinctrl subsystem
    NET: Registered protocol family 16
    DMA: preallocated 256 KiB pool for atomic coherent allocations
    cpuidle: using governor menu
    hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
    hw-breakpoint: maximum watchpoint size is 4 bytes.
    zynq-ocm f800c000.ocmc: ZYNQ OCM pool: 256 KiB @ 0xf0880000
    zynq-pinctrl 700.pinctrl: zynq pinctrl initialized
    vgaarb: loaded
    SCSI subsystem initialized
    usbcore: registered new interface driver usbfs
    usbcore: registered new interface driver hub
    usbcore: registered new device driver usb
    media: Linux media interface: v0.10
    Linux video capture interface: v2.00
    pps_core: LinuxPPS API ver. 1 registered
    pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it
    PTP clock support registered
    EDAC MC: Ver: 3.0.0
    FPGA manager framework
    fpga-region fpga-full: FPGA Region probed
    Advanced Linux Sound Architecture Driver Initialized.
    clocksource: Switched to clocksource arm_global_timer
    NET: Registered protocol family 2
    TCP established hash table entries: 8192 (order: 3, 32768 bytes)
    TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
    TCP: Hash tables configured (established 8192 bind 8192)
    UDP hash table entries: 512 (order: 2, 16384 bytes)
    UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
    NET: Registered protocol family 1
    RPC: Registered named UNIX socket transport module.
    RPC: Registered udp transport module.
    RPC: Registered tcp transport module.
    RPC: Registered tcp NFSv4.1 backchannel transport module.
    PCI: CLS 0 bytes, default 64
    Trying to unpack rootfs image as initramfs...
    rootfs image is not initramfs (no cpio magic); looks like an initrd
    Freeing initrd memory: 5804K (dfa55000 - e0000000)
    hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
    futex hash table entries: 512 (order: 3, 32768 bytes)
    workingset: timestamp_bits=30 max_order=18 bucket_order=0
    jffs2: version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
    bounce: pool size: 64 pages
    io scheduler noop registered
    io scheduler deadline registered
    io scheduler cfq registered (default)
    dma-pl330 f8003000.dmac: Loaded driver for PL330 DMAC-241330
    dma-pl330 f8003000.dmac: DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16
    xilinx-vdma 43000000.dma: Xilinx AXI VDMA Engine Driver Probed!!
    e0001000.serial: ttyPS0 at MMIO 0xe0001000 (irq = 27, base_baud = 6249999) is a xuartps
    console [ttyPS0] enabled
    [drm] Initialized
    brd: module loaded
    loop: module loaded
    libphy: Fixed MDIO Bus: probed
    CAN device driver interface
    libphy: MACB_mii_bus: probed
    macb e000b000.ethernet eth0: Cadence GEM rev 0x00020118 at 0xe000b000 irq 28 (00:0a:35:00:01:22)
    Generic PHY e000b000.etherne:03: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e000b000.etherne:03, irq=-1)
    e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
    e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
    ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
    ehci-pci: EHCI PCI platform driver
    usbcore: registered new interface driver usb-storage
    mousedev: PS/2 mouse device common for all mice
    i2c /dev entries driver
    cdns-i2c e0005000.i2c: 400 kHz mmio e0005000 irq 24
    cdns-wdt f8005000.watchdog: Xilinx Watchdog Timer at f099a000 with timeout 10s
    EDAC MC: ECC not enabled
    Xilinx Zynq CpuIdle Driver started
    sdhci: Secure Digital Host Controller Interface driver
    sdhci: Copyright(c) Pierre Ossman
    sdhci-pltfm: SDHCI platform and OF driver helper
    mmc0: SDHCI controller on e0100000.sdhci [e0100000.sdhci] using ADMA
    ledtrig-cpu: registered to indicate activity on CPUs
    usbcore: registered new interface driver usbhid
    usbhid: USB HID core driver
    fpga_manager fpga0: Xilinx Zynq FPGA Manager registered
    NET: Registered protocol family 10
    sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
    NET: Registered protocol family 17
    can: controller area network core (rev 20120528 abi 9)
    NET: Registered protocol family 29
    can: raw protocol (rev 20120528)
    can: broadcast manager protocol (rev 20161123 t)
    can: netlink gateway (rev 20130117) max_hops=1
    Registering SWP/SWPB emulation handler
    mmc0: new high speed SDHC card at address 1234
    hctosys: unable to open rtc device (rtc0)
    mmcblk0: mmc0:1234 SA16G 14.6 GiB
    mmcblk0: p1
    of_cfs_init
    of_cfs_init: OK
    ALSA device list:
      No soundcards found.
    RAMDISK: gzip image found at block 0
    EXT4-fs (ram0): couldn't mount as ext3 due to feature incompatibilities
    EXT4-fs (ram0): mounted filesystem without journal. Opts: (null)
    VFS: Mounted root (ext4 filesystem) on device 1:0.
    devtmpfs: mounted
    Freeing unused kernel memory: 1024K (c0900000 - c0a00000)
    FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
    random: sshd: uninitialized urandom read (32 bytes read)
    export_store: invalid GPIO 110
    axidma: loading out-of-tree module taints kernel.
    axidma: axidma_dma.c: axidma_dma_init: 705: DMA: Found 0 transmit channels and 0 receive channels.
    axidma: axidma_dma.c: axidma_dma_init: 707: VDMA: Found 1 transmit channels and 1 receive channels.
    axidma: axidma_dma.c: axidma_start_transfer: 298: VDMA receive transaction timed out.
    Z-turn#

I am working on using VDMA to capture an HDMI signal, but it has never worked under Linux. Does your driver or library provide a VDMA interface? Thank you for your reply.

yaobaishen commented 6 years ago

Sorry, I never got VDMA working either, so I am also looking forward to someone verifying it.

tangxu00 commented 6 years ago

I will keep working on it. yaobaishen replied to me and said he gave up. Thanks for your great work!

tangxu00 commented 6 years ago

Hi, I changed my hardware design in Vivado, and maybe VDMA is working now?

    Z-turn# insmod axidma.ko
    axidma: loading out-of-tree module taints kernel.
    axidma: axidma_dma.c: axidma_dma_init: 705: DMA: Found 0 transmit channels and 0 receive channels.
    axidma: axidma_dma.c: axidma_dma_init: 707: VDMA: Found 1 transmit channels and 1 receive channels.
    Z-turn# ./axidma_benchmark -v
    AXI DMA Benchmark Parameters:
        Transmit Buffer Size: 7.91 Mb
        Receive Buffer Size: 7.91 Mb
        Number of DMA Transfers: 1000 transfers

    Using transmit channel 0 and receive channel 1.
    Single transfer test successfully completed!
    Beginning performance analysis of the DMA engine.

    random: fast init done

    DMA Timing Statistics:
        Elapsed Time: 27.71 s
        Transmit Throughput: 285.42 Mb/s
        Receive Throughput: 285.42 Mb/s
        Total Throughput: 570.83 Mb/s
    Z-turn#

Why does it take so long?

bperez77 commented 6 years ago

Interesting, well you're the first person to have VDMA working, so that's good. There must have been some issue with my design when I was testing.

Hmm, the output of the timing statistics doesn't make much sense. According to the bandwidth numbers, your transfer should be completing in far less time, but it's not. So, for some reason the numbers are inconsistent, which I haven't seen before.

I do notice a line of output, "random: fast init done", which may be causing some delay? Not sure if that is the source or not. Do you get the same results after running the benchmark several times in a row?

tangxu00 commented 6 years ago

My data width is 24 bits, for RGB. I tried several times; "random: fast init done" appeared only the first time, and I don't know what it means. The time is always 27.71 s. I just want to use the VDMA rx channel to receive one frame of HDMI video that has been converted to 24-bit RGB format and store it in DDR. How can I use the VDMA driver for that? Can you give me some guidance? Thank you!

bperez77 commented 6 years ago

I see, so the time is consistent. That's very odd though, because 7.91 Mb / 27.71 s obviously does not come out to 285.42 Mb/s. The 285.42 Mb/s throughput number is about what I would expect, but the elapsed time is way too long. You haven't made any changes to the axidma benchmark code, correct?

Sure thing, that's pretty straightforward. You can just use axidma_malloc to allocate your single frame buffer. Then, you can use axidma_video_transfer to set up a looping transfer, where the buffer is continuously streamed out from DRAM to your IP. After that, it's up to your application code to update the buffer as needed.
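For sizing that single frame buffer, the byte count is width times height times bytes per pixel. A minimal sketch, assuming a packed pixel format with no padding between rows; `frame_size_bytes` is a hypothetical helper for illustration and not part of the library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libaxidma): size in bytes of one packed
 * video frame with no padding between rows. Returns 0 if a dimension is
 * zero or if the result would not fit in size_t.
 */
size_t frame_size_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    size_t bytes_per_pixel;

    /* This sketch only handles pixel formats that are a whole number of bytes. */
    assert(bits_per_pixel > 0 && bits_per_pixel % 8 == 0);
    bytes_per_pixel = bits_per_pixel / 8;

    if (width == 0 || height == 0)
        return 0;

    /* Reject products that would overflow size_t. */
    if (width > SIZE_MAX / height)
        return 0;
    if (width * height > SIZE_MAX / bytes_per_pixel)
        return 0;

    return width * height * bytes_per_pixel;
}
```

For instance, a 1280x720 frame at 24 bits per pixel comes out to 2,764,800 bytes, which is the size that would be requested from axidma_malloc.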

Alternately, if you're planning on using Xilinx's DRM driver, you can use axidma_register_buffer to share the DRM driver's DMA buffer with my driver. Naturally, you need to get a handle to the DRM driver's buffer through libdrm first.

bperez77 commented 6 years ago

Oh one other thing I noticed is the following line:

axidma: loading out-of-tree module taints kernel.

This indicates that the kernel you're running on your board, and the one that you built the driver against do not match. This isn't the cause of the timing issue, but it can cause the driver to crash, so I'd recommend making sure you're using the kernel you built the driver against.

tangxu00 commented 6 years ago

I have looked at benchmark.c. In the "DMA Timing Statistics" function, the number of DMA transfers is 1000, so 7.91 Mb / 27.71 s * 1000 = 285.42 Mb/s. In order to use your driver, I built linux-4.9 for my board, and I am sure that I used it to build the driver. This problem may be related to my kernel config.
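That arithmetic can be checked with a short sketch. The function names here are hypothetical and the units simply follow the "Mb" label the benchmark prints; this is not the benchmark's actual code:

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the arithmetic above: total data moved in
 * one direction divided by the elapsed time. The result is in the same
 * size unit as buffer_size, per second.
 */
double throughput_per_second(double buffer_size, int num_transfers, double elapsed_s)
{
    assert(num_transfers > 0 && elapsed_s > 0.0);
    return buffer_size * (double)num_transfers / elapsed_s;
}

/* Average wall-clock time spent on each transfer, in milliseconds. */
double ms_per_transfer(int num_transfers, double elapsed_s)
{
    assert(num_transfers > 0 && elapsed_s >= 0.0);
    return elapsed_s * 1000.0 / (double)num_transfers;
}
```

With a buffer size of 7.91, 1000 transfers, and 27.71 s elapsed, this gives roughly 285.5 per second, which agrees with the reported 285.42 Mb/s up to rounding of the printed values, and about 27.7 ms per transfer.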

bperez77 commented 6 years ago

Sorry had a bit of a brain fart there, you're right about the time reported by the benchmark.

What throughput are you expecting for the transfers? I know that my driver does introduce a bit of an overhead, but you should still get near the maximum performance. Looking at table 2-3 of the AXI VDMA User Guide, I think it's near the expected value. Naturally, the exact throughput depends on the data width configured in your AXI VDMA IP block.
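As a rough illustration of how the data width bounds throughput: a stream interface moves at most one beat of the configured width per clock cycle, so width times clock frequency gives a ceiling. The sketch below is only that ceiling, not a figure from the user guide; the function name and the clock frequency in the example are assumptions:

```c
#include <assert.h>

/*
 * Upper bound on stream throughput in bytes per second: at most one beat
 * of data_width_bits per clock cycle. Real transfers land below this
 * bound because of setup costs and stalls.
 */
double stream_peak_bytes_per_s(unsigned data_width_bits, double clock_hz)
{
    assert(data_width_bits > 0 && data_width_bits % 8 == 0 && clock_hz > 0.0);
    return (double)(data_width_bits / 8) * clock_hz;
}
```

For example, a 24-bit stream on an assumed 100 MHz clock cannot exceed 300,000,000 bytes per second.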

tangxu00 commented 6 years ago

I see that the "axidma_video_transfer" function only uses the tx channel, for display. I need the opposite: data streamed from the IP into a DRAM buffer, not a loop but a one-way path from the external video signal to DRAM. I have not seen any "video_read" function. Maybe you could add AXIDMA_DMA_VIDEO_READ to the ioctl interface when you have time; just a suggestion.

bperez77 commented 6 years ago

Ahh I see. Actually, if you're just doing transfers one at a time, you can utilize axidma_oneway_transfer or axidma_twoway_transfer with a VDMA channel. Of course, this will be slower, which I'm guessing is what you were referring to in your question as to why it's so slow.

Otherwise, I can send you a patch for continuous loop video read, analogous to axidma_video-write. Unfortunately, I'm busy the next few weeks, so I won't be able to test it fully, but I can still send you the patch.

tangxu00 commented 6 years ago

Thank you so much, that would be a great help!

bperez77 commented 6 years ago

Ok, I added support with c3181d8.

This code is currently untested; I only checked that it compiles against the most recent version of Xilinx's kernel. Unfortunately, I don't have a device to test with, so could you run and test the code, and let me know if you run into any issues?

tangxu00 commented 6 years ago

vdma_test.txt

I wrote a simple app that uses a VDMA loop to transfer files, but it failed. This is the dmesg output:

    axidma: axidma_dma.c: axidma_dma_init: 706: DMA: Found 0 transmit channels and 0 receive channels.
    axidma: axidma_dma.c: axidma_dma_init: 708: VDMA: Found 1 transmit channels and 1 receive channels.
    Unhandled fault: page domain fault (0x01b) at 0xbeea3cb0
    pgd = ef318000
    [beea3cb0] *pgd=3de24831
    Internal error: Oops - BUG: 1b [#1] PREEMPT SMP ARM
    Modules linked in: axidma(O)
    CPU: 0 PID: 720 Comm: vdma_test Tainted: G O 4.9.0-xilinx #2
    Hardware name: Xilinx Zynq Platform
    task: ef1b8440 task.stack: ef174000
    PC is at axidma_video_transfer+0xd4/0x194 [axidma]
    LR is at 0x500
    pc : [] lr : [<00000500>] psr: 80000013
    sp : ef175df8 ip : beea3cb0 fp : beea3c34
    r10: 00000000 r9 : ef174000 r8 : 00000005
    r7 : 002a3000 r6 : ef12ad80 r5 : 00000000 r4 : ef175e7c
    r3 : 000002d0 r2 : 00000000 r1 : ef25aa80 r0 : ef12ad80
    Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
    Control: 18c5387d Table: 2f31804a DAC: 00000051
    Process vdma_test (pid: 720, stack limit = 0xef174210)
    Stack: (0xef175df8 to 0xef176000)
    5de0: 002a3000 60000013
    5e00: ef31ada8 00000001 ef25aa80 00000000 00000000 00000000 00000000 00000000
    5e20: 00000000 00000001 00000001 00000000 00000022 ef1b8440 00000000 00000500
    5e40: 00000003 000002d0 00000005 beea3c14 ef12ad80 00000051 80185707 bf000dc0
    5e60: 00000707 c0a08648 ef3c0200 00000000 00000000 ef25a780 ee9714d0 00000000
    5e80: 00000001 beea3cb0 00000500 00000003 000002d0 c01b7998 ef3b21c4 ee978c40
    5ea0: ee9714d0 c01b8b28 ee9714d0 00000707 040444fb c01b8bdc ee9714d0 040444fb
    5ec0: ef0c6780 c01ba354 ed85e748 00000000 ef0c6780 00000000 00000000 002a3000
    5ee0: 00002000 ef0c6780 beea3c14 ef3b20f0 80185707 c01ded20 40049409 c01df680
    5f00: ef0c6780 00000001 002a3000 00000003 00000000 ee978c40 000002a3 c01ba754
    5f20: 00000000 00000000 00000000 b68d9000 00000001 ee978c78 ef0c6780 00000003
    5f40: 002a3000 00000000 ef174000 00000000 beea3c34 c01aa3e8 00000001 00000000
    5f60: 00000000 ef175f6c 00000005 ef0c6780 ef0c6780 beea3c14 80185707 00000005
    5f80: ef174000 00000000 beea3c34 c01df70c 00008964 00000000 000000d8 00000036
    5fa0: c0106f64 c0106da0 00008964 00000000 00000005 80185707 beea3c14 beea3c14
    5fc0: 00008964 00000000 000000d8 00000036 b6f65000 00000000 b6fc3000 beea3c34
    5fe0: 00000000 beea3c00 b6f91cac b6ef1c1c 60000010 00000005 00000000 00000000
    [] (axidma_video_transfer [axidma]) from [] (axidma_ioctl+0x804/0x9a0 [axidma])
    [] (axidma_ioctl [axidma]) from [] (vfs_ioctl+0x18/0x34)
    [] (vfs_ioctl) from [] (do_vfs_ioctl+0x838/0x88c)
    [] (do_vfs_ioctl) from [] (SyS_ioctl+0x38/0x54)
    [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x3c)
    Code: e1a00006 e58d7000 e1a02005 e59d1010 (e79c3105)
    ---[ end trace 94a751a56bb75147 ]---

I guess something is wrong with the memory in my app, hence the segmentation fault. But I also tested your axidma_display_image.c and hit the same problem.

bperez77 commented 6 years ago

Yeah, even if there's an issue with your app's memory, it shouldn't cause a segfault in the kernel. I've pushed something new, so can you pull and try again.

Also, can you post the output as a text file? It looks like the formatting of it got messed up a bit.

tangxu00 commented 6 years ago

Hi, I tried again. I modified your axidma_transfer.c to use the axidma_get_vdma_tx(axidma_dev) function so that it switches to the VDMA channel, and it transferred the file fine; see axidma_transfer.txt and axidma_transfer-dmsg.txt. Then I changed axidma_twoway_transfer to axidma_oneway_transfer, and the same issue came up; see vdma.txt and vdma-dmsg.txt. axidma_display.c also failed. I used axidma_oneway_transfer to transmit and then axidma_oneway_transfer to receive, which should be equivalent to axidma_twoway_transfer, but it failed, while axidma_twoway_transfer worked. Maybe something is wrong in the configuration of one channel.

axidma_transfer.txt axidma_transfer-dmsg.txt

vdma.txt vdma-dmsg.txt

axidma_display_image.txt axidma_display_image-dmsg.txt

bperez77 commented 6 years ago

Huh, that's really odd. I don't think I've seen any error like that before. The segfault from your logs seems to be pointing to the crash happening in the driver's video transfer function. However, even after combing over the function, I can't find any obvious place where it would segfault. It's even more odd that the two way transfer works perfectly fine for you.

Just to confirm, you are on the latest version of the driver?

dlaurentiu commented 6 years ago

Hi Brandon,

I've just tried to use example axidma_display_image with a VDMA core and I get the same crash. Running the latest xilinx-linux on a ZC702 board.

crash.txt

(PS. Thank you for this project; it's a lifesaver.)

(Later: After some testing I've noticed that the error comes from accessing axidma_video_transaction.frame_buffers[0], which should point to the userspace address of the buffer. This is where the kernel faults; it seems that the specified address should be in kernel space. As a quick test I changed void **frame_buffers to void *frame_buffers, given that there is only one frame buffer address, and it runs without the crash.)

bperez77 commented 6 years ago

Oh got it, I think I might know what the issue is. I made a pretty simple mistake. In the IOCTL for AXIDMA_VIDEO_WRITE (and incidentally in the read IOCTL as well), I'm not copying the array of frame buffers from user space to kernel space, which is a big no-no. What is likely happening is that memory happens to be paged out, which leads to the segfault. The reason changing to void *frame_buffers works is because it's handled by the first copy_from_user call in that IOCTL.

So, I need a second call to copy_from_user to copy the array of frame buffers. Let me make that change and then push it.
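The two-step copy described here can be sketched in plain userspace C. Everything below is a simplified stand-in written for illustration: `struct video_trans`, `fake_copy_from_user`, `copy_video_trans`, and `MAX_FRAME_BUFFERS` are hypothetical names, and `memcpy` takes the place of the kernel's copy_from_user so the sketch can run outside the kernel. It is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAME_BUFFERS 32  /* arbitrary cap chosen for this sketch */

/* Simplified, hypothetical stand-in for the driver's video transaction. */
struct video_trans {
    int num_frame_buffers;
    void **frame_buffers;   /* array that lives in the caller's memory */
};

/*
 * Userspace stand-in for copy_from_user() so the sketch runs anywhere.
 * Like the real call, it returns the number of bytes NOT copied.
 */
static unsigned long fake_copy_from_user(void *to, const void *from, unsigned long n)
{
    memcpy(to, from, n);
    return 0;
}

/*
 * Two-step copy: first the struct, then the pointer array it refers to.
 * local_array must have room for MAX_FRAME_BUFFERS entries.
 * Returns 0 on success, -1 on failure.
 */
int copy_video_trans(struct video_trans *out, void **local_array,
                     const struct video_trans *user_arg)
{
    size_t size;

    assert(out != NULL && local_array != NULL && user_arg != NULL);

    /* Step 1: copy the struct. Its pointer still refers to the caller's array. */
    if (fake_copy_from_user(out, user_arg, sizeof(*out)) != 0)
        return -1;

    /* Validate the count and pointer before using them for the second copy. */
    if (out->num_frame_buffers <= 0 || out->num_frame_buffers > MAX_FRAME_BUFFERS)
        return -1;
    if (out->frame_buffers == NULL)
        return -1;

    /* Step 2: copy the array, then point the local struct at the local copy. */
    size = (size_t)out->num_frame_buffers * sizeof(out->frame_buffers[0]);
    if (fake_copy_from_user(local_array, out->frame_buffers, size) != 0)
        return -1;
    out->frame_buffers = local_array;
    return 0;
}
```

After the first copy alone, the struct's pointer member still refers to the caller's memory, which is the situation that faulted. Checking the count against a cap before the second copy keeps a bad value from overrunning the local array.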

tangxu00 commented 6 years ago

Sorry for the late reply. After a long time debugging, I still can't get a complete frame using your VDMA driver, so I started writing my own driver that uses ioremap on the VDMA registers, and it was easy to get working. But that is not the standard way to do things in the Linux kernel, so I still want to try your driver. Have you updated the code? Thank you.

bperez77 commented 6 years ago

Not yet, but I was planning on getting it up this weekend. I'll update this issue once it's up.

elektrokokke commented 6 years ago

I can confirm that this issue is caused by not calling copy_from_user on the frame buffer array.

My dirty fix for this was:

--- a/driver/axidma_chrdev.c
+++ b/driver/axidma_chrdev.c
@@ -345,6 +345,7 @@
     struct axidma_inout_transaction inout_trans;
     struct axidma_video_transaction video_trans;
     struct axidma_chan chan_info;
+    void *framebuffers[32];

     // Coerce the arguement as a userspace pointer
     arg_ptr = (void __user *)arg;
@@ -452,16 +453,13 @@
                            "AXIDMA_DMA_VIDEO_READ.\n");
                 return -EFAULT;
             }
-
-            // Verify that we can access the array of frame buffers
-            size = video_trans.num_frame_buffers *
-                   sizeof(video_trans.frame_buffers[0]);
-            if (!axidma_access_ok(video_trans.frame_buffers, size, true)) {
-                axidma_err("Unable to copy frame buffer addresses from "
-                           "userspace for AXIDMA_DMA_VIDEO_WRITE.\n");
-                return -EFAULT;
-            }
-
+            if (copy_from_user(framebuffers, video_trans.frame_buffers,
+                   video_trans.num_frame_buffers * sizeof(void*)) != 0) {
+               axidma_err("Unable to copy framebuffer pointers from userspace for "
+                          "AXIDMA_DMA_VIDEO_READ.\n");
+               return -EFAULT;
+           }
+            video_trans.frame_buffers = framebuffers;
             rc = axidma_video_transfer(dev, &video_trans, AXIDMA_READ);
             break;

There is a lot of room for improvement on this, however...

Cheers

P.S.: Thx for the nice piece of work. For simple interfacing with VDMA cores, however, I am beginning to think that the Xilinx driver makes it much, much too complicated compared to the bare-metal way...

ImagotechGmbH commented 6 years ago

Hello Brandon, first of all I would like to send you a huge THANK YOU for your great work! I tried to get VDMA working for - believe it or not - months (!) now and I finally succeeded using your code, what a help! I managed to get VDMA working on a Zynq Zybo 7010 board, running kernel xilinx 4.4.30 and ubuntu 16.04. I modified the "image" size of axidma_benchmark a little and got the following data rates, I'm not yet sure if this is a good and plausible value for a PL clock of 150MHz.

    ./axidma_benchmark -v

    AXI DMA Benchmark Parameters:
        Transmit Buffer Size: 1.17 Mb
        Receive Buffer Size: 1.17 Mb
        Number of DMA Transfers: 1000 transfers

    Using transmit channel 0 and receive channel 1.
    Step #1
    Single transfer test successfully completed!
    Beginning performance analysis of the DMA engine.

    DMA Timing Statistics:
        Elapsed Time: 35.04 s
        Transmit Throughput: 33.44 Mb/s
        Receive Throughput: 33.44 Mb/s
        Total Throughput: 66.88 Mb/s

Next step will be to use a custom IP as axi stream source, I will also give the axidma_transfer a try.

Please keep developing great software, your work is highly appreciated! All the best from Munich, home of Oktoberfest ;-) J.

bperez77 commented 6 years ago

@tangxu00 and @elektrokokke the most recent commit should resolve this issue. Let me know if you guys encounter any additional issues.

bperez77 commented 6 years ago

@juergenmuc thanks, I appreciate it! Those numbers seem reasonable, though it's hard to say at a high-level glance. The smaller your transfer size, the less throughput you will see from the driver. This is because the system calls to initiate transfers have a relatively high overhead, so the larger the transfer, the more this cost will be amortized.
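The amortization effect can be seen in a toy model, sketched below under the assumption that each transfer pays one fixed setup cost plus the time to move the data at a peak rate. The function name and all numbers are made up for illustration; they are not measurements of the driver.

```c
#include <assert.h>

/*
 * Toy model: each transfer pays a fixed setup cost (system call,
 * descriptor setup) plus the time to move the data at the peak rate.
 * Returns the effective throughput in bytes per second.
 */
double effective_throughput(double size_bytes, double overhead_s,
                            double peak_bytes_per_s)
{
    assert(size_bytes > 0.0 && overhead_s >= 0.0 && peak_bytes_per_s > 0.0);
    return size_bytes / (overhead_s + size_bytes / peak_bytes_per_s);
}
```

With an assumed 1 ms of overhead and a 400 MB/s peak, a 1 MB transfer reaches about 286 MB/s in this model while an 8 MB transfer reaches about 381 MB/s, so the larger transfer gets closer to the peak.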