Baum55 closed this issue 6 years ago
Interesting. It's hard to say what the issue is without more details. You're likely seeing a VDMAIntErr because the amount of data you're receiving does not match the amount you specified (either more or less).
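As a rough aid for reading the driver message quoted in this thread ("has errors 10"), the sketch below decodes the value into named error bits. It assumes the driver prints the value in hexadecimal and that the bit positions follow the AXI DMA status register layout (DMAIntErr at bit 4, and so on); the function name `decode_dma_errors` is made up for illustration.

```python
# Error bits of the AXI DMA status register (DMASR).
# Assumption: layout as documented for the Xilinx AXI DMA IP.
DMASR_ERROR_BITS = {
    0x010: "DMAIntErr (DMA internal error)",
    0x020: "DMASlvErr (DMA slave error)",
    0x040: "DMADecErr (DMA decode error)",
    0x100: "SGIntErr (scatter-gather internal error)",
    0x200: "SGSlvErr (scatter-gather slave error)",
    0x400: "SGDecErr (scatter-gather decode error)",
}


def decode_dma_errors(errors_hex):
    """Return the names of all error bits set in a hex string such as '10'."""
    value = int(errors_hex, 16)  # assumption: the kernel log prints hex
    return [name for bit, name in DMASR_ERROR_BITS.items() if value & bit]


# "errors 10" from the message in this thread
print(decode_dma_errors("10"))
```

Under these assumptions, `errors 10` maps to the internal error bit alone, which is consistent with a mismatch between the expected and received transfer length.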
It's odd that doing an FTP or SSH transfer would affect whether this issue occurs. The only thing I can think of is that if the network controller and the AXI VDMA IP shared a DMA channel, there might be some data contention. However, I don't think that's the case: they should be using different DMA channels, and DMA transfers use a two-way handshake, so no data should ever be lost.
Can you give me the following to help you debug this:
The output of the dmesg command (please attach this as a file).

axidma_chrdev: axidma_chrdev@0 {
    compatible = "xlnx,axidma-chrdev";
    dmas = <&axi_dma_0 0>;
    dma-names = "rx_channel";
};

axi_dma_0: dma@40400000 {
    clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk", "m_axi_s2mm_aclk";
    clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>;
    compatible = "xlnx,axi-dma-1.00.a";
    interrupt-parent = <&intc>;
    interrupts = <0 29 4>;
    reg = <0x40400000 0x10000>;
    xlnx,addrwidth = <0x20>;
    dma-channel@40400030 {
        compatible = "xlnx,axi-dma-s2mm-channel";
        dma-channels = <0x1>;
        interrupts = <0 29 4>;
        xlnx,datawidth = <0x200>;
        xlnx,device-id = <0x0>;
    };
};
Great, thanks for the info and the waveform. This definitely sounds like a dropped data/packets issue. What I think is happening (keep in mind this is pure speculation) is that there is too much contention on the network side of your transfers. Receiving data from the PL is coupled to sending it over the network. When you're simultaneously doing the FTP or SSH transfer, the transfer over the network must slow down enough that your ring buffer overflows. In turn, this causes the FIFO on the PL to overflow, and this eventually bubbles up to the driver as receiving less data than expected, because of the dropped packets.
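To make the suspected mechanism concrete, here is a coarse, one-second-step simulation of a buffer that fills at a constant rate while its drain rate drops for a few seconds. Only the 11520000 bytes/s input rate comes from this thread; the buffer capacity and the drain rates are invented for illustration, and the model ignores the PL FIFO entirely.

```python
def simulate_ring_buffer(capacity, in_rate, drain_rates):
    """Coarse model: one step per second.

    Returns (peak_fill, dropped_bytes). Data that does not fit into
    the buffer at the end of a step counts as dropped.
    """
    fill = peak = dropped = 0
    for drain in drain_rates:
        fill += in_rate            # producer (DMA reader) adds data
        fill -= min(fill, drain)   # consumer (network sender) removes data
        if fill > capacity:        # overflow: excess data is lost
            dropped += fill - capacity
            fill = capacity
        peak = max(peak, fill)
    return peak, dropped


IN_RATE = 11_520_000          # bytes/s, from the thread
CAPACITY = 16 * 1024 * 1024   # hypothetical ring buffer size in bytes

# Hypothetical drain rates: the network keeps up for 2 s, slows to
# 8 MB/s for 5 s while FTP/SSH competes for it, then recovers.
drain = [12_000_000] * 2 + [8_000_000] * 5 + [12_000_000] * 3

peak, dropped = simulate_ring_buffer(CAPACITY, IN_RATE, drain)
print(peak, dropped)  # 16777216 822784
```

With these made-up numbers the buffer absorbs the first four seconds of the slowdown and starts dropping data in the fifth.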
Since I don't know the specifics of your application, it's hard to tell whether this is the exact cause, but it seems pretty likely to me. The solution depends on the nature of the FTP/SSH transfers. If they are bursty, then increasing the size of the ring buffer should alleviate the backpressure on the PL FIFO.
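For the bursty case, a back-of-the-envelope estimate of the required ring buffer size might look like the sketch below. The reduced drain rate, the burst length, and the function names are assumptions for illustration; only the 11520000 bytes/s input rate is taken from this thread.

```python
import math


def min_ring_buffer_bytes(in_rate, drain_rate, burst_seconds):
    """Bytes that pile up while the drain rate is below the input rate."""
    deficit = max(0, in_rate - drain_rate)
    return math.ceil(deficit * burst_seconds)


def recovery_seconds(backlog_bytes, in_rate, normal_drain_rate):
    """Time needed to empty the backlog once the burst is over."""
    surplus = normal_drain_rate - in_rate
    if surplus <= 0:
        return math.inf  # the buffer never drains: sizing alone cannot help
    return backlog_bytes / surplus


IN_RATE = 11_520_000  # bytes/s, from the thread

# Hypothetical burst: network drain falls to 8 MB/s for 5 s.
backlog = min_ring_buffer_bytes(IN_RATE, 8_000_000, 5)
print(backlog)  # 17600000

# Hypothetical normal drain rate of 12 MB/s after the burst.
print(recovery_seconds(backlog, IN_RATE, 12_000_000))
```

The second function matters because a larger buffer only helps if bursts are spaced far enough apart for the backlog to drain in between.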
Otherwise, if the FTP/SSH transfers are consistent and run at a steady rate, you'll need to either find a way to increase the transfer rate of the Ethernet device or reduce the FPS at which the line camera operates.
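For the steady case, the trade-off between network throughput and camera rate is simple arithmetic. The sketch below assumes a hypothetical line size of 2048 bytes and a hypothetical sustained drain rate; neither value appears in this thread.

```python
def max_line_rate(sustained_drain_bytes_per_s, bytes_per_line):
    """Highest whole number of lines per second the drain rate can carry."""
    return sustained_drain_bytes_per_s // bytes_per_line


def required_drain_rate(lines_per_s, bytes_per_line):
    """Sustained bytes/s the network must carry for a given line rate."""
    return lines_per_s * bytes_per_line


BYTES_PER_LINE = 2048  # hypothetical line size in bytes

# Line rate implied by the 11520000 bytes/s from the thread,
# if a line really were 2048 bytes.
print(max_line_rate(11_520_000, BYTES_PER_LINE))  # 5625

# If the network only sustains a hypothetical 8 MB/s during FTP/SSH:
print(max_line_rate(8_000_000, BYTES_PER_LINE))   # 3906
```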
I don't think this issue is related to the CPU utilization. The relevant parts of the design (at least as described above) are all handled by DMA transfers, so the CPU is only acting as a controller in that context.
Thank you very much for your help. That was very helpful.
I have a consistent and steady FTP/SSH transfer rate, so I went with the solution you suggested for that case.
Great, glad to hear that helped.
I receive the message "xilinx-vdma 40400000.dma: Channel ef3cb010 has errors 10, cdr 0 tdr 0". I use the Zynq Zybo Z7 board with the 2017.4 Linux from Xilinx. I use only one receive channel to read data from the FPGA in a high-priority thread. This thread only reads the data and puts the read pointer into a mutex-protected ring buffer (the mutex does not block; the problem occurs even in an unsafe, mutex-free version). I read at a data rate of 11520000 bytes/s. Meanwhile, I start an FTP or SSH data transfer. I tried changing the nice level of my application, but even at the highest priority, -20, the problem persists. The problem only occurs when both system utilization and network utilization are high. Normally there is no FTP/SSH data transfer in a production system, but I would like to understand why I get this message during network activity.