Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

The PCIe test exceeded the theoretical bandwidth !? #246

Closed pyt-hnu closed 7 months ago

pyt-hnu commented 7 months ago

When I ran the test program in /tools, the measured results exceeded the theoretical bandwidth. My test script is dma_test.sh, which calls dma_to_device. Environment: Ubuntu 20.04, kernel 5.4.0-42-generic, PCIe 3.0 x16 link.

Here is the shell script, dma_test.sh:
#!/bin/bash
transferSize=$1
transferCount=5
#channelPairs=1

tool_path=../tools

#Run the PCIe DMA streaming test
echo "Info: Running PCIe DMA streaming test"
echo "      transfer size:  $transferSize"
echo "      transfer count: $transferCount"

 $tool_path/dma_to_device \
 -v \
 -d /dev/xdma0_h2c_0 \
 -a 0xc0000000 \
 -f /home/ta/Workspace/pyc/dma_ip_drivers-master/XDMA/linux-kernel/tests/data/datafile_32M.bin \
 -s $transferSize \
 -c $transferCount \
 -k 1

Below is my test result.


[sudo] password for ta: 
Info: Running PCIe DMA streaming test
      transfer size:  16777216
      transfer count: 5
dev /dev/xdma0_h2c_0, addr 0xc0000000, aperture 0x1, size 0x1000000, offset 0x0, count 5
host buffer 0x1001000 = 0x7f530ca6c000
#0: CLOCK_MONOTONIC 103.691874346 sec. write 16777216 bytes
#1: CLOCK_MONOTONIC 102.588063483 sec. write 16777216 bytes
#2: CLOCK_MONOTONIC 98.158665754 sec. write 16777216 bytes
#3: CLOCK_MONOTONIC 99.518475304 sec. write 16777216 bytes
#4: CLOCK_MONOTONIC 100.122645961 sec. write 16777216 bytes
** Avg time device /dev/xdma0_h2c_0, total time 2079724848 nsec, avg_time = 415944960.000000, size = 16777216, BW = 40.335182 
/dev/xdma0_h2c_0 ** Average BW = 16777216, 40.335182
hmaarrfk commented 7 months ago

So what happened?

pyt-hnu commented 7 months ago

Hi @hmaarrfk, one problem I found is in dma_to_device.c: the total_time calculation looks wrong. When a transfer takes one second or longer, the code adds only the nanoseconds field (tv_nsec) and ignores the seconds field (tv_sec). The two have to be combined, because struct timespec stores seconds and nanoseconds separately.


        rc = clock_gettime(CLOCK_MONOTONIC, &ts_end);

        if (bytes_done < size) {
            printf("#%d: underflow %ld/%ld.\n",
                i, bytes_done, size);
            underflow = 1;
        }
        /* subtract the start time from the end time */
        timespec_sub(&ts_end, &ts_start);
        total_time += ts_end.tv_nsec;
        total_time += ts_end.tv_sec*1000000000; //Here is my extra code
        /* a bit less accurate but side-effects are accounted for */
hmaarrfk commented 7 months ago

Great, thanks for the explanation.

pyt-hnu commented 7 months ago

So am I right?

hmaarrfk commented 7 months ago

Maybe. It seems correct, but I'm not sure. I haven't run this test myself in a while.