Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

How to cancel aio read when using XDMA with poll_mode #181

Open xueweiwujxw opened 1 year ago

xueweiwujxw commented 1 year ago

Expected Behaviour

A pending aio_read request can be cancelled with aio_cancel when no data is arriving from the FPGA.

Actual Behaviour

Stopping the transfer takes a long time. I think the read simply ran into its timeout and aio_cancel had no effect.

Steps to Reproduce

Build the driver from branch 2019.2. I also applied commit cf3611e06e6c4cd29214a65d6416e2e27fd988fa.

System Information

Ubuntu 18.04, kernel 4.18.0-15-generic

Code

unsigned long long proc_ptr = 0;
unsigned long long sch_ptr = 0;
for (int i = 0; i < AIO_BUFFERSIZE; i++) {
    int ret = posix_memalign(&buffer[i].data, getpagesize(), ALLOC_MEM_SIZE);
    if (ret != 0) {
        logf_err("%d %s\n", i, strerror(ret));
        return;
    }
    buffer[i].has_processed = false;
    buffer[i].acb.aio_buf = buffer[i].data;
    buffer[i].acb.aio_fildes = rx_fd;
    buffer[i].acb.aio_offset = 0;
    buffer[i].acb.aio_nbytes = ALLOC_MEM_SIZE;
}
// ...
while (param->running) {
    while (sch_ptr - proc_ptr < AIO_BUFFERSIZE) {
        aio_read(&buffer[sch_ptr % AIO_BUFFERSIZE].acb);
        sch_ptr++;
    }

    while (proc_ptr < sch_ptr) {
        int rv = aio_error(&buffer[proc_ptr % AIO_BUFFERSIZE].acb);
        if (rv == EINPROGRESS)
            continue;
        if (rv == 0) {
            int len = aio_return(&buffer[proc_ptr % AIO_BUFFERSIZE].acb);
            if (len > 0) {
                if (len != last_length) {
                    logf_warn("xdma channel %d read data %d != last read len %d\n", order, len, last_length);
                    last_length = len;
                }
                rcvd_bytes = rcvd_bytes + len;
            }
        }
        proc_ptr++;
    }
}
//...
aio_cancel(this->rx_fd, nullptr);
int countNum = 0;
while (proc_ptr < sch_ptr) {
    int rv = aio_error(&buffer[proc_ptr % AIO_BUFFERSIZE].acb);
    if (rv == EINPROGRESS) {
        countNum++;
        continue;
    }
    if (rv == 0) {
        int len = aio_return(&buffer[proc_ptr % AIO_BUFFERSIZE].acb);
        if (len > 0)
            rcvd_bytes = rcvd_bytes + len;
        if (aio_cancel(rx_fd, &buffer[proc_ptr % AIO_BUFFERSIZE].acb) == AIO_NOTCANCELED)
            aio_error(&buffer[proc_ptr % AIO_BUFFERSIZE].acb);
    }
    proc_ptr++;
}
logf_debug("xdma channel %d shutdown after wait %d\n", order, countNum);
int resCancel = aio_cancel(rx_fd, nullptr);
if (resCancel == AIO_CANCELED) {
    logf_info("AIO_CANCELED\n");
} else if (resCancel == AIO_NOTCANCELED) {
    logf_warn("AIO_NOTCANCELED\n");
} else if (resCancel == -1) {
    logf_err("AIO_CANCEL fail: %s\n", strerror(errno));
}
// AIO_ALLDONE: every request had already completed, nothing to log.
for (int i = 0; i < AIO_BUFFERSIZE; i++)
    free(buffer[i].data);

After I call aio_cancel(this->rx_fd, nullptr), aio_error always returns EINPROGRESS, so I have to wait for the pending reads to finish. In interrupt mode, however, cancellation works as expected.
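For reference, here is a minimal sketch of how aio_cancel treats a read that is already blocked. It assumes glibc's POSIX AIO, which services requests from helper threads, and it uses an empty pipe as a stand-in for a C2H channel with no data. `CancelProbe` and `probe_cancel_on_blocked_read` are hypothetical names for this illustration only; nothing here touches the XDMA driver.

```cpp
#include <aio.h>
#include <unistd.h>
#include <cerrno>
#include <ctime>

// Hypothetical result record for this sketch.
struct CancelProbe {
    int cancel_result;       // value returned by aio_cancel()
    int error_after_cancel;  // aio_error() right after the cancel attempt
    ssize_t bytes_read;      // aio_return() once the request has finished
};

// Starts an aio_read on an empty pipe, tries to cancel it while the read is
// blocked, then lets it finish so the control block can be reaped safely.
CancelProbe probe_cancel_on_blocked_read() {
    CancelProbe out{-1, -1, -1};
    int fds[2];
    if (pipe(fds) != 0)
        return out;

    char buf[16];
    aiocb cb{};
    cb.aio_fildes = fds[0];
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;
    if (aio_read(&cb) != 0) {
        close(fds[0]);
        close(fds[1]);
        return out;
    }

    // Assumption: 100 ms is enough for the helper thread to block in read().
    timespec pause{0, 100 * 1000 * 1000};
    nanosleep(&pause, nullptr);

    out.cancel_result = aio_cancel(fds[0], &cb);
    out.error_after_cancel = aio_error(&cb);

    // Unblock the read: write one byte, then close the write end so the
    // reader sees data or end-of-file and the wait below always terminates.
    const char byte = 'x';
    ssize_t written = write(fds[1], &byte, 1);
    (void)written;
    close(fds[1]);

    const aiocb* const list[1] = {&cb};
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, nullptr);  // sleep until completion, no spinning
    out.bytes_read = aio_return(&cb);

    close(fds[0]);
    return out;
}
```

On glibc this would be expected to report AIO_NOTCANCELED with aio_error still at EINPROGRESS, which matches the symptom described above: a request that has already entered the blocking read can only end when the read itself returns. Other C libraries may behave differently.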

xueweiwujxw commented 1 year ago

I got a kernel error when using the driver from branch 2019.2.

Error Message

#### kernel log

```log
[ 3341.456162] xdma:engine_service_final_transfer: 0-C2H0-ST xfer empty, pdesc completed 4294959104.
[ 3341.456163] xdma:engine_service_resume: no queued transfers but 0-C2H0-ST engine running!
[ 3341.456178] WARNING: CPU: 8 PID: 18237 at /home/uhdemod/Downloads/dma_ip_drivers/XDMA/linux-kernel/xdma/libxdma.c:1029 engine_service+0x4cb/0x520 [xdma]
[ 3341.456179] Modules linked in: xdma(OE) ipmi_ssif nls_iso8859_1 intel_rapl skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf joydev input_leds hid_multitouch lpc_ich mei_me mei ioatdma ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb nvme drm ahci nvme_core dca i2c_algo_bit libahci wmi [last unloaded: xdma]
[ 3341.456208] CPU: 8 PID: 18237 Comm: demoder Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu
[ 3341.456208] Hardware name: Supermicro Super Server/X11SPL-F, BIOS 3.5 05/19/2021
[ 3341.456210] RIP: 0010:engine_service+0x4cb/0x520 [xdma]
[ 3341.456211] Code: 6a c0 48 c7 c7 50 c5 6a c0 e8 44 56 25 c7 e9 3e fe ff ff 48 8d 53 10 48 c7 c6 d0 a4 6a c0 48 c7 c7 38 c7 6a c0 e8 28 56 25 c7 <0f> 0b e9 1d fe ff ff 48 8b 43 28 48 8d 78 40 e8 61 11 63 c7 a8 01
[ 3341.456228] RSP: 0018:ffffb9a7480b3cc8 EFLAGS: 00010082
[ 3341.456229] RAX: 000000000000004d RBX: ffff9ae18fb88a18 RCX: 0000000000000006
[ 3341.456230] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff9ae1bf8164b0
[ 3341.456231] RBP: ffffb9a7480b3cf0 R08: 00000000000006aa R09: 0000000000000004
[ 3341.456231] R10: ffffb9a7480b3bd8 R11: 0000000000000001 R12: 0000000000000001
[ 3341.456232] R13: ffff9ae18fb88a90 R14: 00000000ffffe000 R15: dead000000000100
[ 3341.456233] FS: 00007fc7fec3cb80(0000) GS:ffff9ae1bf800000(0000) knlGS:0000000000000000
[ 3341.456233] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3341.456234] CR2: 00007fc7f0000010 CR3: 00000010119e8004 CR4: 00000000007606e0
[ 3341.456235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3341.456236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3341.456236] PKRU: 55555554
[ 3341.456236] Call Trace:
[ 3341.456240] engine_service_poll+0xf1/0x160 [xdma]
[ 3341.456242] xdma_xfer_submit+0x3c8/0x890 [xdma]
[ 3341.456244] ? char_sgdma_map_user_buf_to_sgl+0x12a/0x260 [xdma]
[ 3341.456246] char_sgdma_read_write+0x113/0x1c0 [xdma]
[ 3341.456248] char_sgdma_read+0x11/0x20 [xdma]
[ 3341.456252] __vfs_read+0x1b/0x40
[ 3341.456253] vfs_read+0x8e/0x130
[ 3341.456254] ksys_pread64+0x76/0x90
[ 3341.456255] __x64_sys_pread64+0x1e/0x20
[ 3341.456259] do_syscall_64+0x5a/0x120
[ 3341.456262] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3341.456263] RIP: 0033:0x7fc7fda64747
[ 3341.456263] Code: 89 cd 55 53 49 89 d4 48 89 f5 89 fb 48 83 ec 18 e8 3e 1e 02 00 4d 89 ea 41 89 c0 4c 89 e2 48 89 ee 89 df b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 74 1e 02 00 48
[ 3341.456281] RSP: 002b:00007fc7fec3be30 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 3341.456282] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fc7fda64747
[ 3341.456282] RDX: 0000000000400000 RSI: 00007fc7f8a36000 RDI: 0000000000000008
[ 3341.456283] RBP: 00007fc7f8a36000 R08: 0000000000000000 R09: 00007fc7fec3cb80
[ 3341.456283] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000400000
[ 3341.456284] R13: 0000000000000000 R14: ffffffffffffff50 R15: 00007fc7ec01a110
[ 3341.456285] ---[ end trace abe179ad78e5ed1b ]---
```
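A small arithmetic note on the first log line: the value 4294959104 printed after "pdesc completed" is the same bit pattern as the low 32 bits of R14 (ffffe000), and read as a signed 32-bit number it is -8192. The sketch below only checks that arithmetic. Whether the driver actually computed a negative count here is an assumption, and `as_signed32` is a hypothetical helper, not a driver function.

```cpp
#include <cstdint>

// Hypothetical helper: reinterpret a 32-bit unsigned value as two's-complement
// signed, widened to 64 bits so the conversion is fully defined.
constexpr std::int64_t as_signed32(std::uint32_t raw) {
    return raw < 0x80000000u
               ? static_cast<std::int64_t>(raw)
               : static_cast<std::int64_t>(raw) - (static_cast<std::int64_t>(1) << 32);
}

// The logged value equals the low half of R14 in the register dump ...
static_assert(4294959104u == 0xffffe000u, "same bit pattern as R14");
// ... and corresponds to -8192 when treated as signed.
static_assert(as_signed32(4294959104u) == -8192, "wraps to -8192");
```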

I upgraded the driver to branch 2020.2, which fixes the kernel error. However, the driver still has to wait c2h_timeout seconds before it stops. Also, after the program has been running for a while, the byte count returned for an aio_read no longer matches the length I intended to read. The lane width is x8 and the maximum link speed is 5 GT/s. The maximum speed of the AXI stream is 4.8 Gbps.
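To put those link figures in proportion, here is a rough calculation, assuming the 5 GT/s link uses the 8b/10b line coding of PCIe Gen2 and ignoring TLP/DLLP protocol overhead. `pcie_gen2_coded_gbps` and `stream_share` are hypothetical helper names for this sketch.

```cpp
// Data rate in Gbit/s that a 5 GT/s link carries after 8b/10b line coding,
// before any packet-level protocol overhead is subtracted.
constexpr double pcie_gen2_coded_gbps(int lanes) {
    return 5.0 * lanes * 8.0 / 10.0;
}

// Fraction of that rate taken up by a stream of the given speed.
constexpr double stream_share(double stream_gbps, int lanes) {
    return stream_gbps / pcie_gen2_coded_gbps(lanes);
}
```

Under these assumptions an x8 link carries 32 Gbit/s, so a 4.8 Gbit/s stream uses roughly 15% of it. That suggests, but does not prove, that link bandwidth is not what causes the mismatched read lengths.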