Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/
insmod xdma.ko crash on the first ioread32() after pci_iomap() #232

Open mazhenke opened 1 year ago

mazhenke commented 1 year ago

I hit a blocking issue when running the XDMA driver on my ARM platform, an N1SDP.

On the U280, I generated an FPGA project with XDMA (six images in the original post).

As for the XDMA driver, I built it into a .ko without specifying config_bar_num/xvc_bar_num/xvc_bar_offset, so the driver calls map_single_bar() --> is_config_bar(). I got a kernel oops at the first read_register() in is_config_bar(): irq_id = read_register(&irq_regs->identifier);

The following is my log:

    root@n1sdp:~/work# echo 8 > /proc/sys/kernel/printk
    root@n1sdp:~/work# cat /proc/sys/kernel/printk
    8 4 1 7
    root@n1sdp:~/work# insmod xdma_log_io.ko
    [ 36.409496] xdma: loading out-of-tree module taints kernel.
    [ 36.415324] xdma: module verification failed: signature and/or required key missing - tainting kernel
    [ 36.425053] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.2.2
    [ 36.432004] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
    [ 36.441118] xdma 0001:01:00.0: Adding to iommu group 1
    [ 36.446293] xdma:xdma_device_open: xdma device 0001:01:00.0, 0x0000000032c59170.
    [ 36.453689] xdma 0001:01:00.0: enabling device (0000 -> 0002)
    [ 36.459473] xdma:map_single_bar: BAR0 at 0x69200000 mapped at 0xffff80000ba00000, length=65536(/65536)
    [ 36.468769] xdma:read_register: read reg: 0xffff80000ba02000
    [ 36.474420] SError Interrupt on CPU3, code 0x00000000be000411 -- SError
    [ 36.474423] CPU: 3 PID: 453 Comm: insmod Tainted: G OE 6.3.3+ #1
    [ 36.474425] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 36.474428] pc : xdma_device_open+0xe98/0x1498 [xdma]
    [ 36.474439] lr : xdma_device_open+0xe90/0x1498 [xdma]
    [ 36.474446] sp : ffff80000b9e3590
    [ 36.474447] x29: ffff80000b9e3590 x28: 0000000000000000 x27: ffff0080009733d0
    [ 36.474450] x26: ffff80000165bb80 x25: ffff80000ba00000 x24: 0000000000000000
    [ 36.474452] x23: 0000000000000000 x22: ffff008000973000 x21: ffff80000165a350
    [ 36.474454] x20: ffff008010acb048 x19: ffff80000ba02000 x18: 0000000000000000
    [ 36.474456] x17: 3061623030303038 x16: 6666666678302074 x15: 612064657070616d
    [ 36.474458] x14: 0000000000000000 x13: 3030303230616230 x12: 3030303866666666
    [ 36.474460] x11: 7830203a67657220 x10: 64616572203a7265 x9 : ffff8000083efa8c
    [ 36.474462] x8 : 64616572203a7265 x7 : 205d393637383634 x6 : ffff80000a82d550
    [ 36.474464] x5 : 0000000000000000 x4 : ffff00837dfbad08 x3 : ffff800001652470
    [ 36.474466] x2 : ffff80000164a9c8 x1 : ffff80000ba02000 x0 : 0000000000000020
    [ 36.474468] Kernel panic - not syncing: Asynchronous SError Interrupt
    [ 36.474470] CPU: 3 PID: 453 Comm: insmod Tainted: G OE 6.3.3+ #1
    [ 36.474472] Call trace:
    [ 36.474473] dump_backtrace+0xac/0x138
    [ 36.474477] show_stack+0x20/0x38
    [ 36.474478] dump_stack_lvl+0x78/0xc8
    [ 36.474482] dump_stack+0x18/0x28
    [ 36.474484] panic+0x3d0/0x428
    [ 36.474488] nmi_panic+0xb4/0xc0
    [ 36.474490] arm64_serror_panic+0x78/0x90
    [ 36.474491] do_serror+0x60/0x68
    [ 36.474493] el1h_64_error_handler+0x3c/0x70
    [ 36.474497] el1h_64_error+0x7c/0x80
    [ 36.474499] xdma_device_open+0xe98/0x1498 [xdma]
    [ 36.474506] probe_one+0x98/0x2b0 [xdma]
    [ 36.474514] local_pci_probe+0x48/0xd0
    [ 36.474518] pci_device_probe+0xb4/0x240
    [ 36.474520] really_probe+0x198/0x400
    [ 36.474525] driver_probe_device+0x90/0x1b0
    [ 36.474527] driver_probe_device+0x44/0x168
    [ 36.474530] driver_attach+0x104/0x250
    [ 36.474532] bus_for_each_dev+0x7c/0xe8
    [ 36.474536] driver_attach+0x2c/0x40
    [ 36.474538] bus_add_driver+0x118/0x250
    [ 36.474540] driver_register+0x68/0x138
    [ 36.474542] pci_register_driver+0x4c/0x60
    [ 36.474545] xdma_mod_init+0x9c/0xb8 [xdma]
    [ 36.474552] do_one_initcall+0x4c/0x2e0
    [ 36.474554] do_init_module+0x50/0x210
    [ 36.474558] load_module+0x21f4/0x24a8
    [ 36.474560] __do_sys_finit_module+0xc4/0x148
    [ 36.474562] arm64_sys_finit_module+0x28/0x40
    [ 36.474564] invoke_syscall+0x78/0x108
    [ 36.474566] el0_svc_common.constprop.0+0x58/0x188
    [ 36.474567] do_el0_svc+0x40/0xb8
    [ 36.474568] el0_svc+0x34/0x138
    [ 36.474571] el0t_64_sync_handler+0xb8/0xc0
    [ 36.474573] el0t_64_sync+0x1a8/0x1b0
    [ 36.474575] SMP: stopping secondary CPUs
    [ 36.474579] Kernel Offset: 0x180000 from 0xffff800008000000
    [ 36.474580] PHYS_OFFSET: 0x80000000
    [ 36.474581] CPU features: 0x000000,20200506,3201720b
    [ 36.474582] Memory Limit: none
    [ 37.516082] pstore: backend (efi_pstore) writing error (-5)
    [ 37.797990] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
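
The log above pins down the faulting access: BAR0 is mapped at 0xffff80000ba00000 and the last read_register() call targets 0xffff80000ba02000, that is, offset 0x2000 into the BAR. The user-space C sketch below mimics what an identifier check of this kind might look like, run against a simulated BAR buffer. Only the 0x2000 offset comes from the log; the second offset, the ID constants, the mask, and all function names are assumptions for illustration and are not shown anywhere in this thread. On real hardware the read itself can fault before any comparison runs, which a simulation like this cannot reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* From the log: 0xffff80000ba02000 - 0xffff80000ba00000. */
#define OFS_INT_CTRL    0x2000u
/* ASSUMED values, not shown in this thread. */
#define OFS_CONFIG      0x3000u
#define IRQ_BLOCK_ID    0x1fc20000u
#define CONFIG_BLOCK_ID 0x1fc30000u
#define ID_MASK         0xffff0000u  /* compare the ID, ignore the low bits */

/* Stand-in for ioread32(): reads a 32-bit word from a simulated BAR. */
static uint32_t fake_read32(const uint8_t *bar, size_t ofs)
{
    uint32_t v;
    memcpy(&v, bar + ofs, sizeof(v));
    return v;
}

/* Returns 1 if the simulated BAR carries both identifier words, else 0. */
static int looks_like_config_bar(const uint8_t *bar, size_t len)
{
    assert(bar != NULL);

    /* Refuse to read past the end of a BAR that is too small. */
    if (len < OFS_CONFIG + sizeof(uint32_t))
        return 0;

    uint32_t irq_id = fake_read32(bar, OFS_INT_CTRL);
    uint32_t cfg_id = fake_read32(bar, OFS_CONFIG);

    return (irq_id & ID_MASK) == IRQ_BLOCK_ID &&
           (cfg_id & ID_MASK) == CONFIG_BLOCK_ID;
}
```

The length check matters in the simulation, but it does not protect a real driver from a BAR whose mapping is valid while the device behind it does not answer the read.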

mazhenke commented 1 year ago

I added some logging to map_single_bar() and read_register():

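    static int map_single_bar(struct xdma_dev *xdev, struct pci_dev *dev, int idx)
    {
        ... ...
        /* Log the physical BAR address, the kernel virtual mapping, and the lengths. */
        pr_info("BAR%d at 0x%llx mapped at 0x%llx, length=%llu(/%llu)\n",
                idx, (u64)bar_start, (u64)(xdev->bar[idx]),
                (u64)map_len, (u64)bar_len);
        ... ...
    }

    inline u32 read_register(void *iomem)
    {
        /* Log the address before touching it, so the last line printed is the faulting read. */
        pr_info("read reg: 0x%llx\n", (u64)iomem);

        return ioread32(iomem);
    }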

rcls commented 1 month ago

Did you ever resolve this?

I had a similar kernel panic. For me the solution was to set config_bar_num in the Makefile. Without it, the driver blindly probes the BARs to guess which one to use, and that can hit an invalid address and panic.
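
A minimal user-space sketch of the difference described here, assuming a driver that either trusts a build-time BAR index or falls back to probing every BAR in turn. The names pick_config_bar, probe_fn, and probe_matches_ctx are hypothetical, and the real driver's logic may differ; the sketch only shows that a configured index means no probing reads happen at all.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BARS 6  /* a PCI function has at most six BARs */

/* Hypothetical callback: returns 1 if BAR idx passes the identifier check. */
typedef int (*probe_fn)(int idx, void *ctx);

/* Example probe for demonstration: only the BAR index stored in ctx matches. */
static int probe_matches_ctx(int idx, void *ctx)
{
    return idx == *(const int *)ctx;
}

/*
 * Returns the config BAR index, or -1 if none was found.
 * configured >= 0 models a build-time setting; configured < 0 models "not set".
 * *probes reports how many BARs were actually read.
 */
static int pick_config_bar(int configured, probe_fn probe, void *ctx, int *probes)
{
    *probes = 0;

    if (configured >= 0) {
        assert(configured < MAX_BARS);
        return configured;  /* trusted: no BAR is touched */
    }

    /* Not configured: every BAR is read until one matches. */
    for (int i = 0; i < MAX_BARS; i++) {
        (*probes)++;
        if (probe(i, ctx))
            return i;
    }
    return -1;
}
```

With a configured index the probe count stays at zero, so a BAR that cannot safely be read is never touched. The trade-off is that a wrong index is not detected by this path.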