Xilinx / RecoNIC

RecoNIC is a software/hardware shell used to enable network-attached processing within an RDMA-featured SmartNIC for scale-out computing.
MIT License

Try porting the project to VCU128 #10

Closed niexun725 closed 10 months ago

niexun725 commented 11 months ago

Dear Zhguanw, I am trying to port the project to two Virtex UltraScale+ HBM VCU128 FPGA boards. The system runs Ubuntu 20.04 with Linux kernel 5.4.0-125-generic, and the Vivado version is 2021.2. Everything looks fine until I run the RDMA read test: the host and client successfully establish a connection, but the client reports "Error: received data mismatched".

I used an ILA to monitor the AXI4 interface of the device DDR4. The host successfully wrote the data to be read by the client into its DDR4, and the client eventually read the corresponding amount of data from its own DDR4. However, the client did not actually read the host's DDR4, because I never saw a read operation on the host DDR4. By the way, the client shows "Successfully sent an RDMA read operation".

My guess is that the DDR4 capacity of the VCU128 is too small (only 4GB), so that, even though its chip architecture is almost the same as the AU280's, some operations overflow the memory boundary. Does RecoNIC have any limitations on DDR capacity? I am still learning this project and hope to get your guidance. Thanks, Xun Nie

zhguanw-amd commented 11 months ago

Hi Xun Nie,

Thanks for your interest and your efforts in porting RecoNIC to VCU128!

From your description, it seems that the client can successfully send an RDMA read request packet to the server, and the server can successfully generate an RDMA read response packet for the client. The issue seems to occur when the client copies data from device memory to host memory for verification via the QDMA AXI-MM channel.

Could you reprogram the FPGA, run your read test again, and dump the RDMA registers? It would be helpful if you could post the dumped RDMA registers here after running the read operation. Before you run the read test, could you also capture and share your dmesg output related to onic-driver?

My guess is that the DDR4 capacity of the VCU128 is too small (only 4GB), so that, even though its chip architecture is almost the same as the AU280's, some operations overflow the memory boundary. Does RecoNIC have any limitations on DDR capacity?

I don't think this is related to DDR capacity. RecoNIC defines the device memory size at https://github.com/Xilinx/RecoNIC/blob/main/lib/reconic.h#L47, but this is configurable as long as more DDR can be instantiated in the hardware. You could use the ILA on the client side to check whether the addresses of the AXI read transactions from the host to device memory, issued after the RDMA read response packet is received, are correct.

Thanks, Guanwen
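
For readers following the address check suggested above, here is a minimal Python sketch of a range test that could be applied to addresses captured with the ILA. The base address and the 4 GiB size are assumptions chosen for illustration (the base is the first device buffer address printed in the logs later in this thread, the size is the VCU128 DDR4 capacity mentioned by the reporter); the real values come from lib/reconic.h and the hardware address map. `in_device_window` is a hypothetical helper, not part of RecoNIC.

```python
# Hypothetical helper for checking AXI addresses captured with an ILA.
# Both constants are assumptions for illustration, not values read from RecoNIC.
DEV_MEM_BASE = 0xA350000000000000  # assumed base of the device memory window
DEV_MEM_SIZE = 4 * 1024**3         # assumed size: 4 GiB


def in_device_window(addr, length, base=DEV_MEM_BASE, size=DEV_MEM_SIZE):
    """Return True if [addr, addr + length) lies entirely in [base, base + size)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return base <= addr and addr + length <= base + size


if __name__ == "__main__":
    # 128-byte transfer at the buffer address reported by the client in this thread.
    print(in_device_window(0xA3500000000B6000, 128))
    # A burst that runs past the end of the assumed window.
    print(in_device_window(DEV_MEM_BASE + DEV_MEM_SIZE - 64, 128))
```

An address that fails this test would point to a wrong base or window size in the port rather than to the RDMA engine itself.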

niexun725 commented 11 months ago

Could you reprogram the FPGA, run your read test again, and dump the RDMA registers? It would be helpful if you could post the dumped RDMA registers here after running the read operation. Before you run the read test, could you also capture and share your dmesg output related to onic-driver?

@zhguanw-amd Thanks for your reply! I have reprogrammed the FPGA and got the following results. As "Info: [RN_RDMA_GCSR_ERRBUFWPTR = 0x6006c] = 0x2" shows, the host received some error packets.
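
Since the register dumps are long, a small script can pick out the non-zero error indicators. The following is a rough Python sketch that parses lines in the dump format quoted above; the function names and the `WATCH` list are hypothetical choices for illustration, and which registers actually indicate an error should be confirmed against the RDMA IP documentation.

```python
import re

# Matches dump lines such as:
#   Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x2
DUMP_LINE = re.compile(r"\[(\w+)\s*=\s*(0x[0-9a-fA-F]+)\]\s*=\s*(0x[0-9a-fA-F]+)")

# Hypothetical watch list; adjust after consulting the register documentation.
WATCH = (
    "RN_RDMA_GCSR_ERRBUFWPTR",
    "RN_RDMA_GCSR_IPKTERRQWPTR",
    "RN_RDMA_GCSR_INNAKPKTCNT",
    "RN_RDMA_GCSR_OUTNAKPKTCNT",
)


def parse_register_dump(text):
    """Return {register_name: (offset, value)} for every dump line in text."""
    regs = {}
    for match in DUMP_LINE.finditer(text):
        name, offset, value = match.groups()
        regs[name] = (int(offset, 16), int(value, 16))
    return regs


def nonzero_registers(regs, names=WATCH):
    """Return {name: value} for the watched registers whose value is non-zero."""
    return {n: regs[n][1] for n in names if n in regs and regs[n][1] != 0}


if __name__ == "__main__":
    sample = (
        "Info: [RN_RDMA_GCSR_ERRBUFWPTR = 0x6006c] = 0x2\n"
        "Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0\n"
    )
    for name, value in nonzero_registers(parse_register_dump(sample)).items():
        print(f"{name} = {value:#x}")
```

Running it on a saved log such as client_debug.log or server_debug.log only requires reading the file and passing its contents to `parse_register_dump`.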

Besides, using the ILA on the host's box_250mhz module, I can see that the "a_axis_adap_rx_250mhz" interface does receive some input packets. However, the "m_axis_user2rdma_roce_from_cmacrx" interface never asserts. This might mean that no packets are identified as RDMA traffic on the host. I think packet classification is not working, even though I do have a full license for vitis_network_p4.

Here are the dmesg output and the RDMA read test output from the console on both the client and the host (it is a lot of content, sorry for taking up your time).
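
To confirm whether the packets seen on the receive interface should have been classified as RDMA traffic, the captured bytes can be checked offline. The sketch below assumes untagged Ethernet/IPv4 frames and the standard RoCEv2 encapsulation, where the UDP destination port is 4791; the P4 classifier in RecoNIC may check additional fields, so this is only a first-order test. `is_rocev2` and `build_test_frame` are hypothetical helpers.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ROCEV2_UDP_PORT = 4791  # UDP destination port registered for RoCEv2
ETH_HDR_LEN = 14        # assumption: no VLAN tag


def is_rocev2(frame):
    """Return True if an untagged Ethernet/IPv4 frame has UDP dst port 4791."""
    if len(frame) < ETH_HDR_LEN + 20 + 8:
        return False
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != ETHERTYPE_IPV4:
        return False
    ihl = (frame[ETH_HDR_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HDR_LEN + 9] != IPPROTO_UDP:
        return False
    udp_offset = ETH_HDR_LEN + ihl
    if len(frame) < udp_offset + 8:
        return False
    dst_port = struct.unpack_from("!H", frame, udp_offset + 2)[0]
    return dst_port == ROCEV2_UDP_PORT


def build_test_frame(dst_port):
    """Build a minimal Ethernet/IPv4/UDP frame (zeroed addresses) for testing."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20)
    ip[0] = 0x45          # version 4, IHL 5 (20-byte header)
    ip[9] = IPPROTO_UDP
    udp = struct.pack("!HHHH", 49152, dst_port, 8, 0)
    return eth + bytes(ip) + udp


if __name__ == "__main__":
    print(is_rocev2(build_test_frame(4791)))
    print(is_rocev2(build_test_frame(80)))
```

If frames captured from the ILA pass this check but the RoCE output interface still never asserts, the classifier configuration is the more likely suspect than the traffic itself.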

Output of "dmesg" on the client side:

[   39.512468] onic: loading out-of-tree module taints kernel.
[   39.513196] onic: module verification failed: signature and/or required key missing - tainting kernel
[   39.517258] onic 0000:01:00.0 onic1s0f0 (uninitialized): Set MAC address to 0:a:35:1b:40:13
[   39.517258] onic 0000:01:00.0: device is a master PF
[   39.517260] onic_set_num_queue: num_msix 10, nb_queues 9, pci_msix_user_cnt 1
[   39.517261] onic_pci_probe mm_queues: 4
[   39.517293] onic:qdma_device_open: onic, 01:00.00, pdev 0x00000000258ad560, 0x10ee:0x903f.
[   39.517299] onic 0000:01:00.0: enabling device (0100 -> 0102)
[   39.517402] Device Type: Soft IP
[   39.517403] IP Type: EQDMA Soft IP
[   39.517403] Vivado Release: vivado 2020.2
[   39.517407] onic:qdma_device_attributes_get: qdma01000-p0000:01:00.0: num_pfs:1, num_qs:512, flr_present:0, st_en:1, mm_en:1, mm_cmpt_en:0, mailbox_en:0, mm_channel_max:1, qid2vec_ctx:0, cmpt_ovf_chk_dis:1, mailbox_intr:1, sw_desc_64b:1, cmpt_desc_64b:1, dynamic_bar:1, legacy_intr:1, cmpt_trig_count_timer:1
[   39.517408] onic:qdma_device_open: Vivado version = vivado 2020.2
[   39.517409] qdma_dev_entry_create: Created the dev entry successfully
[   39.520710] onic:xdev_identify_bars: AXI Master Lite BAR 2.
[   39.520712] onic:qdma_device_open: 0000:01:00.0, 01000, pdev 0x00000000258ad560, xdev 0x000000004e84b042, ch 1, q 1024, vf 0.
[   39.622831] onic 0000:01:00.0 enp1s0: renamed from onic1s0f0
[   39.647248] real_num_tx_queues: 68 real_num_rx_queues 68
[   39.647266] onic 0000:01:00.0: onic_cdev_ptr->name = reconic-mm
[   39.647269] onic 0000:01:00.0: successfully cdev_add a character device, reconic-mm, to the system
[   39.647949] onic 0000:01:00.0: successffully device_create a character device, reconic-mm, and register it
[   39.661583] onic 0000:01:00.0 enp1s0: onic_open: device open done

Output of "dmesg" on the host side:

[  160.012775] onic: loading out-of-tree module taints kernel.
[  160.012935] onic: module verification failed: signature and/or required key missing - tainting kernel
[  160.015234] onic 0000:04:00.0 onic4s0f0 (uninitialized): Set MAC address to 0:a:35:94:71:bf
[  160.015235] onic 0000:04:00.0: device is a master PF
[  160.015244] onic_set_num_queue: num_msix 10, nb_queues 9, pci_msix_user_cnt 1
[  160.015244] onic_pci_probe mm_queues: 4
[  160.015284] onic:qdma_device_open: onic, 04:00.00, pdev 0x000000001914674f, 0x10ee:0x903f.
[  160.015298] onic 0000:04:00.0: enabling device (0000 -> 0002)
[  160.015500] Device Type: Soft IP
[  160.015501] IP Type: EQDMA Soft IP
[  160.015501] Vivado Release: vivado 2020.2
[  160.015509] onic:qdma_device_attributes_get: qdma04000-p0000:04:00.0: num_pfs:1, num_qs:512, flr_present:0, st_en:1, mm_en:1, mm_cmpt_en:0, mailbox_en:0, mm_channel_max:1, qid2vec_ctx:0, cmpt_ovf_chk_dis:1, mailbox_intr:1, sw_desc_64b:1, cmpt_desc_64b:1, dynamic_bar:1, legacy_intr:1, cmpt_trig_count_timer:1
[  160.015510] onic:qdma_device_open: Vivado version = vivado 2020.2
[  160.015512] qdma_dev_entry_create: Created the dev entry successfully
[  160.022109] onic:xdev_identify_bars: AXI Master Lite BAR 2.
[  160.022112] onic:qdma_device_open: 0000:04:00.0, 04000, pdev 0x000000001914674f, xdev 0x00000000815ae9a2, ch 1, q 1024, vf 0.
[  160.141314] onic 0000:04:00.0 enp4s0: renamed from onic4s0f0
[  160.205177] real_num_tx_queues: 68 real_num_rx_queues 68
[  160.205194] onic 0000:04:00.0: onic_cdev_ptr->name = reconic-mm
[  160.205195] onic 0000:04:00.0: successfully cdev_add a character device, reconic-mm, to the system
[  160.205351] onic 0000:04:00.0: successffully device_create a character device, reconic-mm, and register it
[  160.243862] onic 0000:04:00.0 enp4s0: onic_open: device open done
[  161.149484] IPv6: ADDRCONF(NETDEV_CHANGE): enp4s0: link becomes ready

Output of the RDMA read test on the client:

root@niexun-HP-288-Pro-G6-Microtower-PC:/home/niexun/RecoNIC/examples/rdma_test# sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000:01:00.0/resource2 -z 128 -l dev_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: PCIe resource file: /sys/bus/pci/devices/0000:01:00.0/resource2
Info: QP allocated at: dev_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp1s0
Info: mac_addr_t = 00:0a:35:1b:40:13
Info: Creating rn_dev
/home/niexun/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf02c00
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf02c00000
Info: pre-allocated hugepage buffer vir addr = 0x7f3166c00000, physical addr = 0xf02c00000
Info: Configuring QDMA AXI bridge BDF
/home/niexun/RecoNIC/lib/reconic.c:194:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/home/niexun/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/home/niexun/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf02c00
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf02c00000
/home/niexun/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3166c00000, physical addr = f02c00000, rn_dev->buffer_offset = 0x200000
/home/niexun/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf03200
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf03200000
/home/niexun/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3166e00000, physical addr = f03200000, rn_dev->buffer_offset = 0x1200000
/home/niexun/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf04200
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf04200000
/home/niexun/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3167e00000, physical addr = f04200000, rn_dev->buffer_offset = 0x1202000
/home/niexun/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf04202
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf04202000
/home/niexun/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3167e02000, physical addr = f04202000, rn_dev->buffer_offset = 0x1212000
/home/niexun/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/niexun/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf04212
/home/niexun/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/niexun/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf04212000
/home/niexun/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3167e12000, physical addr = f04212000, rn_dev->buffer_offset = 0x1222000
/home/niexun/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/home/niexun/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x3200000
/home/niexun/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/home/niexun/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x4200000
/home/niexun/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/home/niexun/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x4202000
/home/niexun/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/home/niexun/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x4212000
/home/niexun/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/home/niexun/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/home/niexun/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x351b4013
/home/niexun/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/home/niexun/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/home/niexun/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/home/niexun/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/home/niexun/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/home/niexun/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = dev_mem
/home/niexun/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x14000
/home/niexun/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->cq
/home/niexun/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000014000, rn_dev->dev_buffer_offset = 0x15400
/home/niexun/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->rq
/home/niexun/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000016000, rn_dev->dev_buffer_offset = 0xb6000
/home/niexun/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/home/niexun/RecoNIC/lib/rdma_api.c:485:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f3186d86000, rdma_dev->axil_ctl = 0x7f3186d86000
/home/niexun/RecoNIC/lib/rdma_api.c:499:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/home/niexun/RecoNIC/lib/rdma_api.c:506:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:513:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:519:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/home/niexun/RecoNIC/lib/rdma_api.c:536:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:543:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xa3500000
/home/niexun/RecoNIC/lib/rdma_api.c:547:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0xa350000000000000, sq_addr_msb = 0xa3500000, sq_addr_lsb = 0x0
/home/niexun/RecoNIC/lib/rdma_api.c:565:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x14000
/home/niexun/RecoNIC/lib/rdma_api.c:572:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xa3500000
/home/niexun/RecoNIC/lib/rdma_api.c:576:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0xa350000000014000, cq_addr_msb = 0xa3500000, cq_addr_lsb = 0x14000
/home/niexun/RecoNIC/lib/rdma_api.c:594:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x16000
/home/niexun/RecoNIC/lib/rdma_api.c:601:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xa3500000
/home/niexun/RecoNIC/lib/rdma_api.c:605:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0xa350000000016000, rq_addr_msb = 0xa3500000, rq_addr_lsb = 0x16000
/home/niexun/RecoNIC/lib/rdma_api.c:614:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x2c00000
/home/niexun/RecoNIC/lib/rdma_api.c:621:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0xf02c00000
/home/niexun/RecoNIC/lib/rdma_api.c:631:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x2c00020
/home/niexun/RecoNIC/lib/rdma_api.c:638:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/home/niexun/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0xf02c00020
/home/niexun/RecoNIC/lib/rdma_api.c:648:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/home/niexun/RecoNIC/lib/rdma_api.c:657:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/home/niexun/RecoNIC/lib/rdma_api.c:704:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/home/niexun/RecoNIC/lib/rdma_api.c:718:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/home/niexun/RecoNIC/lib/rdma_api.c:727:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Info: client received remote offset of A = 0xa3500000000b6000
/home/niexun/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a3500000000b6000, rn_dev->dev_buffer_offset = 0xb6080
/home/niexun/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: creating an RDMA read WQE for getting data
/home/niexun/RecoNIC/lib/rdma_api.c:769:create_a_wqe(): Info: WQE mem_buffer = 0xa3500000000b6000, masked_mem_buffer = 0xa3500000000b6000
/home/niexun/RecoNIC/lib/rdma_api.c:796:create_a_wqe(): [WQE] wrid=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:797:create_a_wqe(): [WQE] laddr_low=0xb6000
/home/niexun/RecoNIC/lib/rdma_api.c:798:create_a_wqe(): [WQE] laddr_high=0xa3500000
/home/niexun/RecoNIC/lib/rdma_api.c:799:create_a_wqe(): [WQE] length=0x80
/home/niexun/RecoNIC/lib/rdma_api.c:800:create_a_wqe(): [WQE] opcode=0x4
/home/niexun/RecoNIC/lib/rdma_api.c:801:create_a_wqe(): [WQE] remote_offset_low=0xb6000
/home/niexun/RecoNIC/lib/rdma_api.c:802:create_a_wqe(): [WQE] remote_offset_high=0xa3500000
/home/niexun/RecoNIC/lib/rdma_api.c:803:create_a_wqe(): [WQE] r_key=0x8
/home/niexun/RecoNIC/lib/rdma_api.c:804:create_a_wqe(): [WQE] send_small_payload0=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:805:create_a_wqe(): [WQE] send_small_payload1=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:806:create_a_wqe(): [WQE] send_small_payload2=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:807:create_a_wqe(): [WQE] send_small_payload3=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:808:create_a_wqe(): [WQE] immdt_data=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:811:create_a_wqe(): DEBUG: Write WQE to the device memory
/home/niexun/RecoNIC/lib/rdma_api.c:817:create_a_wqe(): DEBUG: successfully write WQE to the device memory!
/home/niexun/RecoNIC/lib/rdma_api.c:875:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x0
/home/niexun/RecoNIC/lib/rdma_api.c:876:rdma_post_send(): DEBUG: original qp->sq_pidb = 0x0
/home/niexun/RecoNIC/lib/rdma_api.c:882:rdma_post_send(): [Register] RN_RDMA_QCSR_SQPIi=0x60338, qpid=2, value=0x1
/home/niexun/RecoNIC/lib/rdma_api.c:883:rdma_post_send(): DEBUG: Update hardware sq db idx from software = 1
/home/niexun/RecoNIC/lib/rdma_api.c:884:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/home/niexun/RecoNIC/lib/rdma_api.c:844:poll_cq_cidb(): [Register] RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x1
/home/niexun/RecoNIC/lib/rdma_api.c:846:poll_cq_cidb(): DEBUG: before polling: sq_cidb = 0; Polling CQ CIDB = 1
/home/niexun/RecoNIC/lib/rdma_api.c:857:poll_cq_cidb(): DEBUG: after polling: sq_cidb = 0; Polling CQ CIDB = 1
Successfully sent an RDMA read operation
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x1
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x0
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x0
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0x2
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x15782e3
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x2040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x10002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x0
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10002
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x90
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x601
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x16000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xa3500000
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Info: All data has been received!
Info: buffer physical address is 0xa3500000000b6000
Info: Time spent 6.836000 usec, size = 128 bytes, Bandwidth = 0.149795 gigabits/sec
Info: The value of rc is 128
Info: CHECK RECEIVED DATA
Error: received data mismatched: recv[0]=-2026415136, sw_golden[0]=0
Warning: QP in fatal status

***** QP2 FATAL RECOVERY *****
/home/niexun/RecoNIC/lib/rdma_api.c:1029:rdma_qp_fatal_recovery(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200047c
/home/niexun/RecoNIC/lib/rdma_api.c:1082:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f3186d86000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/home/niexun/RecoNIC/lib/rdma_api.c:1087:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f3186d86000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
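
Several register values in the client log above can be cross-checked by hand, which helps rule out configuration mistakes in a port. The following Python sketch reproduces the encodings as they appear in the log: a 64-bit address split into MSB/LSB registers, an IPv4 address packed into a 32-bit register, a MAC address split into MSB/LSB registers, and a physical address derived from a page frame number. The function names are hypothetical and the 4 KiB page size is an assumption; the encodings are inferred from the logged values, not taken from the RecoNIC source.

```python
import ipaddress


def split64(addr):
    """Split a 64-bit address into (msb, lsb) 32-bit register values."""
    return (addr >> 32) & 0xFFFFFFFF, addr & 0xFFFFFFFF


def ipv4_to_reg(ip):
    """Pack a dotted-quad IPv4 address into a 32-bit value, first octet highest."""
    return int(ipaddress.IPv4Address(ip))


def mac_to_regs(mac):
    """Split a MAC address into (msb, lsb): upper 16 bits and lower 32 bits."""
    value = int(mac.replace(":", ""), 16)
    return (value >> 32) & 0xFFFF, value & 0xFFFFFFFF


def pfn_to_paddr(pfn, offset=0, page_size=4096):
    """Physical address from a page frame number (assumes 4 KiB base pages)."""
    return pfn * page_size + offset


if __name__ == "__main__":
    # Values taken from the client log in this thread.
    msb, lsb = split64(0xA350000000016000)
    print(f"RQBAMSB={msb:#x} RQBA={lsb:#x}")
    print(f"IPV4XADD={ipv4_to_reg('192.100.51.1'):#x}")
    mac_msb, mac_lsb = mac_to_regs("00:0a:35:1b:40:13")
    print(f"MACXADDMSB={mac_msb:#x} MACXADDLSB={mac_lsb:#x}")
    print(f"paddr={pfn_to_paddr(0xF02C00):#x}")
```

Comparing these computed values with the `[Register]` lines in a new port's log is a quick way to spot an address or endianness mistake before looking at the hardware.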

Output of the RDMA read test on the host:

➜  rdma_test git:(main) ✗ sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000:04:00.0/resource2 -z 128 -l dev_mem -d /dev/reconic-mm -s -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee server_debug.log
src_ip_str = 192.100.52.1
dst_ip_str = 192.100.51.1
Info: PCIe resource file: /sys/bus/pci/devices/0000:04:00.0/resource2
Info: QP allocated at: dev_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.52.1
Info: Found network interface: enp4s0
Info: mac_addr_t = 00:0a:35:94:71:bf
Info: Creating rn_dev
/home/antl/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf20600
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf20600000
Info: pre-allocated hugepage buffer vir addr = 0x7f3060c00000, physical addr = 0xf20600000
Info: Configuring QDMA AXI bridge BDF
/home/antl/RecoNIC/lib/reconic.c:194:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/home/antl/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/home/antl/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf20600
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf20600000
/home/antl/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3060c00000, physical addr = f20600000, rn_dev->buffer_offset = 0x200000
/home/antl/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf20400
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf20400000
/home/antl/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3060e00000, physical addr = f20400000, rn_dev->buffer_offset = 0x1200000
/home/antl/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf21400
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf21400000
/home/antl/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3061e00000, physical addr = f21400000, rn_dev->buffer_offset = 0x1202000
/home/antl/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf21402
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf21402000
/home/antl/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3061e02000, physical addr = f21402000, rn_dev->buffer_offset = 0x1212000
/home/antl/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/home/antl/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0xf21412
/home/antl/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/home/antl/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0xf21412000
/home/antl/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f3061e12000, physical addr = f21412000, rn_dev->buffer_offset = 0x1222000
/home/antl/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/home/antl/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x20400000
/home/antl/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/home/antl/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x21400000
/home/antl/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/home/antl/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x21402000
/home/antl/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/home/antl/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x21412000
/home/antl/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/home/antl/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/home/antl/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x359471bf
/home/antl/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/home/antl/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643401
/home/antl/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/home/antl/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/home/antl/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/home/antl/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = dev_mem
/home/antl/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x14000
/home/antl/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->cq
/home/antl/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000014000, rn_dev->dev_buffer_offset = 0x15400
/home/antl/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->rq
/home/antl/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000016000, rn_dev->dev_buffer_offset = 0xb6000
/home/antl/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/home/antl/RecoNIC/lib/rdma_api.c:485:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f3080c59000, rdma_dev->axil_ctl = 0x7f3080c59000
/home/antl/RecoNIC/lib/rdma_api.c:499:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643301
/home/antl/RecoNIC/lib/rdma_api.c:506:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:513:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:519:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/home/antl/RecoNIC/lib/rdma_api.c:536:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:543:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xa3500000
/home/antl/RecoNIC/lib/rdma_api.c:547:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0xa350000000000000, sq_addr_msb = 0xa3500000, sq_addr_lsb = 0x0
/home/antl/RecoNIC/lib/rdma_api.c:565:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x14000
/home/antl/RecoNIC/lib/rdma_api.c:572:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xa3500000
/home/antl/RecoNIC/lib/rdma_api.c:576:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0xa350000000014000, cq_addr_msb = 0xa3500000, cq_addr_lsb = 0x14000
/home/antl/RecoNIC/lib/rdma_api.c:594:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x16000
/home/antl/RecoNIC/lib/rdma_api.c:601:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xa3500000
/home/antl/RecoNIC/lib/rdma_api.c:605:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0xa350000000016000, rq_addr_msb = 0xa3500000, rq_addr_lsb = 0x16000
/home/antl/RecoNIC/lib/rdma_api.c:614:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x20600000
/home/antl/RecoNIC/lib/rdma_api.c:621:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0xf20600000
/home/antl/RecoNIC/lib/rdma_api.c:631:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x20600020
/home/antl/RecoNIC/lib/rdma_api.c:638:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/home/antl/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0xf20600020
/home/antl/RecoNIC/lib/rdma_api.c:648:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/home/antl/RecoNIC/lib/rdma_api.c:657:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/home/antl/RecoNIC/lib/rdma_api.c:704:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/home/antl/RecoNIC/lib/rdma_api.c:718:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/home/antl/RecoNIC/lib/rdma_api.c:727:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Server is listening to a remote peer
Info: Server is connected to a remote peer
/home/antl/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a3500000000b6000, rn_dev->dev_buffer_offset = 0xb6080
/home/antl/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: rdma_register_memory_region - registering memory region
/home/antl/RecoNIC/lib/rdma_api.c:316:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRLSB=0x40004, pd_num=0, value=0xb6000
/home/antl/RecoNIC/lib/rdma_api.c:318:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRMSB=0x40008, pd_num=0, value=0xa3500000
/home/antl/RecoNIC/lib/rdma_api.c:320:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRLSB=0x4000c, pd_num=0, value=0xb6000
/home/antl/RecoNIC/lib/rdma_api.c:322:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRMSB=0x40010, pd_num=0, value=0xa3500000
/home/antl/RecoNIC/lib/rdma_api.c:324:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFRKEY=0x40014, pd_num=0, value=0x8
/home/antl/RecoNIC/lib/rdma_api.c:327:rdma_register_memory_region(): [Register] RN_RDMA_PDT_WRRDBUFLEN=0x40018, pd_num=0, value=0x80 B
/home/antl/RecoNIC/lib/rdma_api.c:330:rdma_register_memory_region(): [Register] RN_RDMA_PDT_ACCESSDESC=0x4001c, pd_num=0, value=0x2
Info: memory region for the 0-th PD is registered
Info: allocating buffer for payload data
Info: tmp_buffer->buffer = 0xa3500000000b6000, tmp_buffer->dma_addr = 0xa3500000000b6000
Info: copy payload data to the device memory
Info: copied payload data to the device memory succesfully rc = 128
Sending read_offset (a3500000000b6000) to the remote client
Does the client finish its RDMA read operation? If yes, please press any key

Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x2
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x0
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x0
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0x0
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x0
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x2
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x2040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x2
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x20000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10000
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x1
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x0
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x0
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x16000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xa3500000

/home/antl/RecoNIC/lib/rdma_api.c:1082:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f3080c59000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:1087:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f3080c59000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/home/antl/RecoNIC/lib/rdma_api.c:1095:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f3080c59000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
[1]    20813 segmentation fault  sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.52.1 -i  -p  -z   | 
       20814 done                tee server_debug.log

I really appreciate your guidance! Xun Nie

zhguanw-amd commented 11 months ago

@niexun725 Hi Xun Nie,

The dmesg output seems okay.

Besides, after using ILA on the host's box_250mhz module, I can see that the "a_axis_adap_rx_250mhz" interface does receive some input packets. However, the "m_axis_user2rdma_roce_from_cmacrx" interface never asserts. This might mean that no packets are identified as RDMA traffic on the host. I think packet classification doesn't work, even though I do have a full license for vitis_network_p4.

Not seeing the packets in ILA doesn't mean the remote peer (the server node) didn't receive them; your ILA probably just missed the packets when sampling. The issue should not be packet classification, because your server node shows [RN_RDMA_GCSR_ERRBUFWPTR = 0x6006c] = 0x2, which means the server has received two RDMA packets. (It should be '1' if you run the read test only once after a fresh program. I guess you ran it twice somehow. Anyway, this is not important.) On the server side, you can either print the first entry of the error buffer in host memory or use ILA to check the signal "axi_rdma_send_write_payload_wdata". The first 32 bits are the error syndrome, whose definition you can find in Table 2-3 of ERNIC PG332 v3.1. You should check that first to understand why your incoming RDMA packet has an issue.
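
To see at a glance which status registers have moved, the register dump can be filtered mechanically. The following is a minimal Python sketch, assuming the dump keeps the `Info: [NAME = 0xADDR] = 0xVALUE` layout shown in the log above. `parse_register_dump` and `nonzero_registers` are hypothetical helper names and are not part of RecoNIC.

```python
import re

# Matches lines such as: Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x2
DUMP_LINE = re.compile(
    r"\[(?P<name>\w+)\s*=\s*0x(?P<addr>[0-9a-fA-F]+)\]\s*=\s*0x(?P<value>[0-9a-fA-F]+)"
)

def parse_register_dump(text):
    """Return {register name: (address, value)} for every dump line found."""
    registers = {}
    for line in text.splitlines():
        match = DUMP_LINE.search(line)
        if match:
            registers[match["name"]] = (int(match["addr"], 16), int(match["value"], 16))
    return registers

def nonzero_registers(registers):
    """Keep only the registers whose value is not zero."""
    return {name: pair for name, pair in registers.items() if pair[1] != 0}

# Three lines copied from the server-side dump in this thread.
sample = """\
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x2
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x2
"""

for name, (addr, value) in nonzero_registers(parse_register_dump(sample)).items():
    print(f"{name} @ {addr:#x} = {value:#x}")
```

Feeding it the whole `server_debug.log` should work the same way, since the `[Register] NAME=0x..., value=0x...` configuration lines do not match the pattern and are skipped.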

The reason you have "[RN_RDMA_GCSR_IPKTERRQWPTR = 0x60094] = 0x1" on your client side is probably that your server replies with a NAK read response.

Thanks, Guanwen

niexun725 commented 11 months ago

@zhguanw-amd

The reason you have "[RN_RDMA_GCSR_IPKTERRQWPTR = 0x60094] = 0x1" on your client side is probably that your server replies with a NAK read response.

Dear Zhguanw, I have finally ported the project to the VCU128 FPGA successfully. Thank you very much for your guidance.

The problem was a failure to obtain the destination MAC address when running test samples such as RDMA write and RDMA read.

I ran the test samples according to the steps and checked the captured RDMA frames. I found that all bits of the destination MAC address were zero, and ERNIC will not work properly if this value is wrong.
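
The check described above can be sketched in a few lines of Python. This assumes the captured frame is available as raw bytes starting at the Ethernet destination address (no preamble), which is how most capture tools export frames. `destination_mac` and `has_zero_destination` are hypothetical names, and the sample frame is built by hand for illustration rather than taken from a real capture.

```python
def destination_mac(frame):
    """Format the first 6 bytes of an Ethernet frame as a MAC address."""
    if len(frame) < 14:  # 6 dst + 6 src + 2 EtherType
        raise ValueError("frame is shorter than an Ethernet header")
    return ":".join(f"{byte:02x}" for byte in frame[:6])

def has_zero_destination(frame):
    """True when the destination MAC is 00:00:00:00:00:00."""
    return destination_mac(frame) == "00:00:00:00:00:00"

# Hand-built header: all-zero destination, the local MAC from the log
# (MACXADDMSB=0xa, MACXADDLSB=0x359471bf) as source, EtherType IPv4.
sample_frame = bytes(6) + bytes.fromhex("000a359471bf") + bytes.fromhex("0800")

print(destination_mac(sample_frame), has_zero_destination(sample_frame))
```

An all-zero destination is consistent with the server log in this thread, where RN_RDMA_QCSR_MACDESADDLSBi and RN_RDMA_QCSR_MACDESADDMSBi are both written as 0x0.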

I found that all these tests failed to read the destination MAC address correctly. However, when I used 'arp -a' to view the ARP table of the Linux system, I could find the destination MAC address of the NIC, so I configured the MAC address manually. After that, the tests ran successfully.
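
A minimal Python sketch of that manual workaround: look up the peer's MAC in the kernel ARP table and split it into the two values the destination MAC registers would take. It assumes the table text follows the `/proc/net/arp` column layout, and that the destination registers use the same split as the local MAC in the log (MACXADDMSB=0xa, MACXADDLSB=0x359471bf, i.e. upper 16 bits and lower 32 bits). `lookup_mac` and `split_mac` are hypothetical names, and the ARP entry in `sample` is made up.

```python
def lookup_mac(arp_table_text, ip):
    """Find the MAC address for `ip` in text laid out like /proc/net/arp."""
    for line in arp_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 4 and fields[0] == ip:
            return fields[3]
    return None

def split_mac(mac):
    """Split aa:bb:cc:dd:ee:ff into (upper 16 bits, lower 32 bits)."""
    value = int(mac.replace(":", ""), 16)
    return value >> 32, value & 0xFFFFFFFF

# Made-up ARP entry for the peer IP seen in the log (0xc0643301 = 192.100.51.1).
sample = """\
IP address       HW type     Flags       HW address            Mask     Device
192.100.51.1     0x1         0x2         00:0a:35:00:00:01     *        ens1
"""

mac = lookup_mac(sample, "192.100.51.1")
msb, lsb = split_mac(mac)
print(f"{mac}: MSB={msb:#x}, LSB={lsb:#x}")
```

On a real system the table text could be read with `open("/proc/net/arp").read()`. An entry that has not been resolved yet appears there with an all-zero hardware address, which is one possible way to end up with a zero destination MAC.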

I think there are some problems with the code that obtains the MAC address in write.c and read.c, and I will check it further later. Besides, I will try to use HBM to replace DDR4 in RecoNIC.

Thanks, Xun Nie

zhguanw-amd commented 11 months ago

Hi Xun Nie,

Great to hear that it finally works at your end.

I think there are some problems with the code that obtains the MAC address in write.c and read.c, and I will check it further later.

I have no idea. We haven't encountered this issue and other groups using it also haven't reported any similar issues.

Besides, I will try to use HBM to replace DDR4 in RecoNIC.

That would be nice.

Would you mind pushing your VCU128 porting changes to the RecoNIC repository when they're ready?

Thanks, Guanwen

niexun725 commented 11 months ago

@zhguanw-amd

Would you mind pushing your VCU128 porting changes to the RecoNIC repository when they're ready?

Dear Zhguanw,

Thank you for the invitation. I'd be very happy to get involved. I will push my VCU128 porting changes to the RecoNIC repository when they're ready.

Thanks, Xun Nie

zhguanw-amd commented 10 months ago

@niexun725

Great! Thanks Xun Nie! I'm going to close this issue, as you have solved it. Feel free to re-open it if you think it's necessary.

Thanks, Guanwen