Xilinx / RecoNIC

RecoNIC is a software/hardware shell used to enable network-attached processing within an RDMA-featured SmartNIC for scale-out computing.

Error: failed to lock page in memory #19

Closed: qianyich closed this 7 months ago

qianyich commented 8 months ago

The system is up and running: I can ping the server from the client side and the client from the server side.

qianyich@pc164:~/RecoNIC/drivers/onic-driver$ ping 192.100.52.1
PING 192.100.52.1 (192.100.52.1) 56(84) bytes of data.
64 bytes from 192.100.52.1: icmp_seq=1 ttl=64 time=0.238 ms
64 bytes from 192.100.52.1: icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from 192.100.52.1: icmp_seq=3 ttl=64 time=0.164 ms
64 bytes from 192.100.52.1: icmp_seq=4 ttl=64 time=0.166 ms
64 bytes from 192.100.52.1: icmp_seq=5 ttl=64 time=0.140 ms
64 bytes from 192.100.52.1: icmp_seq=6 ttl=64 time=0.104 ms
64 bytes from 192.100.52.1: icmp_seq=7 ttl=64 time=0.126 ms
64 bytes from 192.100.52.1: icmp_seq=8 ttl=64 time=0.115 ms
64 bytes from 192.100.52.1: icmp_seq=9 ttl=64 time=0.108 ms
^C
--- 192.100.52.1 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8193ms
rtt min/avg/max/mdev = 0.092/0.139/0.238/0.043 ms
qianyich@pc166:~/RecoNIC/drivers/onic-driver$ ping 192.100.51.1
PING 192.100.51.1 (192.100.51.1) 56(84) bytes of data.
64 bytes from 192.100.51.1: icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from 192.100.51.1: icmp_seq=2 ttl=64 time=0.152 ms
64 bytes from 192.100.51.1: icmp_seq=3 ttl=64 time=0.154 ms
64 bytes from 192.100.51.1: icmp_seq=4 ttl=64 time=0.161 ms
64 bytes from 192.100.51.1: icmp_seq=5 ttl=64 time=0.154 ms
64 bytes from 192.100.51.1: icmp_seq=6 ttl=64 time=0.155 ms
^C
--- 192.100.51.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5110ms
rtt min/avg/max/mdev = 0.152/0.157/0.167/0.008 ms

When I tried to run the rdma_test read and write examples, I got the following error.

qianyich@pc164:~/RecoNIC/examples/rdma_test$ sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./write -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:fd:c0:a8
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:c0:8c:15
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
Error: failed to lock page in memory

I found this message in lib/reconic.c:323. Is it caused by insufficient huge pages? I assume I need to enable huge pages in Linux and configure how many are available. How many huge pages do I need?

Currently:

HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
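
To make the huge page numbers above easier to reason about, here is a minimal Python sketch that parses the HugePages_* fields of /proc/meminfo and reports how much memory the pool holds. The helper names (parse_hugepages, hugepage_pool_bytes) are made up for illustration and are not part of RecoNIC. The sketch does not say how many huge pages RecoNIC itself needs; it only shows what a given setting amounts to.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (in kB) found in meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            info[key] = int(rest.split()[0])  # drop the trailing "kB" unit if present
    return info


def hugepage_pool_bytes(info):
    """Total size of the huge page pool in bytes."""
    return info["HugePages_Total"] * info["Hugepagesize"] * 1024


# The values reported in this thread before any huge pages were configured.
SAMPLE = """\
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

before = parse_hugepages(SAMPLE)
print(hugepage_pool_bytes(before))   # 0: nothing is available to back the buffer

# Same system with 1024 huge pages configured: 1024 * 2048 kB = 2 GiB.
after = dict(before, HugePages_Total=1024)
print(hugepage_pool_bytes(after) == 2 * 1024**3)   # True
```

On a live system the text would come from open("/proc/meminfo").read(), and the pool size is normally changed through /proc/sys/vm/nr_hugepages.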

After configuring the number of huge pages to 1024, I get the following error:

qianyich@pc164:~/RecoNIC/examples/rdma_test$ sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./write -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l dev_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:fd:c0:a8
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: dev_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:c0:8c:15
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54e00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54e00000
Info: pre-allocated hugepage buffer vir addr = 0x7f0b0aa00000, physical addr = 0x2f54e00000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:194:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54e00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54e00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0aa00000, physical addr = 2f54e00000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54c00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54c00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0ac00000, physical addr = 2f54c00000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc00000, physical addr = 2f55c00000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c02
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c02000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc02000, physical addr = 2f55c02000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c12
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c12000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc12000, physical addr = 2f55c12000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x54c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x55c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x55c02000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x55c12000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x35c08c15
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = dev_mem
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x14000
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000014000, rn_dev->dev_buffer_offset = 0x15400
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000016000, rn_dev->dev_buffer_offset = 0xb6000
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f0b2aa4c000, rdma_dev->axil_ctl = 0x7f0b2aa4c000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x35fdc0a8
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0xa350000000000000, sq_addr_msb = 0xa3500000, sq_addr_lsb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0xa350000000014000, cq_addr_msb = 0xa3500000, cq_addr_lsb = 0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0xa350000000016000, rq_addr_msb = 0xa3500000, rq_addr_lsb = 0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x54e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f54e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x54e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f54e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Error: Can't receive remote offset of A from the remote peer
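
The addresses in the log above can be cross-checked with a little arithmetic. The sketch below assumes 4 KiB base pages (a 12-bit page shift), which is what the printed page frame and physical address pairs imply, and uses hypothetical helper names. It reproduces the get_buffer_paddr lines and the MSB/LSB split printed for the device-memory queue addresses. It does not attempt to reproduce the host-buffer MSB register values (0xf) that appear in the log.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages


def paddr_from_pfn(pfn, offset_in_page=0):
    """Physical address = page frame number shifted by the page size, plus the in-page offset."""
    return (pfn << PAGE_SHIFT) | offset_in_page


def split_u64(addr):
    """Split a 64-bit address into (upper 32 bits, lower 32 bits)."""
    return (addr >> 32) & 0xFFFFFFFF, addr & 0xFFFFFFFF


# "Page frame: 0x2f54e00", "distance from page boundary: 0x0"
print(hex(paddr_from_pfn(0x2f54e00, 0x0)))   # 0x2f54e00000

# "qp->cq->dma_addr = 0xa350000000014000"
msb, lsb = split_u64(0xa350000000014000)
print(hex(msb), hex(lsb))                    # 0xa3500000 0x14000
```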

This time the error appears to come from the read application at line 319, where rc = read(sockfd, &read_A_offset, sizeof(read_A_offset)); returns a value that is not greater than 0. I am also confused about why a socket is involved here. My understanding was that RDMA has nothing to do with sockets.
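
On the socket question: RDMA applications commonly use an ordinary TCP connection as an out-of-band channel to exchange the metadata each side needs before any RDMA operation is posted, such as buffer addresses or offsets. The quoted read(sockfd, ...) call suggests the test does something similar for the offset of A. With a POSIX read(), a return value of 0 means the peer closed the connection and a negative value means an error, so one possible reading is that the remote side went away before sending the offset. The Python sketch below illustrates the pattern only. The 8-byte little-endian wire format and the function names are assumptions, not RecoNIC's actual protocol.

```python
import socket
import struct

OFFSET_FMT = "<Q"  # assumed wire format: one little-endian unsigned 64-bit offset


def recv_exact(sock, nbytes):
    """Read exactly nbytes, or raise if the peer closes the connection first."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:  # empty result: the peer closed the connection
            raise ConnectionError("peer closed before sending %d bytes" % nbytes)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_offset(sock, offset):
    sock.sendall(struct.pack(OFFSET_FMT, offset))


def recv_offset(sock):
    raw = recv_exact(sock, struct.calcsize(OFFSET_FMT))
    (offset,) = struct.unpack(OFFSET_FMT, raw)
    return offset


# A connected socket pair stands in for the client/server TCP connection.
server, client = socket.socketpair()

send_offset(server, 0x1000)        # server advertises where its buffer lives
print(hex(recv_offset(client)))    # 0x1000

server.close()                     # server exits without sending anything more
try:
    recv_offset(client)
except ConnectionError as exc:
    print("error:", exc)           # the situation behind a "can't receive offset" message
client.close()
```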

qianyich commented 7 months ago

@zhguanw-amd the problem in network_systolic_mm has been resolved. I ran network_systolic_mm and the read test back and forth: first network_systolic_mm, then the read test, then network_systolic_mm again, and finally the read test, all with no issues.

Right after that, I ran a write test, which passed, and the send_recv test also passed. At that point I thought everything was fine, but when I ran the read test again I got a result mismatch error and the DMA engine broke (the DMA test confirmed it was broken on both machines). dmesg reported no error messages until I ran the DMA test. All I can see in the log is that the last read test failed with a result mismatch. Therefore, the bug could be in either the write test or the send_recv test.

I rebooted after this failure. DMA worked again (yes, I ran the DMA test to verify it) and the network_systolic_mm test passed again, but the read test failed with the warning below. I guess this warning is left over from either the write or the send_recv test run before the reboot.

Warning: CQHEADi and SQPIi for QP2 are mismatched

***** QP2 FATAL RECOVERY *****
TIMEOUT: CQHEADi:0x0 and SQPIi:0x1 are different
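
The TIMEOUT line suggests that the test waits for the completion queue head (CQHEADi) to catch up with the send queue producer index (SQPIi) and gives up after a deadline. A minimal sketch of that kind of wait loop follows, assuming completion means the two counters are equal. The function name and the reader callbacks are hypothetical stand-ins for whatever register reads the library performs.

```python
import time


def wait_for_completion(read_cq_head, read_sq_pi, timeout_s=1.0, poll_interval_s=0.001):
    """Poll until the CQ head equals the SQ producer index, or until the timeout expires.

    Returns (completed, cq_head, sq_pi) with the last values observed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        cq_head, sq_pi = read_cq_head(), read_sq_pi()
        if cq_head == sq_pi:
            return True, cq_head, sq_pi
        if time.monotonic() >= deadline:
            return False, cq_head, sq_pi
        time.sleep(poll_interval_s)


# Stuck queue, as in the log: one work request posted (SQPI = 1), none completed (CQHEAD = 0).
ok, cq_head, sq_pi = wait_for_completion(lambda: 0x0, lambda: 0x1, timeout_s=0.05)
if not ok:
    print("TIMEOUT: CQHEADi:%#x and SQPIi:%#x are different" % (cq_head, sq_pi))
```

A healthy run would return True as soon as the head counter reaches the producer index.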

dmesg output before the reboot, after running the write and send_recv tests:

[  516.463259] IPv6: ADDRCONF(NETDEV_UP): enp59s0: link is not ready
[  516.463268] IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0: link becomes ready
[  644.544157] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  650.451113] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  704.906296] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  740.446916] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  760.696267] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  846.452736] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  894.941990] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  915.548327] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1034.975745] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x00000000eececbd4, W,65536000,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1034.988096] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/1022, pidx 641, cidx 640
[ 1034.988429] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1037.802323] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1037.802330] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1037.802332] addr = 0x00000254 val = 0x00100000
[ 1037.802339] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1037.802341] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1037.802343] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1037.802345] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1037.802346] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1037.802347] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1037.802348] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1037.802350] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1037.802351] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1037.802353] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1037.802354] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1037.802355] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1037.802357] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1037.802358] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1037.802359] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1037.802361] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1037.802362] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1037.802363] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1037.802364] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1037.802370] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1037.802371] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1037.802373] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1037.802374] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1037.802376] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1037.802381] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1037.802382] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1037.802384] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1037.802385] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1037.802387] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1037.802389] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1037.802393] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1037.802395] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1037.802396] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1037.802398] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1037.802399] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1037.802400] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1037.802402] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1037.802403] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1037.802405] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1037.802406] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1037.802408] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1037.802409] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1037.802413] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1037.802415] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1037.802416] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1037.802418] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1037.802420] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1037.802421] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1037.802426] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1037.802428] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1037.802429] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1037.802431] eqdma_hw_error_process detected DMA engine error
[ 1037.815936] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1037.815942] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1037.815944] addr = 0x00000254 val = 0x00100000
[ 1037.815951] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1037.815953] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1037.815955] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1037.815957] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1037.815958] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1037.815959] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1037.815961] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1037.815962] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1037.815963] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1037.815965] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1037.815966] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1037.815968] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1037.815969] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1037.815970] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1037.815971] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1037.815973] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1037.815974] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1037.815975] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1037.815976] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1037.815982] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1037.815983] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1037.815985] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1037.815986] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1037.815988] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1037.815993] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1037.815994] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1037.815996] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1037.815997] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1037.815999] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1037.816000] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1037.816005] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1037.816006] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1037.816008] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1037.816009] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1037.816022] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1037.816022] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1037.816023] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1037.816023] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1037.816024] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1037.816025] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1037.816025] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1037.816026] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1037.816029] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1037.816030] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1037.816030] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1037.816031] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1037.816032] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1037.816032] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1037.816036] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1037.816036] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1037.816037] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1037.816037] eqdma_hw_error_process detected DMA engine error
[ 1037.829666] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1037.829673] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1037.829676] addr = 0x00000254 val = 0x00100000
[ 1037.829682] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1037.829684] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1037.829686] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1037.829688] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1037.829689] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1037.829690] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1037.829692] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1037.829693] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1037.829694] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1037.829696] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1037.829697] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1037.829699] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1037.829700] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1037.829701] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1037.829703] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1037.829704] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1037.829705] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1037.829706] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1037.829708] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1037.829713] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1037.829714] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1037.829716] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1037.829717] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1037.829719] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1037.829724] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1037.829725] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1037.829727] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1037.829728] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1037.829730] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1037.829731] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1037.829736] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1037.829737] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1037.829739] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1037.829740] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1037.829742] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1037.829743] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1037.829744] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1037.829746] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1037.829747] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1037.829749] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1037.829750] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1037.829752] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1037.829767] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1037.829768] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1037.829768] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1037.829769] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1037.829769] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1037.829770] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1037.829773] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1037.829774] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1037.829774] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1037.829775] eqdma_hw_error_process detected DMA engine error
[ 1037.843411] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1037.843418] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1037.843420] addr = 0x00000254 val = 0x00100000
[ 1037.843426] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1037.843429] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1037.843431] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1037.843432] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1037.843433] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1037.843435] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1037.843436] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1037.843437] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1037.843439] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1037.843440] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1037.843442] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1037.843443] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1037.843444] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1037.843446] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1037.843447] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1037.843449] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1037.843450] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1037.843451] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1037.843452] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1037.843458] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1037.843459] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1037.843461] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1037.843462] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1037.843464] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1037.843469] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1037.843470] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1037.843472] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1037.843473] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1037.843475] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1037.843476] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1037.843481] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1037.843483] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1037.843484] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1037.843485] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1037.843487] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1037.843488] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1037.843490] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1037.843491] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1037.843493] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1037.843494] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1037.843496] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1037.843497] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1037.843501] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1037.843503] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1037.843504] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1037.843506] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1037.843507] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1037.843509] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1037.843513] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1037.843515] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1037.843516] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1037.843518] eqdma_hw_error_process detected DMA engine error
[ 1048.031802] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x00000000b58b1766, R,4190208,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1048.044054] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/0, pidx 639, cidx 640
[ 1048.044464] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
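
For readers following the register dump above, the raw values can be re-decoded by hand. The sketch below is only an illustration: the bit ranges are copied from the log lines themselves, and the helper name `field` is hypothetical, not part of the onic driver.

```python
def field(value, hi, lo):
    """Return bits [hi:lo] (inclusive) of a 32-bit register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

# Raw register values as printed in the dmesg output above.
ERR_STS = 0x00100000   # GLBL_DSC_ERR_STS  (0x254)
ERR_LOG0 = 0xC0000043  # GLBL_DSC_ERR_LOG0 (0x25c)
ERR_LOG1 = 0x00280014  # GLBL_DSC_ERR_LOG1 (0x260)
ERR_LOG2 = 0x02800280  # GLBL_DSC_ERR_LOG2 (0x27c)

# Bit positions are the ones shown in the log, e.g. "[12, 0]" for QID.
decoded = {
    "STS_DMA": field(ERR_STS, 20, 20),
    "LOG0_VALID": field(ERR_LOG0, 31, 31),
    "LOG0_SEL": field(ERR_LOG0, 30, 30),
    "LOG0_QID": field(ERR_LOG0, 12, 0),
    "LOG1_CIDX": field(ERR_LOG1, 27, 12),
    "LOG1_SUB_TYPE": field(ERR_LOG1, 8, 5),
    "LOG1_ERR_TYPE": field(ERR_LOG1, 4, 0),
    "LOG2_OLD_PIDX": field(ERR_LOG2, 31, 16),
    "LOG2_NEW_PIDX": field(ERR_LOG2, 15, 0),
}

for name, value in decoded.items():
    print(f"{name:14s} = {value:#x} ({value})")
```

Decoded this way, QID 0x43 is 67 in decimal and CIDX 0x280 is 640, which appear to line up with the `qdma3b000-MM-67` queue name and the `cidx 640` reported by `qdma_descq_dump`.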

Read log on 192.100.51.1 before the reboot. This run does not show QP2 fatal recovery. I tried a few more times, and the log started to show a CQ/SQ mismatch on QP2 followed by fatal recovery, so I rebooted the machines:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:f7:81:e1
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:54:60:02
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:301:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6b400
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6b400000
Info: pre-allocated hugepage buffer vir addr = 0x7f93bf000000, physical addr = 0x2f6b400000
Info: Configuring 8 windows in QDMA AXI bridge BDF, each has 128GB mapping
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16440, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16444, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16450, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16460, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16464, bdf_addr_high=0x40
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16470, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16480, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16484, bdf_addr_high=0x60
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16490, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164a0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164a4, bdf_addr_high=0x80
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164b0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164c0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164c4, bdf_addr_high=0xa0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164d0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164e0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164e4, bdf_addr_high=0xc0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164f0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16500, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16504, bdf_addr_high=0xe0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16510, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6b400
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6b400000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93bf000000, physical addr = 2f6b400000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6b600
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6b600000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93bf200000, physical addr = 2f6b600000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a600
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a600000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c0200000, physical addr = 2f6a600000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a602
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a602000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c0202000, physical addr = 2f6a602000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a612
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a612000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c0212000, physical addr = 2f6a612000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x6b600000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x6a600000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x6a602000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x6a612000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x35546002
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce0821
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 32768, cq_size = 2048, rq_size 262144, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a622
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a622000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c0222000, physical addr = 2f6a622000, rn_dev->buffer_offset = 0x122a000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a62a
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a62a000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c022a000, physical addr = 2f6a62a000, rn_dev->buffer_offset = 0x122a800
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f6a62b
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f6a62b000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f93c022b000, physical addr = 2f6a62b000, rn_dev->buffer_offset = 0x126b000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f93df1fa000, rdma_dev->axil_ctl = 0x7f93df1fa000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x35f781e1
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0xff, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x6a622000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x2f6a622000, sq_addr_msb = 0x2f, sq_addr_lsb = 0x6a622000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x6a62a000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x2f6a62a000, cq_addr_msb = 0x2f, cq_addr_lsb = 0x6a62a000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x6a62b000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x2f6a62b000, rq_addr_msb = 0x2f, rq_addr_lsb = 0x6a62b000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x6b400000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f6b400000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x6b400020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f6b400020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Info: client received remote offset of A = 0xa350000000000000
/users/qianyich/RecoNIC/lib/reconic.c:256:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x80
/users/qianyich/RecoNIC/lib/reconic.c:258:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: creating an RDMA read WQE for getting data
/users/qianyich/RecoNIC/lib/rdma_api.c:769:create_a_wqe(): Info: WQE mem_buffer = 0xa350000000000000, masked_mem_buffer = 0xa350000000000000
/users/qianyich/RecoNIC/lib/rdma_api.c:796:create_a_wqe(): [WQE] wrid=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:797:create_a_wqe(): [WQE] laddr_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:798:create_a_wqe(): [WQE] laddr_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:799:create_a_wqe(): [WQE] length=0x80
/users/qianyich/RecoNIC/lib/rdma_api.c:800:create_a_wqe(): [WQE] opcode=0x4
/users/qianyich/RecoNIC/lib/rdma_api.c:801:create_a_wqe(): [WQE] remote_offset_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:802:create_a_wqe(): [WQE] remote_offset_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:803:create_a_wqe(): [WQE] r_key=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:804:create_a_wqe(): [WQE] send_small_payload0=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:805:create_a_wqe(): [WQE] send_small_payload1=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:806:create_a_wqe(): [WQE] send_small_payload2=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:807:create_a_wqe(): [WQE] send_small_payload3=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:808:create_a_wqe(): [WQE] immdt_data=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:875:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:876:rdma_post_send(): DEBUG: original qp->sq_pidb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:882:rdma_post_send(): [Register] RN_RDMA_QCSR_SQPIi=0x60338, qpid=2, value=0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:883:rdma_post_send(): DEBUG: Update hardware sq db idx from software = 1
/users/qianyich/RecoNIC/lib/rdma_api.c:884:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:844:poll_cq_cidb(): [Register] RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:846:poll_cq_cidb(): DEBUG: before polling: sq_cidb = 0; Polling CQ CIDB = 0
/users/qianyich/RecoNIC/lib/rdma_api.c:857:poll_cq_cidb(): DEBUG: after polling: sq_cidb = 0; Polling CQ CIDB = 1
Successfully sent an RDMA read operation
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x20004
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x40000
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd0210
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x12122000
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x40002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0xe0000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10f02
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x6
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x10
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x4
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x1f0600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabd
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x6a62b000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x2f
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Info: All data has been received!
Info: buffer physical address is 0xa350000000000000
Info: Time spent 8.531000 usec, size = 128 bytes, Bandwidth = 0.120033 gigabits/sec
Info: The value of rc is 128
Info: CHECK RECEIVED DATA
Error: received data mismatched: recv[0]=541065216, sw_golden[0]=0
/users/qianyich/RecoNIC/lib/rdma_api.c:1088:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f93df1fa000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1093:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f93df1fa000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1101:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f93df1fa000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
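
As a cross-check on the register values printed in this log, the address and identity encodings can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the function names are hypothetical, and the 4 KiB page shift is inferred from the log (page frame 0x2f6b400 maps to physical address 0x2f6b400000), not taken from libreconic.

```python
import ipaddress

PAGE_SHIFT = 12  # assumed 4 KiB base page, inferred from the log values

def frame_to_paddr(pfn, offset=0):
    """Physical address from a page frame number plus in-page offset."""
    return (pfn << PAGE_SHIFT) | offset

def split64(addr):
    """Split a 64-bit address into (MSB, LSB) 32-bit register values."""
    return addr >> 32, addr & 0xFFFFFFFF

def ipv4_to_reg(ip):
    """Pack a dotted-quad IPv4 address into one 32-bit value."""
    return int(ipaddress.IPv4Address(ip))

def mac_to_regs(mac):
    """Split a 48-bit MAC address into (MSB, LSB) register values."""
    value = int(mac.replace(":", ""), 16)
    return value >> 32, value & 0xFFFFFFFF

print(hex(frame_to_paddr(0x2F6B400)))                   # buffer physical address
print([hex(v) for v in split64(0x2F6A622000)])          # SQBAMSBi / SQBAi
print(hex(ipv4_to_reg("192.100.51.1")))                 # IPV4XADD
print([hex(v) for v in mac_to_regs("00:0a:35:54:60:02")])  # MACXADDMSB / MACXADDLSB
```

The results match the logged values: 0x2f6b400000 for the buffer, 0x2f and 0x6a622000 for the SQ base address, 0xc0643301 for the local IP, and 0xa and 0x35546002 for the local MAC.
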
qianyich commented 7 months ago

@zhguanw-amd At this point, I think the remaining problems are just bugs in the test programs. RecoNIC is a bit fragile: any small mistake in the code can trap the device in an erroneous state, and the painful part is that it cannot be recovered without reprogramming the board. BTW, I think the hardware design is OK on the U280. We can merge that into the repo soon if you want.

qianyich commented 7 months ago

Commenting out the `if` block in onic_main.c did work. Just letting you know in case you want to change it on the main branch.

zhguanw-amd commented 7 months ago

@qianyich QP fatal recovery is a bug in the RDMA IP, which is fixed in version 4.0. I'll push this newer version when I have time.

The QDMA issue above is related to the mapping of the QDMA MM and ST channels. You can revert to the onic-driver at commit "9e4f0b74bc69744d6d807115e9a23705ba967dbb". With that commit, however, there will be around 2-3% ping packet loss because the pid assigned to netdev occasionally exceeds 64. The later pushes were meant to fix this issue, but they seem to have introduced other QDMA problems.

qianyich commented 7 months ago

@zhguanw-amd Should I just wait for the new release?

I found that QPN, PSN, and rkey are hard-coded in the tests. Does RecoNIC provide APIs that generate QPN, rkey, PSN, etc.? I looked through the APIs, and the answer is probably no. RNICs from other vendors usually have their own algorithms for generating QPN and the other values. Hard-coded values could be insecure, although generated values are still insecure with some algorithms.
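
To make the concern concrete, here is a minimal sketch of drawing a starting PSN and an rkey from a CSPRNG instead of hard-coding them. It is not a RecoNIC API. The function names are hypothetical, the 24-bit PSN width follows the InfiniBand/RoCE transport header, and the rkey width is left as a parameter because the range accepted by RecoNIC's RDMA IP is not established in this thread.

```python
import secrets

PSN_BITS = 24  # PSN field width in the InfiniBand/RoCE base transport header

def random_psn():
    """Pick an unpredictable starting packet sequence number."""
    return secrets.randbits(PSN_BITS)

def random_rkey(bits=32):
    """Pick an unpredictable remote key.

    `bits` is an assumption: 32 matches the InfiniBand R_Key field, but the
    width a given RDMA IP accepts may be smaller.
    """
    return secrets.randbits(bits)

psn = random_psn()
rkey = random_rkey()
print(f"psn={psn:#08x} rkey={rkey:#010x}")
```

Both peers would still need to exchange these values out of band, as the tests already do for the remote buffer offset.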

zhguanw-amd commented 7 months ago

@qianyich The public release with the new RDMA IP might be around June or July this year, as we need to upgrade the Vivado version to support the new IP. We also need to change the current QDMA because of the Vivado upgrade.

Regarding QPN, PSN and rkey, what we provide in the tests is just a showcase to demonstrate how to use RecoNIC and its RDMA via user-space APIs. Designers or users can change it according to their requirements, for example to address the security concerns you mention. I'll leave this to developers who use libreconic.

A more standard approach would be to go through the RDMA-core library, which abstracts those variables away from users. We have a version at the moment, but a public release will take longer.

It seems most of the issues have been addressed, so I'm going to close this thread. Feel free to open another thread if you have more questions.

zhguanw-amd commented 7 months ago

Please use the latest commit, as it contains enhancements and other fixes as well.