Xilinx / RecoNIC

RecoNIC is a software/hardware shell used to enable network-attached processing within an RDMA-featured SmartNIC for scale-out computing.

Error: failed to lock page in memory #19

Closed: qianyich closed this issue 7 months ago

qianyich commented 8 months ago

The system is up and running. I can ping the server from the client side and the client from the server side.

qianyich@pc164:~/RecoNIC/drivers/onic-driver$ ping 192.100.52.1
PING 192.100.52.1 (192.100.52.1) 56(84) bytes of data.
64 bytes from 192.100.52.1: icmp_seq=1 ttl=64 time=0.238 ms
64 bytes from 192.100.52.1: icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from 192.100.52.1: icmp_seq=3 ttl=64 time=0.164 ms
64 bytes from 192.100.52.1: icmp_seq=4 ttl=64 time=0.166 ms
64 bytes from 192.100.52.1: icmp_seq=5 ttl=64 time=0.140 ms
64 bytes from 192.100.52.1: icmp_seq=6 ttl=64 time=0.104 ms
64 bytes from 192.100.52.1: icmp_seq=7 ttl=64 time=0.126 ms
64 bytes from 192.100.52.1: icmp_seq=8 ttl=64 time=0.115 ms
64 bytes from 192.100.52.1: icmp_seq=9 ttl=64 time=0.108 ms
^C
--- 192.100.52.1 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8193ms
rtt min/avg/max/mdev = 0.092/0.139/0.238/0.043 ms
qianyich@pc166:~/RecoNIC/drivers/onic-driver$ ping 192.100.51.1
PING 192.100.51.1 (192.100.51.1) 56(84) bytes of data.
64 bytes from 192.100.51.1: icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from 192.100.51.1: icmp_seq=2 ttl=64 time=0.152 ms
64 bytes from 192.100.51.1: icmp_seq=3 ttl=64 time=0.154 ms
64 bytes from 192.100.51.1: icmp_seq=4 ttl=64 time=0.161 ms
64 bytes from 192.100.51.1: icmp_seq=5 ttl=64 time=0.154 ms
64 bytes from 192.100.51.1: icmp_seq=6 ttl=64 time=0.155 ms
^C
--- 192.100.51.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5110ms
rtt min/avg/max/mdev = 0.152/0.157/0.167/0.008 ms

When I tried to run the rdma_test read and write examples, I got the following error.

qianyich@pc164:~/RecoNIC/examples/rdma_test$ sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./write -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:fd:c0:a8
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:c0:8c:15
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
Error: failed to lock page in memory

I found this message in lib/reconic.c:323. Is it caused by insufficient huge pages? I guess I need to enable huge pages in Linux and configure how many are reserved. How many huge pages do I need?

Currently:

HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
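
The listing above reports zero reserved huge pages, so any allocation that depends on the hugepage pool has nothing to draw from. The thread does not say how much hugepage memory the library needs or which call fails at lib/reconic.c:323, so the following is only a minimal sketch of how the pool size follows from the two fields shown. The function names are hypothetical and are not part of RecoNIC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up a numeric field in /proc/meminfo-style text, e.g. "HugePages_Total".
 * Returns the value, or -1 if the field is absent or malformed.
 * (Hypothetical helper for illustration only.) */
long long meminfo_field(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long long value;
            if (sscanf(line + klen + 1, "%lld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* step past the newline */
    }
    return -1;
}

/* Size of the reserved hugepage pool in bytes, or -1 on parse failure. */
long long hugepage_pool_bytes(const char *meminfo_text)
{
    long long pages = meminfo_field(meminfo_text, "HugePages_Total");
    long long size_kb = meminfo_field(meminfo_text, "Hugepagesize");
    if (pages < 0 || size_kb < 0)
        return -1;
    return pages * size_kb * 1024LL;
}
```

With HugePages_Total at 0 the pool is 0 bytes. With 1024 pages of 2048 kB each it is 2 GiB. On Linux the pool is typically reserved by writing the page count to the vm.nr_hugepages sysctl.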

After configuring the number of huge pages to 1024, I get the following error:

qianyich@pc164:~/RecoNIC/examples/rdma_test$ sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./write -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l dev_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:fd:c0:a8
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: dev_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:c0:8c:15
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54e00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54e00000
Info: pre-allocated hugepage buffer vir addr = 0x7f0b0aa00000, physical addr = 0x2f54e00000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:194:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54e00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54e00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0aa00000, physical addr = 2f54e00000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f54c00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f54c00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0ac00000, physical addr = 2f54c00000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c00000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc00000, physical addr = 2f55c00000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c02
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c02000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc02000, physical addr = 2f55c02000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f55c12
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f55c12000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f0b0bc12000, physical addr = 2f55c12000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x54c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x55c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x55c02000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x55c12000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x35c08c15
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = dev_mem
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x14000
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000014000, rn_dev->dev_buffer_offset = 0x15400
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000016000, rn_dev->dev_buffer_offset = 0xb6000
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f0b2aa4c000, rdma_dev->axil_ctl = 0x7f0b2aa4c000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x35fdc0a8
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0xa350000000000000, sq_addr_msb = 0xa3500000, sq_addr_lsb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0xa350000000014000, cq_addr_msb = 0xa3500000, cq_addr_lsb = 0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0xa350000000016000, rq_addr_msb = 0xa3500000, rq_addr_lsb = 0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x54e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f54e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x54e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f54e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Error: Can't receive remote offset of A from the remote peer

This time it looks like the error comes from the read application at line 319: rc = read(sockfd, &read_A_offset, sizeof(read_A_offset)); returns a value that is not greater than 0. I am also confused about why a socket is involved here. My understanding is that RDMA has nothing to do with sockets.
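
For context on the return value of read(): a result of 0 means the peer closed the connection, and -1 means an error whose cause is in errno. A single read() on a TCP socket may also return fewer bytes than requested. The sketch below is a hypothetical helper, not code from RecoNIC, that distinguishes these cases while receiving a fixed-size value such as the remote offset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error (errno is set by read), and
 * -2 if the peer closed the connection before len bytes arrived.
 * (Hypothetical helper for illustration only.) */
int read_full(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    uint8_t *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t rc = read(fd, p + done, len - done);
        if (rc > 0) {
            done += (size_t)rc;
        } else if (rc == 0) {
            return -2;              /* orderly shutdown by the peer */
        } else if (errno != EINTR) {
            return -1;              /* real error; inspect errno */
        }
        /* EINTR: interrupted by a signal, so retry the read */
    }
    return 0;
}
```

Printing strerror(errno) when the helper returns -1 would show whether the connection was reset, refused, or never established.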

zhguanw-amd commented 8 months ago

Hi Yicheng,

Could you double check whether your TCP is working fine?

To set up RDMA, we first need to establish the QP connection, which requires the two peers to exchange some information, such as the r_key, remote addresses and PSN. This exchange can be done via connection management with QP0 or via TCP. We are using TCP to exchange this information.
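
As an illustration of the kind of out-of-band exchange described above, the sketch below packs an r_key, a PSN and a remote address into a fixed 16-byte message in network byte order, so that both peers decode it identically regardless of host endianness. The struct layout, field widths and function names are assumptions made for this example. The thread does not show the format RecoNIC actually sends over TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the values the peers exchange. */
struct qp_info {
    uint32_t r_key;
    uint32_t psn;
    uint64_t remote_addr;
};

#define QP_INFO_WIRE_SIZE 16u

/* Store the low nbytes of v at out, most significant byte first. */
void put_be(uint8_t *out, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* Load an nbytes-wide big-endian integer from in. */
uint64_t get_be(const uint8_t *in, size_t nbytes)
{
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | in[i];
    return v;
}

void qp_info_pack(const struct qp_info *info, uint8_t out[QP_INFO_WIRE_SIZE])
{
    assert(info != NULL && out != NULL);
    put_be(out, info->r_key, 4);
    put_be(out + 4, info->psn, 4);
    put_be(out + 8, info->remote_addr, 8);
}

void qp_info_unpack(const uint8_t in[QP_INFO_WIRE_SIZE], struct qp_info *info)
{
    assert(info != NULL && in != NULL);
    info->r_key = (uint32_t)get_be(in, 4);
    info->psn = (uint32_t)get_be(in + 4, 4);
    info->remote_addr = get_be(in + 8, 8);
}
```

Each side would send its packed buffer over the TCP socket and unpack the peer's buffer before any RDMA work request is posted.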

qianyich commented 8 months ago

Hi Guanwen,

I used tcping between my two servers, and it works fine.

8 probes transmitted on port 22 | 8 received, 0.00% packet loss
successful probes:   8
unsuccessful probes: 0
last successful probe:   2024-03-21 22:26:29
last unsuccessful probe: Never failed
total uptime:   8 seconds
total downtime: 0.0 seconds
longest consecutive uptime:   7 seconds from 2024-03-21 22:26:22 to 2024-03-21 22:26:30
retried to resolve hostname 0 times
rtt min/avg/max: 0.223/0.266/0.324 ms
--------------------------------------
TCPing started at: 2024-03-21 22:26:22
TCPing ended at:   2024-03-21 22:26:30
duration (HH:MM:SS): 00:00:08

So the socket is only used for communication setup, and the verbs API is used for the actual data transfer. That makes sense.

qianyich commented 8 months ago

Oh, I guess you actually meant checking the TCP connection between 192.100.52.1 and 192.100.51.1, so I should run tcping 192.100.52.1 in this case. Let me do this and get back to you.

Update: I ran tcping against the RecoNIC on the server side from the client side, and it works fine. I also ran tcping against the RecoNIC on the client side from the server side, and it works fine as well. I think TCP is OK. I used port 22 because the ssh service is running on the server side.

qianyich@pc164:~/RecoNIC/drivers/onic-driver$ tcping 192.100.52.1 22
TCPinging 192.100.52.1 on port 22
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=1 time=0.408 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=2 time=0.228 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=3 time=0.318 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=4 time=0.302 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=5 time=0.330 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=6 time=0.320 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=7 time=0.216 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=8 time=0.303 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=9 time=0.226 ms
Reply from 192.100.52.1 (192.100.52.1) on port 22 TCP_conn=10 time=0.315 ms
^C
--- 192.100.52.1 TCPing statistics ---
10 probes transmitted on port 22 | 10 received, 0.00% packet loss
successful probes:   10
unsuccessful probes: 0
last successful probe:   2024-03-22 15:37:18
last unsuccessful probe: Never failed
total uptime:   10 seconds
total downtime: 0.0 seconds
longest consecutive uptime:   9 seconds from 2024-03-22 15:37:09 to 2024-03-22 15:37:18
rtt min/avg/max: 0.216/0.297/0.408 ms
--------------------------------------
TCPing started at: 2024-03-22 15:37:09
TCPing ended at:   2024-03-22 15:37:18
duration (HH:MM:SS): 00:00:10

I also tried allocating the QP in both host memory and device memory, and I get the same error in both cases: Error: Can't receive remote offset of A from the remote peer, with rc equal to -1.

qianyich commented 8 months ago

I saw others have the same issue. https://github.com/Xilinx/RecoNIC/issues/12#issuecomment-1986413626 Do you have any suggestions on solving this TCP handshaking issue?

zhguanw-amd commented 8 months ago

Hi Yicheng,

Could you comment out all the code after the exchange of the remote offset and see what the issue is? At this point your program hasn't reached the actual RDMA part yet; it has only done a few RDMA register configurations.

qianyich commented 8 months ago

From which line to which line? Do you mean commenting out the whole RDMA part (WQE, etc.) or only this problematic part? Please bear with me, I am still trying to understand what the code is doing.

When I ran tcping, should I have used 11111 as the port number? I saw that the TCP port is 11111 in the command arguments, but I did not verify whether that port works.
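
A successful probe of port 22 only shows that sshd is reachable. It says nothing about whether anything is listening on the port the test passes with -t (11111 in the commands above, per the reading in this comment). The sketch below is a hypothetical helper, not part of RecoNIC, that attempts a plain TCP connect to a given IPv4 address and port. It would need to be run while the server-side program is already listening.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try a TCP connect to ip:port. Returns 0 if the connection was accepted,
 * -1 otherwise (invalid address, refused, unreachable, and so on).
 * (Hypothetical helper for illustration only.) */
int tcp_port_open(const char *ip, uint16_t port)
{
    assert(ip != NULL);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                  /* not a valid dotted-quad address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

For example, tcp_port_open("192.100.52.1", 11111) returning -1 while the same call with port 22 returns 0 would point at the test's listener rather than at the network path.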

adity14 commented 8 months ago

Hello,

Can you reprogram the boards and try a larger payload size, such as 4096 or above, by changing the -z argument in your command from 128 to 4096?

Aditya.

qianyich commented 8 months ago

Hi Aditya and Guanwen,

Update: I got some new errors. The TCP issue has been resolved. The issue now is that it fails to send an RDMA read request.

Specifically, ERROR: poll_cq_cidb timeout! sq_cidb = 0; Polling CQ CIDB = 0 and

/dev/reconic-mm, read off 0x0 + 0x1000 failed -1.
read file: Invalid argument
Info: The value of rc is -5
Error: read_to_buffer failed with rc = -5
Warning: QP in fatal status

***** QP2 FATAL RECOVERY *****
TIMEOUT: CQHEADi:0x0 and SQPIi:0x1 are different

This looks like the same problem as in another issue: https://github.com/Xilinx/RecoNIC/issues/12. Is there a fix for it?
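
The messages above say that the completion side never caught up with the send side: SQPIi is 0x1, meaning one request was posted, while CQHEADi and the polled CQ doorbell stayed at 0. A polling loop with a timeout generally has the shape sketched below. This is a generic illustration under the assumption that the doorbell is a 32-bit word the device updates in memory. The name poll_until_equal is hypothetical, and RecoNIC's own poll_cq_cidb may be implemented differently.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for clock_gettime */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Poll *doorbell until it equals expected or timeout_ms elapses.
 * Returns 0 when the value was observed, -1 on timeout.
 * (Hypothetical helper for illustration only.) */
int poll_until_equal(const volatile uint32_t *doorbell, uint32_t expected,
                     long timeout_ms)
{
    assert(doorbell != NULL);
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (*doorbell == expected)
            return 0;               /* the device reported the completion */
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (long)(now.tv_sec - start.tv_sec) * 1000L +
                          (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms)
            return -1;              /* the value never changed in time */
    }
}
```

A timeout from such a loop only shows that the doorbell was never written. It does not say whether the request left the NIC, whether the peer answered, or whether the response was dropped on the way back.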

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 4096 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:6a:cc:c7
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:bb:ce:68
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:296:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f52600
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f52600000
Info: pre-allocated hugepage buffer vir addr = 0x7f35aa000000, physical addr = 0x2f52600000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:194:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f52600
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f52600000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35aa000000, physical addr = 2f52600000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f52400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f52400000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35aa200000, physical addr = 2f52400000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53400000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab200000, physical addr = 2f53400000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53402
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53402000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab202000, physical addr = 2f53402000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53412
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53412000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab212000, physical addr = 2f53412000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x52400000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x53400000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x53402000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x53412000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x35bbce68
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53422
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53422000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab222000, physical addr = 2f53422000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53436
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53436000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab236000, physical addr = 2f53436000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f53438
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f53438000
/users/qianyich/RecoNIC/lib/reconic.c:227:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f35ab238000, physical addr = 2f53438000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f35ca162000, rdma_dev->axil_ctl = 0x7f35ca162000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x356accc7
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x53422000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x2f53422000, sq_addr_msb = 0xf, sq_addr_lsb = 0x53422000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x53436000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x2f53436000, cq_addr_msb = 0xf, cq_addr_lsb = 0x53436000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x53438000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x2f53438000, rq_addr_msb = 0xf, rq_addr_lsb = 0x53438000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x52600000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f52600000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x52600020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f52600020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 4096, payload_size>>2 = 1024
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Info: client received remote offset of A = 0xa350000000000000
/users/qianyich/RecoNIC/lib/reconic.c:251:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x1000
/users/qianyich/RecoNIC/lib/reconic.c:253:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: creating an RDMA read WQE for getting data
/users/qianyich/RecoNIC/lib/rdma_api.c:769:create_a_wqe(): Info: WQE mem_buffer = 0xa350000000000000, masked_mem_buffer = 0xa350000000000000
/users/qianyich/RecoNIC/lib/rdma_api.c:796:create_a_wqe(): [WQE] wrid=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:797:create_a_wqe(): [WQE] laddr_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:798:create_a_wqe(): [WQE] laddr_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:799:create_a_wqe(): [WQE] length=0x1000
/users/qianyich/RecoNIC/lib/rdma_api.c:800:create_a_wqe(): [WQE] opcode=0x4
/users/qianyich/RecoNIC/lib/rdma_api.c:801:create_a_wqe(): [WQE] remote_offset_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:802:create_a_wqe(): [WQE] remote_offset_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:803:create_a_wqe(): [WQE] r_key=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:804:create_a_wqe(): [WQE] send_small_payload0=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:805:create_a_wqe(): [WQE] send_small_payload1=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:806:create_a_wqe(): [WQE] send_small_payload2=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:807:create_a_wqe(): [WQE] send_small_payload3=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:808:create_a_wqe(): [WQE] immdt_data=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:875:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:876:rdma_post_send(): DEBUG: original qp->sq_pidb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:882:rdma_post_send(): [Register] RN_RDMA_QCSR_SQPIi=0x60338, qpid=2, value=0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:883:rdma_post_send(): DEBUG: Update hardware sq db idx from software = 1
/users/qianyich/RecoNIC/lib/rdma_api.c:884:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:844:poll_cq_cidb(): [Register] RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:846:poll_cq_cidb(): DEBUG: before polling: sq_cidb = 0; Polling CQ CIDB = 0
ERROR: poll_cq_cidb timeout! sq_cidb = 0; Polling CQ CIDB = 0
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x10000
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x0
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0x2
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x10002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x0
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x30802
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x200
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x53438000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xf
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Failed to send an RDMA read operation
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x10000
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x0
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0x2
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x10002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x0
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x30802
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x200
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x53438000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xf
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Info: All data has been received!
Info: buffer physical address is 0xa350000000000000
Info: Time spent 855.780000 usec, size = 4096 bytes, Bandwidth = 0.038290 gigabits/sec
/dev/reconic-mm, read off 0x0 + 0x1000 failed -1.
read file: Invalid argument
Info: The value of rc is -5
Error: read_to_buffer failed with rc = -5
Warning: QP in fatal status

***** QP2 FATAL RECOVERY *****
TIMEOUT: CQHEADi:0x0 and SQPIi:0x1 are different

qianyich commented 8 months ago

I think there is another issue with the DMA engine. I ran the DMA test, but it failed when reading from device memory. Do you have any suggestions for solving this?

qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
size=65536000 Average BW = 10.629003 GB/sec, average latency = 6165.771279 us
qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
/dev/reconic-mm, read off 0x0 + 0x3e80000 failed -1.
read file: Invalid argument

From dmesg, I got the following. I am not that familiar with this driver, but it looks like the read request is being submitted on a Q_H2C queue, which it shouldn't be.

[ 5026.394723] onic:qdma_request_submit: qdma3b000-ST-2: bad direction, R (req->write && (descq->conf.q_type != Q_H2C) = 0 (!req->write && (descq->conf.q_type != Q_C2H)) = 1 descq->conf.q_type = H2C.
[ 5026.412609] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
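
The dmesg line prints the two conditions that the driver evaluated. As a minimal sketch, the same boolean expression can be restated in Python to see why a read on an H2C queue is rejected. Only the expression itself comes from the log; the function name and the string constants are illustrative and are not part of the driver.

```python
# Illustrative restatement of the check printed in the dmesg line.
# Q_H2C / Q_C2H mirror the names in the log; everything else is hypothetical.
Q_H2C = "H2C"  # host-to-card queue: the log shows writes are expected here
Q_C2H = "C2H"  # card-to-host queue: the log shows reads are expected here


def bad_direction(is_write: bool, q_type: str) -> bool:
    """Return True when a request is submitted on a queue of the wrong direction."""
    write_on_wrong_queue = is_write and q_type != Q_H2C
    read_on_wrong_queue = (not is_write) and q_type != Q_C2H
    return write_on_wrong_queue or read_on_wrong_queue


# The failing case from the log: a read (R) submitted on an H2C queue.
print(bad_direction(is_write=False, q_type=Q_H2C))
# A write on the same queue would pass the check.
print(bad_direction(is_write=True, q_type=Q_H2C))
```

This matches the log, where the write-side term evaluates to 0 and the read-side term evaluates to 1.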

Then I see that read_to_buffer always fails in the read, write, and systolic array tests (read_to_buffer returns rc == -5). For the systolic array test, I have the following logs: client_debug.log server_debug.log

client: sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -r 192.100.52.1 -i 192.100.51.1 -u 22222 -t 11111 --dst_qp 2 -c 2>&1 | tee client_debug.log

server: sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -r 192.100.51.1 -i 192.100.52.1 -u 22222 -t 11111 --dst_qp 2 -s 2>&1 | tee server_debug.log

zhguanw-amd commented 8 months ago

@qianyich Hi Yicheng,

Check your log below:

/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x2f53422000, sq_addr_msb = 0xf, sq_addr_lsb = 0x53422000

In RecoNIC's QDMA setup, we use 1TB as the AXI BAR0 (which uses the PCIe slave bridge to read/write host data), and it is further split equally into 8 windows. This means that the BDF mask covers 128GB. I guess you're using a 256GB system, right? Your qp->sq->dma_addr does not match {sq_addr_msb, sq_addr_lsb}, which means that the hardware can't read any WQE from the SQ. That's why you can't make it work.
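
The mismatch can be reproduced with a few lines of arithmetic. This is a sketch that assumes the library masks the DMA address with (window size - 1), which is what the logged values suggest; the helper name is hypothetical and the actual code in rdma_api.c may be structured differently.

```python
from typing import Tuple

GIB = 1 << 30

AXI_BAR_SIZE = 1024 * GIB                  # 1 TB AXI BAR
NUM_WINDOWS = 8                            # split equally into 8 windows
WINDOW_SIZE = AXI_BAR_SIZE // NUM_WINDOWS  # bytes covered by one window
WINDOW_MASK = WINDOW_SIZE - 1


def masked_msb_lsb(dma_addr: int) -> Tuple[int, int]:
    """Apply the window mask, then split into upper and lower 32-bit halves."""
    masked = dma_addr & WINDOW_MASK
    return masked >> 32, masked & 0xFFFFFFFF


dma_addr = 0x2F53422000  # qp->sq->dma_addr from the log
msb, lsb = masked_msb_lsb(dma_addr)

print(WINDOW_SIZE // GIB)        # window size in GiB
print(hex(WINDOW_MASK))          # compare with win_size_high / win_size_low
print(hex(msb), hex(lsb))        # compare with sq_addr_msb / sq_addr_lsb
print(dma_addr < WINDOW_SIZE)    # does the address fit in one window?
```

Under this assumption the masked address comes out as 0xf / 0x53422000, the same pair the log reports, while the original address lies above the 128GB window, so the top bits are lost.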

To enable 256GB mask, you need to configure the BDF table's 0x2430 register with 0xC400_0000. You can check the information here.

Regarding the DMA engine issue, probably your hardware configuration or connection has some problems when porting to U280. We never encounter this issue on U250. What you could do is to check the built-in example along with the QDMA 4.0 on U280, see any special configuration. To build the example, please check this.

qianyich commented 8 months ago

Hi Guanwen,

I took a look at this article: https://support.xilinx.com/s/article/Demystifying-BDF-Table-programming-for-Slave-Bridge-Address-Translations-for-AXI-address?language=en_US. It does help me understand this better. I understand that the address above goes beyond the 128 GB range and that S_AXI_BRIDGE is set to 1TB, but I am not sure whether I am using 256 GB. How can I check this, and how can I write C4000000 to 0x2430 (I do understand the value comes from 256 * 1024 * 1024 / 4)?
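
For reference, the arithmetic in the parentheses can be checked as follows. This sketch only reproduces how the low part of the quoted value relates to a 256GB window expressed in 4KB units; it treats the remaining upper bits as opaque and does not establish that the value is the right one to program. The variable names are illustrative.

```python
KIB = 1 << 10
GIB = 1 << 30

window_bytes = 256 * GIB
# Window size in 4 KB units, i.e. 256 * 1024 * 1024 / 4 when counted in KB.
window_in_4k_units = window_bytes // (4 * KIB)

quoted_value = 0xC4000000                 # value quoted in this thread
upper_bits = quoted_value - window_in_4k_units

print(hex(window_in_4k_units))
print(hex(upper_bits))
```

The meaning of the upper bits is described in the linked article and is not derived here.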

Will check the configuration of the example QDMA project later and keep you posted.

zhguanw-amd commented 8 months ago

@qianyich I mean that your server probably has 256GB of host memory, which is why you get addresses above the 128GB range when you allocate memory.
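
One way to check the installed memory on a Linux host is to read MemTotal from /proc/meminfo. The sketch below assumes the usual "MemTotal: N kB" line format; the function names are hypothetical. MemTotal is only a rough indicator, since physical addresses are not necessarily contiguous and can exceed the installed amount.

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Extract MemTotal (reported in kB) from /proc/meminfo text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) * 1024 / (1 << 30)


def exceeds_window(total_gib: float, window_gib: int = 128) -> bool:
    """True when host memory is larger than one 128GB window."""
    return total_gib > window_gib


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            total = mem_total_gib(f.read())
        print(f"MemTotal: {total:.1f} GiB, above 128 GiB: {exceeds_window(total)}")
    except OSError:
        print("/proc/meminfo is not available on this system")
```

Running `free -g` gives the same information without any code.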

To write 0x2430, please check this line. "AXIB_BDF_MAP_CONTROL_ADDR" is 0x2430 register.

qianyich commented 8 months ago

@zhguanw-amd Hi Guanwen,

I generated the example QDMA design, and tested it with dma_ip_drivers. My script tests the functionality of QDMA.

qdma3b000       0000:3b:00.0    max QP: 32, 0~31
=============Hardware Version============

RTL Version         : RTL Base
Vivado ReleaseID    : vivado 2020.2
QDMA Device Type    : Soft IP
QDMA IP Type    : EQDMA4.0 Soft IP
============Software Version============

qdma driver version : 2022.1.5.5.

=============Hardware Capabilities============

Number of PFs supported                : 1
Total number of queues supported       : 512
MM channels                            : 1
FLR Present                            : no
ST enabled                             : yes
MM enabled                             : yes
Mailbox enabled                        : no
MM completion enabled                  : no
Debug Mode enabled                     : no
Desc Engine Mode                       : Inernal only mode
qdma3b000:statistics
Total MM H2C packets processed = 17000
Total MM C2H packets processed = 17
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
-e
List Xilinx PCIe QDMA queues.
Zero Qs-e
Test Xilinx PCIe QDMA h2c channel.
qdma3b000-MM-0 H2C added.
Added 1 Queues.
/dev/qdma3b000-MM-0
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
size=4096 Average BW = 368.020416 MB/sec
size=65536 Average BW = 3.080631 GB/sec
Stopped Queues 0 -> 0.
Deleted Queues 0 -> 0.
qdma3b000:statistics
Total MM H2C packets processed = 34000
Total MM C2H packets processed = 17
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
-e
Test Xilinx PCIe QDMA c2h channel.
qdma3b000-MM-0 C2H added.
Added 1 Queues.
/dev/qdma3b000-MM-0
dma-ctl: Info: Default ring size set to 2048
1 Queues started, idx 0 ~ 0.
size=4096 Average BW = 336.469760 MB/sec
size=65536 Average BW = 3.472523 GB/sec
Stopped Queues 0 -> 0.
Deleted Queues 0 -> 0.
qdma3b000:statistics
Total MM H2C packets processed = 34000
Total MM C2H packets processed = 17017
Total ST H2C packets processed = 0
Total ST C2H packets processed = 0
Min Ping Pong Latency = 0
Max Ping Pong Latency = 0
Avg Ping Pong Latency = 0
-e
Test done.

Here are some differences I observed between this example QDMA project and the one in open-nic-shell:

  1. Descriptor bypass is disabled in the built-in QDMA example, but is on in open-nic-shell
  2. The BAR for the AXI-Lite master in open-nic-shell is 4MB and has a value of FFFFFFFFFFC00004. In the example it is FFFFFFFFFFFFF004, with a size of 4KB. This shouldn't cause any issues.
  3. QDMA is using advanced mode in open-nic-shell which is reasonable because we need AXI-lite CSR slave interface.

I think these differences won't affect the functionality of QDMA in this case.
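
The two BAR values in item 2 are consistent with the sizes given if they are read as standard PCI memory BAR sizing masks. That reading is an assumption, since the thread does not say where the values were taken from. A small sketch of the decoding, with a hypothetical helper name:

```python
def decode_bar_mask(readback: int, width_bits: int = 64) -> dict:
    """Decode a memory BAR sizing mask (address bits set, low 4 bits are flags)."""
    full = (1 << width_bits) - 1
    addr_mask = readback & ~0xF & full       # strip the four flag bits
    size = ((~addr_mask) & full) + 1         # lowest writable address bit gives the size
    return {
        "size_bytes": size,
        "is_io": bool(readback & 0x1),               # bit 0: 0 = memory, 1 = I/O
        "is_64bit": ((readback >> 1) & 0x3) == 0x2,  # bits 2:1 = 10b means 64-bit
        "prefetchable": bool(readback & 0x8),        # bit 3
    }


for value in (0xFFFFFFFFFFC00004, 0xFFFFFFFFFFFFF004):
    info = decode_bar_mask(value)
    print(hex(value), info["size_bytes"] // 1024, "KiB", info)
```

Under this reading both are 64-bit, non-prefetchable memory BARs that differ only in size.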

qianyich commented 8 months ago

Could this be because the evaluation ERNIC IP license breaks the integrity of the whole bitstream after a certain number of hours, so that the DMA is no longer working?

zhguanw-amd commented 8 months ago

@qianyich

Could this be because the evaluation ERNIC IP license breaks the integrity of the whole bitstream after a certain number of hours, so that the DMA is no longer working?

The two won't affect each other.

I downloaded the source from the repo and tried it today, and I see the same issue in the DMA part. It seems we pushed a version that causes the DMA read-side problem. I'll revert that version and fix it next week (this week is too busy). Thanks and sorry for the trouble! The other issue mentioned above should be related to your system setup.

qianyich commented 8 months ago

Sounds good! Thank you so much Guanwen!

qianyich commented 8 months ago

Any progress on this?

zhguanw-amd commented 7 months ago

@qianyich You can try the latest commit; there is a fix on the driver side.

qianyich commented 7 months ago

I am now having a problem loading the driver with the latest commit. The driver compiled, but the first time I loaded it, it failed to initialize the QDMA. After I rebooted the machine, the error messages were different and were about the CMAC IP. Both are shown below.

first time error:

[ 1173.819980] pci 0000:3b:00.0: [10ee:903f] type 00 class 0x058000
[ 1173.819998] pci 0000:3b:00.0: reg 0x10: [mem 0xab400000-0xab43ffff 64bit]
[ 1173.820005] pci 0000:3b:00.0: reg 0x18: [mem 0xab000000-0xab3fffff 64bit]
[ 1173.820023] pci 0000:3b:00.0: enabling Extended Tags
[ 1173.820230] pci 0000:3b:00.0: BAR 2: assigned [mem 0xab000000-0xab3fffff 64bit]
[ 1173.820236] pci 0000:3b:00.0: BAR 0: assigned [mem 0xab400000-0xab43ffff 64bit]
[ 1173.820360] onic 0000:3b:00.0 onic59s0f0 (uninitialized): Set MAC address to 0:a:35:a:3f:f7
[ 1173.820362] onic 0000:3b:00.0: device is a master PF
[ 1173.820364] onic_set_num_queue: num_msix 8, nb_queues 7, pci_msix_user_cnt 1
[ 1173.820365] onic_pci_probe mm_queues: 4
[ 1173.820417] onic:qdma_device_open: onic, 3b:00.00, pdev 0x00000000c9cf9db1, 0x10ee:0x903f.
[ 1173.820557] qdma_is_config_bar: Invalid config bar, err:-4
[ 1173.826056] qdma_hw_access_init: config bar passed is INVALID, err:-1
[ 1173.832546] onic 0000:3b:00.0: onic_qdma_setup: qdma_device_open() failed: Error Code: -22
[ 1173.840808] onic 0000:3b:00.0: onic_pci_probe: onic_qdma_setup() failed with status -22
[ 1173.848869] onic: probe of 0000:3b:00.0 failed with error -22

After reboot:

[  153.604932] onic 0000:3b:00.0 onic59s0f0 (uninitialized): Set MAC address to 0:a:35:ac:da:4a
[  153.604933] onic 0000:3b:00.0: device is a master PF
[  153.604936] onic_set_num_queue: num_msix 8, nb_queues 7, pci_msix_user_cnt 1
[  153.604936] onic_pci_probe mm_queues: 4
[  153.604998] onic:qdma_device_open: onic, 3b:00.00, pdev 0x0000000015f7489e, 0x10ee:0x903f.
[  153.605143] Device Type: Soft IP
[  153.605143] IP Type: EQDMA Soft IP
[  153.605144] Vivado Release: vivado 2020.2
[  153.605149] onic:qdma_device_attributes_get: qdma3b000-p0000:3b:00.0: num_pfs:1, num_qs:512, flr_present:0, st_en:1, mm_en:1, mm_cmpt_en:0, mailbox_en:0, mm_channel_max:1, qid2vec_ctx:0, cmpt_ovf_chk_dis:1, mailbox_intr:1, sw_desc_64b:1, cmpt_desc_64b:1, dynamic_bar:1, legacy_intr:1, cmpt_trig_count_timer:1
[  153.605151] onic:qdma_device_open: Vivado version = vivado 2020.2
[  153.605153] qdma_dev_entry_create: Created the dev entry successfully
[  153.608566] onic:xdev_identify_bars: AXI Master Lite BAR 2.
[  153.608567] onic:qdma_device_open: 0000:3b:00.0, 3b000, pdev 0x0000000015f7489e, xdev 0x00000000ebc4e2ab, ch 1, q 68, vf 0.
[  153.708116] [ERROR] onic_enable_cmac, rx_not_aligned
[  153.713083] onic 0000:3b:00.0: onic_pci_probe: onic_enable_cmac() failed with status -16
[  153.721338] onic: probe of 0000:3b:00.0 failed with error -16
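
The status codes in the two logs (-22 and -16) can be decoded on the assumption that they are negated Linux errno values, which is the usual convention for kernel return codes. The helper below is a hypothetical convenience, not part of the driver.

```python
import errno
import os


def describe(rc: int) -> str:
    """Map a negative return code to its errno name and message."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


for rc in (-22, -16):
    print(describe(rc))
```

On Linux, 22 is EINVAL and 16 is EBUSY, which fits an invalid config BAR in the first log and a CMAC that is not yet aligned in the second.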

I ran git checkout ebf6a5293934272ef98ee09515e91f72e57d0678 to revert to the previous commit, and the initialization was fine.

[  282.727274] onic 0000:3b:00.0 onic59s0f0 (uninitialized): Set MAC address to 0:a:35:1:db:e6
[  282.727275] onic 0000:3b:00.0: device is a master PF
[  282.727277] onic_set_num_queue: num_msix 8, nb_queues 7, pci_msix_user_cnt 1
[  282.727278] onic_pci_probe mm_queues: 4
[  282.727336] onic:qdma_device_open: onic, 3b:00.00, pdev 0x000000004abb27cf, 0x10ee:0x903f.
[  282.727477] Device Type: Soft IP
[  282.727477] IP Type: EQDMA Soft IP
[  282.727478] Vivado Release: vivado 2020.2
[  282.727483] onic:qdma_device_attributes_get: qdma3b000-p0000:3b:00.0: num_pfs:1, num_qs:512, flr_present:0, st_en:1, mm_en:1, mm_cmpt_en:0, mailbox_en:0, mm_channel_max:1, qid2vec_ctx:0, cmpt_ovf_chk_dis:1, mailbox_intr:1, sw_desc_64b:1, cmpt_desc_64b:1, dynamic_bar:1, legacy_intr:1, cmpt_trig_count_timer:1
[  282.727484] onic:qdma_device_open: Vivado version = vivado 2020.2
[  282.727485] qdma_dev_entry_create: Created the dev entry successfully
[  282.730924] onic:xdev_identify_bars: AXI Master Lite BAR 2.
[  282.730926] onic:qdma_device_open: 0000:3b:00.0, 3b000, pdev 0x000000004abb27cf, xdev 0x000000004f91e5d1, ch 1, q 64, vf 0.
[  282.832977] onic 0000:3b:00.0 enp59s0: renamed from onic59s0f0

zhguanw-amd commented 7 months ago

Ah okay. Please re-enable the part I commented out (lines 1158-1166 of onic_enable_cmac() in onic_main.c). It should then work with your setup. I guess you have two boards connected directly.

qianyich commented 7 months ago

Hmm, the driver can be loaded now, but the DMA is still problematic: both read and write fail. BTW, my two boards are not connected directly. They are connected through a network switch.

qianyich@pc164:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
/dev/reconic-mm, W off 0x0, 0x3e80000 failed -1.
write file: Input/output error
qianyich@pc164:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
/dev/reconic-mm, read off 0x0 + 0x3e80000 failed -1.
read file: Input/output error

I saw that some registers are dumped when running those tests. Are they the internal registers of the RDMA core? We would like to learn about the RDMA core's microarchitectural resources (such as the WQE cache), even though the source of the RDMA core is not accessible. Is it possible to observe any cache state internal to the RDMA core?

zhguanw-amd commented 7 months ago

Hi Yicheng,

Can you put some pr_info in the onic_main.c and get (q_no, qhndl, xpriv->base_rx_q_handle, queue_id, q_handle, xpriv->pinfo->active_rx_queues, xpriv->pinfo->active_tx_queues) in onic_rx_pkt_process(), onic_rx_poll(), onic_isr_rx_tophalf(), onic_qdma_rx_queue_add(), onic_isr_tx_tophalf() and onic_qdma_tx_queue_setup()? And please share the dmesg log with me, so that I can understand what's going on at your end.

For the pr_info calls, do remember to include "__func__", so that I can tell which function each message comes from.

Thanks.

zhguanw-amd commented 7 months ago

Regarding your RDMA question, you can check Chapter 2, Register Space, of the ERNIC 3.1 document. It describes several status registers that expose internal information about the RDMA engine. We are planning to upgrade it to version 4.0.

Did you solve the address issue I mentioned before by configuring more BDF table entries and removing the address masking? If you haven't done that, the register information has no meaning, as the RDMA engine is not working properly. If you don't know how to do it, please wait one or two weeks and I'll update it to support larger host memory, up to 1TB.

qianyich commented 7 months ago

The BDF table entry is now correct and the memory address space configuration is all good. However, I think it would be better to make it support a larger memory by default.

I will add those pr_info by tomorrow. Just noticed that some variables are not valid in the scope of some functions, and I will just ignore those. I am adding pr_info at the end of those functions.

The error messages for dma from dmesg are shown below

[ 1645.058747] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-66: req 0x000000004e125e5d, W,4190208,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1645.071002] onic:qdma_descq_dump: qdma3b000-MM-66: 0x42/0x42, desc sz 1024/0, pidx 1023, cidx 0
[ 1645.071340] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1657.090981] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-66: req 0x0000000087622d09, R,4190208,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1657.103228] onic:qdma_descq_dump: qdma3b000-MM-66: 0x42/0x42, desc sz 1024/0, pidx 1023, cidx 0
[ 1657.103934] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.

zhguanw-amd commented 7 months ago

Probably you can send me your modifications to support u280 (henry.zhong AT amd.com). I can take a look and check it next week.

qianyich commented 7 months ago

I am not sure what is going on, but after I inserted the pr_info lines, rebooted the machine, and reloaded the driver, the DMA started to work.

qianyich@pc164:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
size=65536000 Average BW = 4.948909 GB/sec, average latency = 13242.515735 us
qianyich@pc164:~/RecoNIC/examples/dma_test$ sudo  ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
size=65536000 Average BW = 5.257542 GB/sec, average latency = 12465.140782 us
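
As a sanity check, the reported bandwidth is consistent with the transfer size divided by the average latency. This sketch assumes that GB means 10^9 bytes and that dma_test derives the figure this way; neither is confirmed by the tool's source here, and the function name is illustrative.

```python
def bandwidth_gb_per_s(size_bytes: int, latency_us: float) -> float:
    """Bytes transferred per second, expressed in units of 1e9 bytes."""
    return size_bytes / (latency_us * 1e-6) / 1e9


# (size, average latency in us, reported GB/sec) taken from the output above
runs = [
    (65536000, 13242.515735, 4.948909),  # write scenario
    (65536000, 12465.140782, 5.257542),  # read scenario
]

for size, latency_us, reported in runs:
    computed = bandwidth_gb_per_s(size, latency_us)
    print(f"computed {computed:.6f} GB/sec, reported {reported:.6f} GB/sec")
```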

zhguanw-amd commented 7 months ago

Okay, your machine/FPGA was probably in a bad state when you loaded the driver. Can you start fresh by re-programming the FPGA board and then loading the driver, and see whether it works?

Regarding the BDF table, I realized that QDMA sets up the 8 windows equally, which means that each window maps up to 128GB. The "0xC400_0000" value I mentioned previously is wrong. I'll post the correct one next week, as I have no bandwidth left this week.

qianyich commented 7 months ago

I used another FPGA on another machine, and the DMA is definitely working there, as it passed dma_test.

However, RDMA is still not working. In the RDMA read test, the received data did not match the golden result.

Client Side:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:5b:e2:96
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:58:dd:fe
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:297:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16cd600
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16cd600000
Info: pre-allocated hugepage buffer vir addr = 0x7f9cab200000, physical addr = 0x16cd600000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/users/qianyich/RecoNIC/lib/reconic.c:197:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc4000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16cd600
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16cd600000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cab200000, physical addr = 16cd600000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16cd400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16cd400000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cab400000, physical addr = 16cd400000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce400000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac400000, physical addr = 16ce400000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce402
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce402000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac402000, physical addr = 16ce402000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce412
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce412000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac412000, physical addr = 16ce412000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0xcd400000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0xce400000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0xce402000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0xce412000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x3558ddfe
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce422
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce422000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac422000, physical addr = 16ce422000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce436
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce436000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac436000, physical addr = 16ce436000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ce438
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ce438000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f9cac438000, physical addr = 16ce438000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f9ccb3c8000, rdma_dev->axil_ctl = 0x7f9ccb3c8000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x355be296
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0xce422000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x16ce422000, sq_addr_msb = 0x16, sq_addr_lsb = 0xce422000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0xce436000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x16ce436000, cq_addr_msb = 0x16, cq_addr_lsb = 0xce436000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0xce438000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x16ce438000, rq_addr_msb = 0x16, rq_addr_lsb = 0xce438000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0xcd600000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x16cd600000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0xcd600020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x16cd600020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Info: client received remote offset of A = 0xa350000000000000
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x80
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: creating an RDMA read WQE for getting data
/users/qianyich/RecoNIC/lib/rdma_api.c:769:create_a_wqe(): Info: WQE mem_buffer = 0xa350000000000000, masked_mem_buffer = 0xa350000000000000
/users/qianyich/RecoNIC/lib/rdma_api.c:796:create_a_wqe(): [WQE] wrid=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:797:create_a_wqe(): [WQE] laddr_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:798:create_a_wqe(): [WQE] laddr_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:799:create_a_wqe(): [WQE] length=0x80
/users/qianyich/RecoNIC/lib/rdma_api.c:800:create_a_wqe(): [WQE] opcode=0x4
/users/qianyich/RecoNIC/lib/rdma_api.c:801:create_a_wqe(): [WQE] remote_offset_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:802:create_a_wqe(): [WQE] remote_offset_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:803:create_a_wqe(): [WQE] r_key=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:804:create_a_wqe(): [WQE] send_small_payload0=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:805:create_a_wqe(): [WQE] send_small_payload1=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:806:create_a_wqe(): [WQE] send_small_payload2=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:807:create_a_wqe(): [WQE] send_small_payload3=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:808:create_a_wqe(): [WQE] immdt_data=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:875:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:876:rdma_post_send(): DEBUG: original qp->sq_pidb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:882:rdma_post_send(): [Register] RN_RDMA_QCSR_SQPIi=0x60338, qpid=2, value=0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:883:rdma_post_send(): DEBUG: Update hardware sq db idx from software = 1
/users/qianyich/RecoNIC/lib/rdma_api.c:884:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:844:poll_cq_cidb(): [Register] RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:846:poll_cq_cidb(): DEBUG: before polling: sq_cidb = 0; Polling CQ CIDB = 1
/users/qianyich/RecoNIC/lib/rdma_api.c:857:poll_cq_cidb(): DEBUG: after polling: sq_cidb = 0; Polling CQ CIDB = 1
Successfully sent an RDMA read operation
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x10000
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x0
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd0211
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a200
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x10002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x10000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10002
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x10
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x3
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0xce438000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x16
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Info: All data has been received!
Info: buffer physical address is 0xa350000000000000
Info: Time spent 7.643000 usec, size = 128 bytes, Bandwidth = 0.133979 gigabits/sec
Info: The value of rc is 128
Info: CHECK RECEIVED DATA
Error: received data mismatched: recv[1]=0, sw_golden[1]=1
/users/qianyich/RecoNIC/lib/rdma_api.c:1083:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f9ccb3c8000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1088:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f9ccb3c8000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1096:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f9ccb3c8000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
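
A note on reading the `received data mismatched` line above: the log reports `payload_size>>2 = 32` words and `sw_golden[1]=1`, which is consistent with a golden pattern of `golden[i] = i`. That pattern is an assumption inferred from the log, not taken from the RecoNIC test source. Under that assumption, a receive buffer that was never written (all zeros) would pass at index 0 and first fail at index 1, which is exactly what the error shows. A minimal sketch, with `first_mismatch` as a hypothetical helper:

```python
def first_mismatch(recv, golden):
    """Return the index of the first differing word, or None if all match."""
    for i, (r, g) in enumerate(zip(recv, golden)):
        if r != g:
            return i
    return None


payload_size = 128                # bytes, from "-z 128"
num_words = payload_size >> 2     # 32 four-byte words, as in the log

# Assumption: golden data is the word index (matches sw_golden[1]=1).
golden = list(range(num_words))

# Assumption: the RDMA read never landed, so the buffer is still zeroed.
recv = [0] * num_words

idx = first_mismatch(recv, golden)
print(f"first mismatch at index {idx}: recv={recv[idx]}, sw_golden={golden[idx]}")
# prints "first mismatch at index 1: recv=0, sw_golden=1"
```

If this reading holds, the failure is less likely to be corrupted data and more likely that no read response payload reached the client buffer at all.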

Server side:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -s -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee server_debug.log
src_ip_str = 192.100.52.1
dst_ip_str = 192.100.51.1
Info: mac_addr_t = 00:0a:35:58:dd:fe
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.52.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:5b:e2:96
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:297:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f0f800
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f0f800000
Info: pre-allocated hugepage buffer vir addr = 0x7fb0ae000000, physical addr = 0x2f0f800000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:197:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc4000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f0f800
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f0f800000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0ae000000, physical addr = 2f0f800000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f0fe00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f0fe00000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0ae200000, physical addr = 2f0fe00000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e00000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af200000, physical addr = 2f10e00000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e02
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e02000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af202000, physical addr = 2f10e02000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e12
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e12000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af212000, physical addr = 2f10e12000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0xfe00000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x10e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x10e02000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x10e12000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x355be296
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e22
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e22000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af222000, physical addr = 2f10e22000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e36
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e36000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af236000, physical addr = 2f10e36000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f10e38
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f10e38000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb0af238000, physical addr = 2f10e38000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7fb0ce07f000, rdma_dev->axil_ctl = 0x7fb0ce07f000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x3558ddfe
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x10e22000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x2f10e22000, sq_addr_msb = 0xf, sq_addr_lsb = 0x10e22000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x10e36000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x2f10e36000, cq_addr_msb = 0xf, cq_addr_lsb = 0x10e36000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x10e38000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x2f10e38000, rq_addr_msb = 0xf, rq_addr_lsb = 0x10e38000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0xf800000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f0f800000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0xf800020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f0f800020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Server is listening to a remote peer
Info: Server is connected to a remote peer
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x80
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: rdma_register_memory_region - registering memory region
/users/qianyich/RecoNIC/lib/rdma_api.c:316:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRLSB=0x40004, pd_num=0, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:318:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRMSB=0x40008, pd_num=0, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:320:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRLSB=0x4000c, pd_num=0, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:322:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRMSB=0x40010, pd_num=0, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:324:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFRKEY=0x40014, pd_num=0, value=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:327:rdma_register_memory_region(): [Register] RN_RDMA_PDT_WRRDBUFLEN=0x40018, pd_num=0, value=0x80 B
/users/qianyich/RecoNIC/lib/rdma_api.c:330:rdma_register_memory_region(): [Register] RN_RDMA_PDT_ACCESSDESC=0x4001c, pd_num=0, value=0x2
Info: memory region for the 0-th PD is registered
Info: allocating buffer for payload data
Info: tmp_buffer->buffer = 0xa350000000000000, tmp_buffer->dma_addr = 0xa350000000000000
Info: copy payload data to the device memory
Info: copied payload data to the device memory succesfully rc = 128
Sending read_offset (a350000000000000) to the remote client
Does the client finish its RDMA read operation? If yes, please press any key

Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x0
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd020a
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x0
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x2
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x10000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10000
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x1
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x10e38000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xf

Warning: CQHEADi and SQPIi for QP2 are mismatched

***** QP2 FATAL RECOVERY *****
TIMEOUT: CQHEADi:0x0 and SQPIi:0x1 are different
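
The register dumps in this thread are long, so a small script can help pull out the values that matter. The following is a minimal sketch, not part of RecoNIC: it assumes the dump keeps the `Info: [NAME = 0xADDR] = 0xVALUE` layout shown in these logs, and the helper names `parse_register_dump` and `nonzero_registers` are hypothetical. The sample lines are copied from dumps in this thread.

```python
import re

# Matches dump lines such as:
#   Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
# Lines like "[Register] NAME=0x..., value=0x..." are intentionally not matched.
DUMP_LINE = re.compile(r"\[(\w+)\s*=\s*0x([0-9a-fA-F]+)\]\s*=\s*0x([0-9a-fA-F]+)")


def parse_register_dump(text):
    """Return {register_name: value}; a later dump overwrites an earlier one."""
    regs = {}
    for line in text.splitlines():
        match = DUMP_LINE.search(line)
        if match:
            regs[match.group(1)] = int(match.group(3), 16)
    return regs


def nonzero_registers(regs):
    """Keep only registers whose value is not zero."""
    return {name: value for name, value in regs.items() if value != 0}


sample = """\
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x10000
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1
"""

regs = parse_register_dump(sample)
cq_head = regs["RN_RDMA_QCSR_CQHEADi"]
sq_pi = regs["RN_RDMA_QCSR_SQPIi"]

# Same comparison the test program reports in its timeout message.
if cq_head != sq_pi:
    print(f"mismatch: CQHEADi={cq_head:#x} SQPIi={sq_pi:#x}")
```

Running it over a saved log such as `server_debug.log` and printing `nonzero_registers(regs)` gives a short list of counters to compare between the two hosts.
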
qianyich commented 7 months ago

The send_recv test fails as well. The receiver side just hangs, and the sender side fails to perform the send operation.

Receiver Side:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./send_recv -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:5b:e2:96
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:58:dd:fe
Info: CREATE RecoNIC DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:297:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ed000
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ed000000
Info: pre-allocated hugepage buffer vir addr = 0x7fb732400000, physical addr = 0x16ed000000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/users/qianyich/RecoNIC/lib/reconic.c:197:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc4000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ed000
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ed000000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb732400000, physical addr = 16ed000000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ed200
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ed200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb732600000, physical addr = 16ed200000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec200
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec200000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733600000, physical addr = 16ec200000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec202
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec202000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733602000, physical addr = 16ec202000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec212
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec212000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733612000, physical addr = 16ec212000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0xed200000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0xec200000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0xec202000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0xec212000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x3558ddfe
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec222
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec222000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733622000, physical addr = 16ec222000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec236
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec236000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733636000, physical addr = 16ec236000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x16ec238
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x16ec238000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7fb733638000, physical addr = 16ec238000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7fb752508000, rdma_dev->axil_ctl = 0x7fb752508000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x355be296
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0xec222000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x16ec222000, sq_addr_msb = 0x16, sq_addr_lsb = 0xec222000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0xec236000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x16ec236000, cq_addr_msb = 0x16, cq_addr_lsb = 0xec236000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0xec238000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x16ec238000, rq_addr_msb = 0x16, rq_addr_lsb = 0xec238000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0xed000000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x16ed000000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0xed000020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0x16
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x16ed000020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
Info: APPLICATION START
payload_size = 128, payload_size>>2 = 32
Info: RDMA POST RECEIVE
/users/qianyich/RecoNIC/lib/rdma_api.c:828:poll_rq_pidb(): DEBUG: Polling on RQ PIDB. Count: 0x0
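
To sanity-check the configuration above, the raw register values can be decoded back into addresses. This is a minimal sketch based only on the values printed in this log; the helper names `decode_ipv4`, `decode_mac`, and `split_addr` are hypothetical and not part of the RecoNIC library, and the byte order is an assumption inferred from how the logged values line up with the IP and MAC strings.

```python
def decode_ipv4(reg):
    """Decode a 32-bit register value into dotted-quad form (most significant byte first)."""
    return ".".join(str((reg >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def decode_mac(msb, lsb):
    """Combine the 16-bit MSB and 32-bit LSB register values into a MAC string."""
    raw = ((msb & 0xFFFF) << 32) | (lsb & 0xFFFFFFFF)
    return ":".join(f"{(raw >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def split_addr(addr):
    """Split a 64-bit address into (high 32 bits, low 32 bits)."""
    return (addr >> 32) & 0xFFFFFFFF, addr & 0xFFFFFFFF


# Values taken from the receiver-side log.
print(decode_ipv4(0xC0643301))            # RN_RDMA_GCSR_IPV4XADD (local)
print(decode_ipv4(0xC0643401))            # RN_RDMA_QCSR_IPDESADDR1i (remote)
print(decode_mac(0xA, 0x3558DDFE))        # MACXADDMSB / MACXADDLSB (local)
print(decode_mac(0xA, 0x355BE296))        # MACDESADDMSBi / MACDESADDLSBi (remote)
msb, lsb = split_addr(0x16EC222000)       # qp->sq->dma_addr
print(hex(msb), hex(lsb))
```

Decoded this way, the local and destination IP and MAC registers on the receiver match the `-r`/`-i` arguments and the `mac_addr_t` lines printed at startup, and the SQ base address splits into the `SQBAMSBi`/`SQBAi` values shown in the log.
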

Sender Side:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./send_recv -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 16384 -l dev_mem -d /dev/reconic-mm -s -u 22222 --dst_qp 2 -g 2>&1 | tee server_debug.log
src_ip_str = 192.100.52.1
dst_ip_str = 192.100.51.1
Info: mac_addr_t = 00:0a:35:58:dd:fe
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: dev_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.52.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:5b:e2:96
Info: CREATE RecoNIC DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:297:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2fa00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2fa00000
Info: pre-allocated hugepage buffer vir addr = 0x7f46ed000000, physical addr = 0x2f2fa00000
Info: Configuring QDMA AXI bridge BDF
/users/qianyich/RecoNIC/lib/reconic.c:195:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:196:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:197:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc4000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2fa00
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2fa00000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f46ed000000, physical addr = 2f2fa00000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2f400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2f400000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f46ed200000, physical addr = 2f2f400000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2e400
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2e400000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f46ee200000, physical addr = 2f2e400000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2e402
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2e402000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f46ee202000, physical addr = 2f2e402000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:145:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f2e412
/users/qianyich/RecoNIC/lib/reconic.c:150:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:154:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f2e412000
/users/qianyich/RecoNIC/lib/reconic.c:228:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f46ee212000, physical addr = 2f2e412000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:229:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x2f400000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x2e400000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x2e402000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x2e412000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x355be296
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = dev_mem
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x14000
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000014000, rn_dev->dev_buffer_offset = 0x15400
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000016000, rn_dev->dev_buffer_offset = 0xb6000
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f470d197000, rdma_dev->axil_ctl = 0x7f470d197000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x3558ddfe
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0x1f, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0xa350000000000000, sq_addr_msb = 0xa3500000, sq_addr_lsb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0xa350000000014000, cq_addr_msb = 0xa3500000, cq_addr_lsb = 0x14000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0xa350000000016000, rq_addr_msb = 0xa3500000, rq_addr_lsb = 0x16000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x2fa00000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f2fa00000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x2fa00020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0xf
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f2fa00020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
Info: APPLICATION START
payload_size = 16384, payload_size>>2 = 4096
Info: ALLOCATE PAYLOAD DATA
/users/qianyich/RecoNIC/lib/reconic.c:252:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a3500000000b6000, rn_dev->dev_buffer_offset = 0xba000
/users/qianyich/RecoNIC/lib/reconic.c:254:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: copy software payload to the device memory
/users/qianyich/RecoNIC/lib/rdma_api.c:769:create_a_wqe(): Info: WQE mem_buffer = 0xa3500000000b6000, masked_mem_buffer = 0xa3500000000b6000
/users/qianyich/RecoNIC/lib/rdma_api.c:796:create_a_wqe(): [WQE] wrid=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:797:create_a_wqe(): [WQE] laddr_low=0xb6000
/users/qianyich/RecoNIC/lib/rdma_api.c:798:create_a_wqe(): [WQE] laddr_high=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:799:create_a_wqe(): [WQE] length=0x4000
/users/qianyich/RecoNIC/lib/rdma_api.c:800:create_a_wqe(): [WQE] opcode=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:801:create_a_wqe(): [WQE] remote_offset_low=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:802:create_a_wqe(): [WQE] remote_offset_high=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:803:create_a_wqe(): [WQE] r_key=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:804:create_a_wqe(): [WQE] send_small_payload0=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:805:create_a_wqe(): [WQE] send_small_payload1=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:806:create_a_wqe(): [WQE] send_small_payload2=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:807:create_a_wqe(): [WQE] send_small_payload3=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:808:create_a_wqe(): [WQE] immdt_data=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:811:create_a_wqe(): DEBUG: Write WQE to the device memory
/users/qianyich/RecoNIC/lib/rdma_api.c:817:create_a_wqe(): DEBUG: successfully write WQE to the device memory!
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x0
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd020a
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x0
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x2
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x10000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10000
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x1
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x16000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xa3500000
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

/users/qianyich/RecoNIC/examples/rdma_test/send_recv.c:393:main(): Info: POSTING WQE of RDMA SEND
/users/qianyich/RecoNIC/lib/rdma_api.c:875:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:876:rdma_post_send(): DEBUG: original qp->sq_pidb = 0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:882:rdma_post_send(): [Register] RN_RDMA_QCSR_SQPIi=0x60338, qpid=2, value=0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:883:rdma_post_send(): DEBUG: Update hardware sq db idx from software = 1
/users/qianyich/RecoNIC/lib/rdma_api.c:884:rdma_post_send(): DEBUG: Reading hardware SQPIi (0x60338) = 0x1
/users/qianyich/RecoNIC/lib/rdma_api.c:844:poll_cq_cidb(): [Register] RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:846:poll_cq_cidb(): DEBUG: before polling: sq_cidb = 0; Polling CQ CIDB = 0
ERROR: poll_cq_cidb timeout! sq_cidb = 0; Polling CQ CIDB = 0
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x0
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x0
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x0
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd020a
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x0
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x1012040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x2
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x10000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10000
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x0
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x1
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x1
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x16000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0xa3500000
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

Error: Failed to perform an RDMA send operation!
Warning: CQHEADi and SQPIi for QP2 are mismatched

***** QP2 FATAL RECOVERY *****
TIMEOUT: CQHEADi:0x0 and SQPIi:0x1 are different
zhguanw-amd commented 7 months ago

Hi Yicheng,

I'm not surprised that it fails, as the BDF table configuration is still wrong at your end. We can't simply set the window size to 256GB; the maximum is 128GB per window. Please wait a few days. I'll push an update next week that covers 8 windows, mapping up to 1TB of system memory.
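
For reference, the window arithmetic behind this can be sketched as follows. This is a minimal illustration, not the actual library change: it assumes "128GB" means 128 GiB (2^37 bytes) and that the 8 windows are laid out back to back starting at physical address 0. The names `WINDOW_SIZE`, `window_bases`, and `window_index` are hypothetical and are not RecoNIC identifiers.

```python
# Sketch only: names are illustrative, not taken from the RecoNIC sources.
WINDOW_SIZE = 128 * 2**30   # assumed: 128 GiB per window (the stated maximum)
NUM_WINDOWS = 8             # 8 windows -> 1 TiB in total


def window_bases(num_windows=NUM_WINDOWS, window_size=WINDOW_SIZE):
    """Return the (high32, low32) halves of each window's base address."""
    bases = []
    for i in range(num_windows):
        base = i * window_size  # assumed: contiguous windows starting at 0
        bases.append((base >> 32, base & 0xFFFFFFFF))
    return bases


def window_index(phys_addr, window_size=WINDOW_SIZE):
    """Index of the window that would cover a given physical address."""
    return phys_addr // window_size


if __name__ == "__main__":
    for i, (hi, lo) in enumerate(window_bases()):
        print(f"window {i}: base_high=0x{hi:x} base_low=0x{lo:x}")
    print(f"total mapped: {NUM_WINDOWS * WINDOW_SIZE // 2**40} TiB")
```

Under these assumptions the high 32 bits of consecutive window bases step by 0x20, because 2^37 / 2^32 = 32.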

qianyich commented 7 months ago

Oh yeah, I forgot about it. That makes sense. Thank you!

qianyich commented 7 months ago

Any progress on this? Or maybe you can point me to the place I need to modify? Should I just modify AXI_BAR_SIZE defined in reconic_reg.h so that get_win_size() in reconic.c can set win_size to 1 TB?

Out of curiosity: this BAR 0 is for the device to access host memory, so what are BAR0 (DMA) and BAR2 (AXI Lite Master) in the QDMA configuration for? They also appear in the lspci output below.

 sudo lspci -vvv -s 3b:00.0
3b:00.0 Memory controller: Xilinx Corporation Device 903f
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 229
        NUMA node: 0
        Region 0: Memory at ab400000 (64-bit, non-prefetchable) [size=256K]
        Region 2: Memory at ab000000 (64-bit, non-prefetchable) [size=4M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00030000
                PBA: BAR=0 offset=00034000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [1c0 v1] #19
        Capabilities: [1f0 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                Port Arbitration Table [500] <?>
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: onic
        Kernel modules: qdma_pf
qianyich commented 7 months ago

This could be a stupid question, but I noticed that the transport protocol in the RoCE v2 spec is InfiniBand. Is the InfiniBand header used in ERNIC or RecoNIC? I am asking because I did not find InfiniBand in the ERNIC document.

I understand that RoCE v2 uses UDP/IP, so there shouldn't be anything InfiniBand-specific involved. I am just curious what the IB transport protocol on top of the network layer is.

image

zhguanw-amd commented 7 months ago

@qianyich Please try branch "map_large_hmem". Let me know if it works or not. I'll merge it to the main branch after your test.

The "InfiniBand" you mentioned is a link layer, comparable to Ethernet. RoCEv2 means the RDMA protocol runs on top of Ethernet. The RDMA protocol here is the IB transport protocol in your figure, which includes the base transport header (BTH), the extended transport headers (RETH, AETH, ImmDt, IETH, AtomicETH, AtomicAckETH, ...), etc.
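
To make the BTH concrete, here is a minimal sketch that decodes the 12-byte base transport header as laid out in the InfiniBand specification (opcode, flags, P_Key, destination QP, PSN). It is an illustration only and is not code from ERNIC or RecoNIC. The function name `parse_bth` and the sample values (destination QP 2, PSN 0xabd, opcode 0x04 for an RC SEND Only) are chosen for the example.

```python
import struct

ROCEV2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2


def parse_bth(data: bytes) -> dict:
    """Decode the 12-byte Base Transport Header at the start of `data`."""
    if len(data) < 12:
        raise ValueError("BTH needs at least 12 bytes")
    # Network byte order: opcode(8) | flags(8) | P_Key(16) | rsvd+DestQP(32) | A+rsvd+PSN(32)
    opcode, flags, pkey, dqp_word, psn_word = struct.unpack("!BBHII", data[:12])
    return {
        "opcode": opcode,
        "solicited_event": bool(flags & 0x80),
        "mig_req": bool(flags & 0x40),
        "pad_count": (flags >> 4) & 0x3,
        "transport_version": flags & 0xF,
        "pkey": pkey,
        "dest_qp": dqp_word & 0xFFFFFF,        # low 24 bits
        "ack_req": bool(psn_word & 0x80000000),
        "psn": psn_word & 0xFFFFFF,            # low 24 bits
    }


if __name__ == "__main__":
    # Hypothetical header bytes built only for this example.
    sample = struct.pack("!BBHII", 0x04, 0x00, 0xFFFF, 2, 0xABD)
    print(parse_bth(sample))
```

In a RoCEv2 packet these 12 bytes follow the UDP header directly, and any extended transport header (RETH, AETH, and so on) comes after them, depending on the opcode.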

qianyich commented 7 months ago

I reprogrammed the devices with this updated library, and started from the read test.

TL;DR: The systolic array test failed and then broke the tests that followed. dmesg shows segfaults in the library code, which most likely leave the system in a bad state. The same segfaults also occurred when running the read, write, and send_recv tests. Please see what I did and the results below!

In order, I ran: (1) dma test, (2) read test, (3) write test, (4) send_recv test, (5) systolic array test, (6) dma test, (7) read test.

✅ (1) dma test passed with no error

✅ (2) read test passed with logs attached here: client_debug_rdma_read.log server_debug_rdma_read.log

✅ (3) write test passed with logs attached here: client_debug_rdma_write.log server_debug_rdma_write.log

✅ (4) send_recv test passed with logs attached here: client_debug_receiver.log server_debug_sender.log

❌ (5) systolic array test failed with logs attached here (for this test, 192.100.52.1 is the client that requests the matrices): server_debug_systolic.log client_debug_systolic.log

This failed, and dmesg shows errors not only for the systolic array test but also for the earlier read, write, and send_recv tests, as shown below:

192.100.51.1:

[  159.273862] IPv6: ADDRCONF(NETDEV_UP): enp59s0: link is not ready
[  159.273888] IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0: link becomes ready
[  240.930586] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  275.085286] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  427.603894] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  427.603918] read[2476]: segfault at 29 ip 00007ffa0f6ed8f9 sp 00007ffc518e0d30 error 4 in libreconic.so[7ffa0f6e8000+c000]
[  632.051171] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  632.051203] write[2496]: segfault at 29 ip 00007fd4508358f9 sp 00007ffeac2ba5f0 error 4 in libreconic.so[7fd450830000+c000]
[  795.233063] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  795.233083] send_recv[2512]: segfault at 29 ip 00007f677f3f88f9 sp 00007ffc7484ec00 error 4 in libreconic.so[7f677f3f3000+c000]
[ 1106.850350] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1106.850376] network_systoli[2555]: segfault at 29 ip 00007f0dc6bde8f9 sp 00007ffd5c084bd0 error 4 in libreconic.so[7f0dc6bd9000+c000]
[ 1222.657479] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1228.868558] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.

192.100.52.1:

[   75.281512] IPv6: ADDRCONF(NETDEV_UP): enp59s0: link is not ready
[   75.281520] IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0: link becomes ready
[  103.193176] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  111.940145] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  695.196025] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  695.196050] read[2444]: segfault at 29 ip 00007f74f74758f9 sp 00007ffd6ae2c4d0 error 4 in libreconic.so[7f74f7470000+c000]
[  811.056709] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  811.056739] write[2523]: segfault at 29 ip 00007f282d1b18f9 sp 00007fff51091c90 error 4 in libreconic.so[7f282d1ac000+c000]
[ 1075.081479] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1075.081503] send_recv[2589]: segfault at 29 ip 00007fabcbb348f9 sp 00007ffe19910800 error 4 in libreconic.so[7fabcbb2f000+c000]
[ 1249.158570] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
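
A small sketch for reading the segfault lines above: it extracts the instruction pointer and the mapping start from each dmesg line and computes their difference. The regex and the function name `parse_segfault` are my own, not part of any RecoNIC tooling. The error-code decoding follows the x86 page-fault bit layout (bit 0: protection violation, bit 1: write, bit 2: user mode). Treat the offset as relative to the mapping start the kernel reports; depending on the kernel version and how the library is mapped, it may need adjusting before it is passed to a tool such as addr2line.

```python
import re

SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing one dmesg segfault line, or None if no match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "comm": m.group("comm"),
        "object": m.group("obj"),
        "fault_addr": int(m.group("addr"), 16),
        "offset_in_mapping": ip - base,
        "protection_violation": bool(err & 0x1),  # 0 means page not present
        "write_access": bool(err & 0x2),          # 0 means read
        "user_mode": bool(err & 0x4),
    }


if __name__ == "__main__":
    line = ("[  427.603918] read[2476]: segfault at 29 ip 00007ffa0f6ed8f9 "
            "sp 00007ffc518e0d30 error 4 in libreconic.so[7ffa0f6e8000+c000]")
    info = parse_segfault(line)
    print(info["comm"], hex(info["offset_in_mapping"]), info)
```

Applied to the lines above, every crash resolves to the same offset inside the libreconic.so mapping, and error 4 decodes to a user-mode read of a non-present page. The fault address 0x29 is very small, which suggests (but does not prove) a field access through a NULL pointer.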

❌ (6) dma test again. This time, dma test passed on 192.100.51.1, but partially failed on 192.100.52.1!

192.100.51.1:

qianyich@pc166:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
size=65536000 Average BW = 4.875006 GB/sec, average latency = 13443.265110 us
qianyich@pc166:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
size=65536000 Average BW = 5.177333 GB/sec, average latency = 12658.255175 us

192.100.52.1:

qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
size=65536000 Average BW = 4.914677 GB/sec, average latency = 13334.752992 us
qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
/dev/reconic-mm, W off 0x0, 0x3e80000 failed -1.
write file: Input/output error
[ 1426.166054] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x00000000581aaf3c, W,65536000,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1426.178392] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/895, pidx 896, cidx 768
[ 1426.178805] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.

❌ (7) After all of this, I reran the rdma read test and got the following errors:

192.100.51.1:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee client_debug.log
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:d3:a6:64
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:02:e0:fc
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:301:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f24e00
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f24e00000
Info: pre-allocated hugepage buffer vir addr = 0x7f17c2000000, physical addr = 0x2f24e00000
Info: Configuring 8 windows in QDMA AXI bridge BDF, each has 128GB mapping
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16440, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16444, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16450, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16460, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16464, bdf_addr_high=0x40
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16470, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16480, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16484, bdf_addr_high=0x60
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16490, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164a0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164a4, bdf_addr_high=0x80
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164b0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164c0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164c4, bdf_addr_high=0xa0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164d0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164e0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164e4, bdf_addr_high=0xc0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164f0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16500, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16504, bdf_addr_high=0xe0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16510, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f24e00
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f24e00000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c2000000, physical addr = 2f24e00000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f24c00
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f24c00000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c2200000, physical addr = 2f24c00000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c00
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c00000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3200000, physical addr = 2f25c00000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c02
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c02000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3202000, physical addr = 2f25c02000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c12
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c12000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3212000, physical addr = 2f25c12000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x24c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x25c00000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x25c02000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x25c12000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x3502e0fc
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c22
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c22000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3222000, physical addr = 2f25c22000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c36
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c36000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3236000, physical addr = 2f25c36000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x2f25c38
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x2f25c38000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f17c3238000, physical addr = 2f25c38000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f17e208c000, rdma_dev->axil_ctl = 0x7f17e208c000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x35d3a664
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0xff, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x25c22000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x2f25c22000, sq_addr_msb = 0x2f, sq_addr_lsb = 0x25c22000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x25c36000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x2f25c36000, cq_addr_msb = 0x2f, cq_addr_lsb = 0x25c36000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x25c38000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x2f25c38000, rq_addr_msb = 0x2f, rq_addr_lsb = 0x25c38000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x24e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x2f24e00000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x24e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0x2f
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x2f24e00020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Error: Can't receive remote offset of A from the remote peer
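
For anyone cross-checking the register values in the log above, the arithmetic can be sketched as below. This is only an illustration consistent with the printed values, not the library's code: the helper names are hypothetical, and the 4 KiB page size (shift of 12) is an assumption inferred from the page-frame and physical-address pairs in the log.

```python
import ipaddress


def pfn_to_paddr(pfn, page_offset=0, page_shift=12):
    """Physical address from a page frame number (assumed 4 KiB pages)."""
    return (pfn << page_shift) + page_offset


def split_addr(addr):
    """Split a 64-bit address into (msb32, lsb32) register values."""
    return (addr >> 32) & 0xFFFFFFFF, addr & 0xFFFFFFFF


def ipv4_to_reg(ip):
    """IPv4 dotted-quad string as a 32-bit integer, first octet highest."""
    return int(ipaddress.IPv4Address(ip))


if __name__ == "__main__":
    paddr = pfn_to_paddr(0x2F25C22)
    msb, lsb = split_addr(paddr)
    print(f"paddr=0x{paddr:x} msb=0x{msb:x} lsb=0x{lsb:x}")
    print(f"192.100.51.1 -> 0x{ipv4_to_reg('192.100.51.1'):x}")
```

With the values from the log, page frame 0x2f25c22 gives the SQ base 0x2f25c22000, which splits into SQBAMSBi=0x2f and SQBAi=0x25c22000, and 192.100.51.1 encodes to the IPV4XADD value 0xc0643301.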

192.100.52.1:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -s -u 22222 -t 11111 --dst_qp 2 -g 2>&1 | tee server_debug.log
src_ip_str = 192.100.52.1
dst_ip_str = 192.100.51.1
Info: mac_addr_t = 00:0a:35:02:e0:fc
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
Info: QP allocated at: host_mem
Info: Device - /dev/reconic-mm
Info: src_ip = 192.100.52.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:d3:a6:64
Info: Creating rn_dev
/users/qianyich/RecoNIC/lib/reconic.c:301:create_rn_dev(): Info: scr(=4)) file open successfully
create_rn_dev - testing2
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1711000
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1711000000
Info: pre-allocated hugepage buffer vir addr = 0x7f2962200000, physical addr = 0x1711000000
Info: Configuring 8 windows in QDMA AXI bridge BDF, each has 128GB mapping
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16420, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16424, bdf_addr_high=0x0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16430, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16440, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16444, bdf_addr_high=0x20
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16450, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16460, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16464, bdf_addr_high=0x40
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16470, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16480, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16484, bdf_addr_high=0x60
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16490, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164a0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164a4, bdf_addr_high=0x80
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164b0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164c0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164c4, bdf_addr_high=0xa0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164d0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x164e0, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x164e4, bdf_addr_high=0xc0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x164f0, bdf_win_config=0xc2000000
/users/qianyich/RecoNIC/lib/reconic.c:198:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_LSB=0x16500, bdf_addr_low=0x0
/users/qianyich/RecoNIC/lib/reconic.c:199:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_ADDR_TRANSLATE_ADDR_MSB=0x16504, bdf_addr_high=0xe0
/users/qianyich/RecoNIC/lib/reconic.c:200:config_rn_dev_axib_bdf(): [BDF] AXIB_BDF_MAP_CONTROL_ADDR=0x16510, bdf_win_config=0xc2000000
Info: CREATE RDMA DEVICE
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1711000
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1711000000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2962200000, physical addr = 1711000000, rn_dev->buffer_offset = 0x200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1711600
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1711600000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2962400000, physical addr = 1711600000, rn_dev->buffer_offset = 0x1200000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712600
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712600000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963400000, physical addr = 1712600000, rn_dev->buffer_offset = 0x1202000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712602
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712602000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963402000, physical addr = 1712602000, rn_dev->buffer_offset = 0x1212000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712612
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712612000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963412000, physical addr = 1712612000, rn_dev->buffer_offset = 0x1222000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: OPEN RDMA DEVICE
/users/qianyich/RecoNIC/lib/rdma_api.c:186:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBA=0x600a0, value=0x11600000
/users/qianyich/RecoNIC/lib/rdma_api.c:188:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFBAMSB=0x600a4, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:190:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_DATBUFSZ=0x600a8, value=0x10001000
/users/qianyich/RecoNIC/lib/rdma_api.c:193:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBA=0x60088, value=0x12600000
/users/qianyich/RecoNIC/lib/rdma_api.c:195:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPKTERRQBAMSB=0x6008c, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:197:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60090, value=0x2000
/users/qianyich/RecoNIC/lib/rdma_api.c:200:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBA=0x60060, value=0x12602000
/users/qianyich/RecoNIC/lib/rdma_api.c:202:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFBAMSB=0x60064, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:204:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_ERRBUFSZ=0x60068, value=0x1000100
/users/qianyich/RecoNIC/lib/rdma_api.c:207:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBA=0x600b0, value=0x12612000
/users/qianyich/RecoNIC/lib/rdma_api.c:209:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRPKTBAMSB=0x600b4, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:211:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZ=0x600b8, value=0x10000
/users/qianyich/RecoNIC/lib/rdma_api.c:213:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_RESPERRSZMSB=0x600bc, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:217:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_INTEN=0x60180, value=0xff
/users/qianyich/RecoNIC/lib/rdma_api.c:221:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDLSB=0x60010, value=0x35d3a664
/users/qianyich/RecoNIC/lib/rdma_api.c:223:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_MACXADDMSB=0x60014, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:227:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_IPV4XADD=0x60070, value=0xc0643401
/users/qianyich/RecoNIC/lib/rdma_api.c:230:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICCONF=0x60000, value=0x56ce1421
/users/qianyich/RecoNIC/lib/rdma_api.c:233:config_rdma_global_csr(): [Register] RN_RDMA_GCSR_XRNICADCONF=0x60004, value=0xa0004
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
/users/qianyich/RecoNIC/lib/rdma_api.c:254:allocate_rdma_pd(): [Register] RN_RDMA_PDT_PDPDNUM=0x40000, pd_num=0, value=0x0
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
/users/qianyich/RecoNIC/lib/rdma_api.c:437:allocate_rdma_qp(): sq_size = 81920, cq_size = 5120, rq_size 655360, buf_location = host_mem
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712622
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712622000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963422000, physical addr = 1712622000, rn_dev->buffer_offset = 0x1236000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->cq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712636
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712636000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963436000, physical addr = 1712636000, rn_dev->buffer_offset = 0x1237400
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Allocating qp->rq
/users/qianyich/RecoNIC/lib/reconic.c:146:get_buffer_paddr(): Info: get_buffer_paddr - Page frame: 0x1712638
/users/qianyich/RecoNIC/lib/reconic.c:151:get_buffer_paddr(): Info: get_buffer_paddr - distance from page boundary: 0x0
/users/qianyich/RecoNIC/lib/reconic.c:155:get_buffer_paddr(): Info: get_buffer_paddr - Physical address of buffer: 0x1712638000
/users/qianyich/RecoNIC/lib/reconic.c:232:allocate_rdma_buffer(): Info: allocated host buffer vir addr = 0x7f2963438000, physical addr = 1712638000, rn_dev->buffer_offset = 0x12d8000
/users/qianyich/RecoNIC/lib/reconic.c:233:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma host buffer
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
/users/qianyich/RecoNIC/lib/rdma_api.c:487:allocate_rdma_qp(): DEBUG: rdma_dev->rn_dev->axil_ctl = 0x7f2982252000, rdma_dev->axil_ctl = 0x7f2982252000
/users/qianyich/RecoNIC/lib/rdma_api.c:502:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_IPDESADDR1i=0x60360, qpid=2, value=0xc0643301
/users/qianyich/RecoNIC/lib/rdma_api.c:509:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDLSBi=0x60350, qpid=2, value=0x3502e0fc
/users/qianyich/RecoNIC/lib/rdma_api.c:516:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_MACDESADDMSBi=0x60354, qpid=2, value=0xa
/users/qianyich/RecoNIC/lib/rdma_api.c:521:allocate_rdma_qp(): DEBUG: win_size_high = 0xff, win_size_low = 0xffffffff
/users/qianyich/RecoNIC/lib/rdma_api.c:539:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAi=0x60310, qpid=2, value=0x12622000
/users/qianyich/RecoNIC/lib/rdma_api.c:546:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_SQBAMSBi=0x603c8, qpid=2, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:550:allocate_rdma_qp(): DEBUG: qp->sq->dma_addr = 0x1712622000, sq_addr_msb = 0x17, sq_addr_lsb = 0x12622000
/users/qianyich/RecoNIC/lib/rdma_api.c:568:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAi=0x60318, qpid=2, value=0x12636000
/users/qianyich/RecoNIC/lib/rdma_api.c:575:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQBAMSBi=0x603d0, qpid=2, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:579:allocate_rdma_qp(): DEBUG: qp->cq->dma_addr = 0x1712636000, cq_addr_msb = 0x17, cq_addr_lsb = 0x12636000
/users/qianyich/RecoNIC/lib/rdma_api.c:597:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAi=0x60308, qpid=2, value=0x12638000
/users/qianyich/RecoNIC/lib/rdma_api.c:604:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQBAMSBi=0x603c0, qpid=2, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:608:allocate_rdma_qp(): DEBUG: qp->rq->dma_addr = 0x1712638000, rq_addr_msb = 0x17, rq_addr_lsb = 0x12638000
/users/qianyich/RecoNIC/lib/rdma_api.c:617:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDi=0x60328, qpid=2, value=0x11000000
/users/qianyich/RecoNIC/lib/rdma_api.c:624:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_CQDBADDMSBi=0x6032c, qpid=2, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:625:allocate_rdma_qp(): DEBUG: cq_cidb_addr = 0x1711000000
/users/qianyich/RecoNIC/lib/rdma_api.c:634:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDi=0x60320, qpid=2, value=0x11000020
/users/qianyich/RecoNIC/lib/rdma_api.c:641:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_RQWPTRDBADDMSBi=0x60324, qpid=2, value=0x17
/users/qianyich/RecoNIC/lib/rdma_api.c:642:allocate_rdma_qp(): DEBUG: rq_cidb_addr = 0x1711000020
/users/qianyich/RecoNIC/lib/rdma_api.c:651:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_DESTQPCONFi=0x60348, qpid=2, value=0x2
/users/qianyich/RecoNIC/lib/rdma_api.c:660:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QDEPTHi=0x6033c, qpid=2, value=0x400040
/users/qianyich/RecoNIC/lib/rdma_api.c:707:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPCONFi=0x60300, qpid=2, value=0x200043d
/users/qianyich/RecoNIC/lib/rdma_api.c:721:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_QPADVCONFi=0x60304, qpid=2, value=0x12344000
/users/qianyich/RecoNIC/lib/rdma_api.c:730:allocate_rdma_qp(): [Register] RN_RDMA_QCSR_PDi=0x603b0, qpid=2, value=0x0
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
payload_size = 128, payload_size>>2 = 32
Info: Server is listening to a remote peer
Info: Server is connected to a remote peer
/users/qianyich/RecoNIC/lib/reconic.c:256:allocate_rdma_buffer(): Info: allocated device buffer physical addr = a350000000000000, rn_dev->dev_buffer_offset = 0x80
/users/qianyich/RecoNIC/lib/reconic.c:258:allocate_rdma_buffer(): Info: allocate_rdma_buffer - successfully allocated rdma device buffer
Info: rdma_register_memory_region - registering memory region
/users/qianyich/RecoNIC/lib/rdma_api.c:316:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRLSB=0x40004, pd_num=0, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:318:rdma_register_memory_region(): [Register] RN_RDMA_PDT_VIRTADDRMSB=0x40008, pd_num=0, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:320:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRLSB=0x4000c, pd_num=0, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:322:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFBASEADDRMSB=0x40010, pd_num=0, value=0xa3500000
/users/qianyich/RecoNIC/lib/rdma_api.c:324:rdma_register_memory_region(): [Register] RN_RDMA_PDT_BUFRKEY=0x40014, pd_num=0, value=0x8
/users/qianyich/RecoNIC/lib/rdma_api.c:327:rdma_register_memory_region(): [Register] RN_RDMA_PDT_WRRDBUFLEN=0x40018, pd_num=0, value=0x80 B
/users/qianyich/RecoNIC/lib/rdma_api.c:330:rdma_register_memory_region(): [Register] RN_RDMA_PDT_ACCESSDESC=0x4001c, pd_num=0, value=0x2
Info: memory region for the 0-th PD is registered
Info: allocating buffer for payload data
Info: tmp_buffer->buffer = 0xa350000000000000, tmp_buffer->dma_addr = 0xa350000000000000
Info: copy payload data to the device memory
/dev/reconic-mm, W off 0x0, 0x80 failed -1.
write file: Input/output error
Info: copied payload data to the device memory succesfully rc = -5
/users/qianyich/RecoNIC/lib/rdma_api.c:1083:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f2982252000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1088:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f2982252000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
/users/qianyich/RecoNIC/lib/rdma_api.c:1096:destroy_rdma_qp(): [DEBUG] Destroying dev: 0x7f2982252000, RN_RDMA_QCSR_CQHEADi=0x60330, qpid=2, value=0x0
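
Several of the register values in the server log above can be cross-checked by hand. The following is a minimal sketch, not RecoNIC code: the helper names are hypothetical, and the encodings (4 KiB base pages, a 32-bit MSB/LSB register split, big-endian packing of IPv4 and MAC addresses, 128 GiB BDF windows) are assumptions inferred from the logged values rather than taken from the library source.

```python
import ipaddress

PAGE_SHIFT = 12  # assumption: 4 KiB base pages (0x1711000 -> 0x1711000000 in the log)


def paddr_from_pfn(pfn, offset=0):
    """Physical address from a page frame number plus the in-page offset."""
    return (pfn << PAGE_SHIFT) + offset


def split_addr(addr):
    """Split a 64-bit address into the (msb, lsb) pair written to paired 32-bit registers."""
    return (addr >> 32) & 0xFFFFFFFF, addr & 0xFFFFFFFF


def ipv4_to_reg(ip):
    """Pack a dotted-quad IPv4 address into one 32-bit value, first octet in the top byte."""
    return int(ipaddress.IPv4Address(ip))


def mac_to_regs(mac):
    """Split a MAC address into (upper 16 bits, lower 32 bits)."""
    value = int(mac.replace(":", ""), 16)
    return value >> 32, value & 0xFFFFFFFF


def bdf_window_bases(n_windows=8, window_size=128 << 30):
    """(msb, lsb) base address of each BDF window, assuming contiguous 128 GiB windows."""
    return [split_addr(i * window_size) for i in range(n_windows)]


if __name__ == "__main__":
    print(hex(paddr_from_pfn(0x1711000)))              # hugepage buffer
    print([hex(v) for v in split_addr(0x1712622000)])  # SQ base address
    print(hex(ipv4_to_reg("192.100.52.1")))            # RN_RDMA_GCSR_IPV4XADD
    print([hex(v) for v in mac_to_regs("00:0a:35:d3:a6:64")])
    print([hex(msb) for msb, _ in bdf_window_bases()])
```

With these assumptions the helpers reproduce the logged values: the SQ base 0x1712622000 splits into 0x17 and 0x12622000, 192.100.52.1 packs to 0xc0643401, and the eight window bases step through bdf_addr_high = 0x0, 0x20, ..., 0xe0.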

dmesg on 192.100.51.1:

[ 2773.765361] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.

dmesg on 192.100.52.1:

[ 2936.597627] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-64: req 0x000000007cfc492e, W,128,0/128,0x0, done 0, err 0, tm 10000.
[ 2936.609099] onic:qdma_descq_dump: qdma3b000-MM-64: 0x40/0x40, desc sz 1024/1022, pidx 258, cidx 257
[ 2936.609116] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 2936.609136] read[3367]: segfault at 29 ip 00007f2982a488f9 sp 00007ffeadff7a40 error 4 in libreconic.so[7f2982a43000+c000]
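
The segfault line in the dmesg output above carries enough information to narrow down where libreconic.so crashed. Below is a minimal sketch (the function name is hypothetical) that extracts the instruction pointer and the mapping start from such a line and computes their difference. It assumes the bracketed value is the start address of the mapping that contains the instruction pointer; if that mapping begins at a non-zero file offset, the offset has to be added before passing the result to a tool such as addr2line, so treat the number as a starting point only.

```python
import re

# Matches the x86 Linux user-space segfault report format seen in dmesg.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object name, ip - mapping start, faulting address), or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return m.group("obj"), ip - base, int(m.group("addr"), 16)


if __name__ == "__main__":
    line = ("[ 2936.609136] read[3367]: segfault at 29 ip 00007f2982a488f9 "
            "sp 00007ffeadff7a40 error 4 in libreconic.so[7f2982a43000+c000]")
    obj, offset, addr = fault_offset(line)
    print(obj, hex(offset), hex(addr))
```

For the line above this gives an offset of 0x58f9 into the libreconic.so mapping. The very small faulting address (0x29) would be consistent with reading a field through a NULL pointer, although that is only a guess without a backtrace.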

zhguanw-amd commented 7 months ago

@qianyich I'll check it tomorrow on the U250 and see whether there are any issues. It's probably related to QDMA's queue assignment.

I pushed an update to the main branch related to CMAC enable and rmmod of the onic driver. You can test it and see whether it works on your switch setup. This patch does not address the segfault issue you mentioned above.

qianyich commented 7 months ago

@zhguanw-amd Could you merge "map_large_hmem" into the main branch so that I can see if those changes work fine on my side?

qianyich commented 7 months ago

Never mind, I assume you are planning to merge it together with other changes later. FYI, with the latest commit on the main branch I can install the driver, and the two hosts can ping each other. I did not test those APIs as they are not fixed yet.

qianyich commented 7 months ago

Also, I am interested in adding QSPI flash into the design so that I can program the board over PCIe. https://github.com/Xilinx/open-nic-shell/blob/add_qspi_and_cms/script/generate_qspi_block_design.tcl might be the starting point.

zhguanw-amd commented 7 months ago

@qianyich Please try the new version on the main branch. It should fix the issues you reported.

> Also, I am interested in adding QSPI flash into the design so that I can program the board over PCIe. https://github.com/Xilinx/open-nic-shell/blob/add_qspi_and_cms/script/generate_qspi_block_design.tcl might be the starting point.

Yep, it sounds like a good idea to support programming over PCIe. Apart from the hardware part, you will also need to implement software to control it. You can start a new thread for this.

qianyich commented 7 months ago

@zhguanw-amd There is a new problem with the network systolic array example now. I think it is ok for me though as long as the basic RDMA verbs work fine.

What I did:

✅ 1. I ran read, write, and send_recv successfully with no issues. There were no error messages in dmesg.
❌ 2. I failed to run the network systolic array example; the output is below. The failure left the QDMA in a bad state, and read, write, and send_recv did not work afterward.

192.100.52.1 (failed to read Array B):

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -r 192.100.52.1 -i 192.100.51.1 -u 22222 -t 11111 --dst_qp 2 -c 2>&1 | tee client_debug.log
Info: Device - /dev/reconic-mm
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
src_ip_str = 192.100.52.1
dst_ip_str = 192.100.51.1
Info: mac_addr_t = 00:0a:35:24:44:8a
Info: src_ip = 192.100.52.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:71:f4:5c
Info: Creating rn_dev
create_rn_dev - testing2
Info: pre-allocated hugepage buffer vir addr = 0x7fb423a00000, physical addr = 0x2f19a00000
Info: Configuring 8 windows in QDMA AXI bridge BDF, each has 128GB mapping
Info: CREATE RDMA DEVICE
Info: OPEN RDMA DEVICE
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
Allocating qp->cq
Allocating qp->rq
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
Info: generating matrix data, matrix_size = 256
Info: Software Matrix Data generated
Info: Client is connecting to a remote server
Info: Client is connected to a remote server
Info: client received remote offset of A = 0xa350000000000000
Info: client received remote offset of B = 0xa350000000000400
Info: creating an RDMA read WQE for getting Array A
Successfully sent an RDMA read operation for Array A!
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x1
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x10004
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabd0210
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0xc0c2040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x20002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x40000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10f02
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x1
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x10
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x2
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x1f0600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x1
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabd
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x1a82b000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x2f
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x1

ERROR: poll_cq_cidb timeout! sq_cidb = 1; Polling CQ CIDB = 1
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x1
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x20004
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabe0210
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157c204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x1
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0xd0d2040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x30002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x50000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x30f02
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x1
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x11
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x3
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x1f0200
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x2
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabe
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x1a82b000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x2f
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x2

Failed to send an RDMA read operation for Array B!
Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x1
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x20004
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabe0210
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157c204
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x1
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0xd0d2040
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x30002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x50000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x30f02
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x1
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x11
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x1
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x3
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x0
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x1f0200
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x2
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabe
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x1a82b000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x2f
Info: [RN_RDMA_QCSR_SQPIi           = 0x60338] = 0x2

Info: Is Computation finished, compute_done = 1
Info: The value of rc is 1024
** Avg time device /dev/reconic-mm, total time 0.001005 sec, size = 16
hw_work_id = 0xdd
Error: Result mismatch
i = 0,  CPU result = 250
Hardware result = 0
Test failed!
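
The client log above contains three register dumps, and comparing them by eye is error-prone. The sketch below parses lines of the form "Info: [NAME = 0xADDR] = 0xVALUE" and reports only the registers whose values changed between two dumps. The function names are hypothetical and the parser assumes exactly the dump format shown in this log; it says nothing about what the individual bits mean.

```python
import re

# Captures the register name and its value; the register offset is ignored.
REG_RE = re.compile(r"\[(\w+)\s*=\s*0x[0-9a-fA-F]+\]\s*=\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Map register name -> integer value for every dump line found in text."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def diff_dumps(before, after):
    """Return {name: (old, new)} for registers present in both dumps with different values."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


if __name__ == "__main__":
    first = """
    Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x10004
    Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
    Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x10
    Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
    """
    second = """
    Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x20004
    Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x1
    Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x11
    Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
    """
    changes = diff_dumps(parse_dump(first), parse_dump(second))
    for name, (old, new) in changes.items():
        print(f"{name}: {old:#x} -> {new:#x}")
```

Applied to the first two dumps in this log, the changed registers include RN_RDMA_GCSR_OUTIOPKTCNT (0x10004 to 0x20004), RN_RDMA_GCSR_ININVDUPCNT (0x0 to 0x1), and RN_RDMA_GCSR_INTSTS (0x10 to 0x11), while the second and third dumps are identical.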

192.100.51.1:

sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\:3b\:00.0/resource2 -r 192.100.51.1 -i 192.100.52.1 -u 22222 -t 11111 --dst_qp 2 -s 2>&1 | tee server_debug.log
Info: Device - /dev/reconic-mm
Info: PCIe resource file: /sys/bus/pci/devices/0000:3b:00.0/resource2
src_ip_str = 192.100.51.1
dst_ip_str = 192.100.52.1
Info: mac_addr_t = 00:0a:35:71:f4:5c
Info: src_ip = 192.100.51.1
Info: Found network interface: enp59s0
Info: mac_addr_t = 00:0a:35:24:44:8a
Info: Creating rn_dev
create_rn_dev - testing2
Info: pre-allocated hugepage buffer vir addr = 0x7fecec000000, physical addr = 0x172a400000
Info: Configuring 8 windows in QDMA AXI bridge BDF, each has 128GB mapping
Info: CREATE RDMA DEVICE
Info: OPEN RDMA DEVICE
Info: RDMA global control status registers are configured.
Info: rdma_dev opened
Info: ALLOCATE PD
Info: OPEN DEVICE FILE
Info: ALLOCATE RDMA QP
Allocating qp->sq
Allocating qp->cq
Allocating qp->rq
Info: queue pair setting is done! Configuring RDMA per-queu CSR registers
Info: allocate_rdma_qp - Successfully allocated a rdma qp
Info: CONFIGURE PSN
[Register] RN_RDMA_QCSR_LSTRQREQi=0x60344, qpid=2, value=0xa000abc
[Register] RN_RDMA_QCSR_SQPSNi=0x60340, qpid=2, value=0xabd
Info: generating matrix data, matrix_size = 256
Info: Software Matrix Data generated
Info: Server is listening to a remote peer
Info: Server is connected to a remote peer
Info: rdma_register_memory_region - registering memory region
Info: memory region for the 0-th PD is registered
Info: allocating buffer for array A
Info: mr_bufferA->buffer = 0xa350000000000000, mr_bufferA->dma_addr = 0xa350000000000000
Info: allocating buffer for array B
Info: mr_bufferB->buffer = 0xa350000000000400, mr_bufferB->dma_addr = 0xa350000000000400
Info: copy matrix data to the device memory
Info: Host buffer vir address used for RDMA read operation is mr_bufferA = 0xa350000000000000, mr_bufferB = 0xa350000000000400
Sending read_offsetA (a350000000000000) to the remote client
Sending read_offsetB (a350000000000400) to the remote client
Does the client finish its RDMA read operation? If yes, please press any key

Info: Dump register values for debug purpose
Info: [RN_RDMA_GCSR_ERRBUFWPTR      = 0x6006c] = 0x0
Info: [RN_RDMA_GCSR_IPKTERRQWPTR    = 0x60094] = 0x0
Info: [RN_RDMA_GCSR_INSRRPKTCNT     = 0x60100] = 0x10004
Info: [RN_RDMA_GCSR_INAMPKTCNT      = 0x60104] = 0x1
Info: [RN_RDMA_GCSR_OUTIOPKTCNT     = 0x60108] = 0x20000
Info: [RN_RDMA_GCSR_OUTAMPKTCNT     = 0x6010c] = 0x1
Info: [RN_RDMA_GCSR_LSTINPKT        = 0x60110] = 0xabe020c
Info: [RN_RDMA_GCSR_LSTOUTPKT       = 0x60114] = 0x157a200
Info: [RN_RDMA_GCSR_ININVDUPCNT     = 0x60118] = 0x0
Info: [RN_RDMA_GCSR_INNCKPKTSTS     = 0x6011c] = 0x0
Info: [RN_RDMA_GCSR_OUTRNRPKTSTS    = 0x60120] = 0x0
Info: [RN_RDMA_GCSR_WQEPROCSTS      = 0x60124] = 0x8082000
Info: [RN_RDMA_GCSR_QPMSTS          = 0x6012c] = 0x20002
Info: [RN_RDMA_GCSR_INALLDRPPKTCNT  = 0x60130] = 0x80000
Info: [RN_RDMA_GCSR_INNAKPKTCNT     = 0x60134] = 0x0
Info: [RN_RDMA_GCSR_OUTNAKPKTCNT    = 0x60138] = 0x0
Info: [RN_RDMA_GCSR_RESPHNDSTS      = 0x6013c] = 0x10002
Info: [RN_RDMA_GCSR_RETRYCNTSTS     = 0x60140] = 0x0
Info: [RN_RDMA_GCSR_INCNPPKTCNT     = 0x60174] = 0x0
Info: [RN_RDMA_GCSR_OUTCNPPKTCNT    = 0x60178] = 0x0
Info: [RN_RDMA_GCSR_OUTRDRSPPKTCNT  = 0x6017c] = 0x2
Info: [RN_RDMA_GCSR_INTSTS          = 0x60184] = 0x10
Info: [RN_RDMA_GCSR_RQINTSTS1       = 0x60190] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS2       = 0x60194] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS3       = 0x60198] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS4       = 0x6019c] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS5       = 0x601a0] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS6       = 0x601a4] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS7       = 0x601a8] = 0x0
Info: [RN_RDMA_GCSR_RQINTSTS8       = 0x601ac] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS1       = 0x601b0] = 0x4
Info: [RN_RDMA_GCSR_CQINTSTS2       = 0x601b4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS3       = 0x601b8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS4       = 0x601bc] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS5       = 0x601c0] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS6       = 0x601c4] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS7       = 0x601c8] = 0x0
Info: [RN_RDMA_GCSR_CQINTSTS8       = 0x601cc] = 0x0
Info: [RN_RDMA_QCSR_CQHEADi         = 0x60330] = 0x0
Info: [RN_RDMA_QCSR_STATSSNi        = 0x60380] = 0x2
Info: [RN_RDMA_QCSR_STATMSNi        = 0x60384] = 0x2
Info: [RN_RDMA_QCSR_STATQPi         = 0x60388] = 0x1f0600
Info: [RN_RDMA_QCSR_STATCURSQPTRi   = 0x6038c] = 0x0
Info: [RN_RDMA_QCSR_STATRESPSNi     = 0x60390] = 0xabc
Info: [RN_RDMA_QCSR_STATRQBUFCAi    = 0x60394] = 0x2ba2b000
Info: [RN_RDMA_QCSR_STATWQEi        = 0x60398] = 0x0
Info: [RN_RDMA_QCSR_STATRQPIDBi     = 0x6039c] = 0x0
Info: [RN_RDMA_QCSR_STATRQBUFCAMSBi = 0x603d8] = 0x17

There is no error message from dmesg on 192.100.51.1.

Error message from dmesg on 192.100.52.1:

[  700.598882] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[  700.598889] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[  700.598903] addr = 0x00000254 val = 0x00100000
[  700.598908] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[  700.598909] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[  700.598909] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[  700.598910] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[  700.598911] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[  700.598911] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[  700.598912] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[  700.598912] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[  700.598913] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[  700.598914] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[  700.598914] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[  700.598915] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[  700.598915] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[  700.598916] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[  700.598916] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[  700.598916] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[  700.598917] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[  700.598917] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[  700.598918] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[  700.598922] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000041 -1073741759
[  700.598922] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[  700.598923] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[  700.598923] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[  700.598924] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x41
[  700.598928] GLBL_DSC_ERR_LOG1                        0x260     0x1014     4116
[  700.598928] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[  700.598929] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x1
[  700.598929] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[  700.598930] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[  700.598930] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[  700.598934] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[  700.598934] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[  700.598935] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[  700.598935] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[  700.598936] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[  700.598936] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[  700.598937] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[  700.598937] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[  700.598938] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[  700.598938] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[  700.598939] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[  700.598939] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[  700.598943] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[  700.598944] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[  700.598944] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[  700.598945] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[  700.598945] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[  700.598945] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[  700.598949] GLBL_DSC_ERR_LOG2                        0x27c     0x10001    65537
[  700.598950] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x1
[  700.598950] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x1
[  700.598951] eqdma_hw_error_process detected DMA engine error
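For anyone reading the dump above, here is a minimal Python sketch of how the per-field values follow from the raw register words. The bit ranges are copied from the dmesg output itself; the helper names (`extract`, `decode`, `to_signed32`) are made up for illustration and are not part of the onic or QDMA driver.

```python
# Bit ranges copied from the dmesg dump: (field name, high bit, low bit).
ERR_STS_FIELDS = [("DSC", 21, 21), ("DMA", 20, 20), ("TIMEOUT", 9, 9), ("ADDR", 5, 5)]
ERR_LOG0_FIELDS = [("VALID", 31, 31), ("SEL", 30, 30), ("QID", 12, 0)]
ERR_LOG1_FIELDS = [("CIDX", 27, 12), ("SUB_TYPE", 8, 5), ("ERR_TYPE", 4, 0)]
ERR_LOG2_FIELDS = [("OLD_PIDX", 31, 16), ("NEW_PIDX", 15, 0)]


def extract(value, high, low):
    """Return bits [high, low] of a 32-bit register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode(value, fields):
    """Decode a register value into a {field name: field value} dict."""
    return {name: extract(value, high, low) for name, high, low in fields}


def to_signed32(value):
    """Reinterpret a 32-bit word as signed, as the third dmesg column does."""
    return value - (1 << 32) if value & (1 << 31) else value


# Raw values taken from the dump on 192.100.52.1.
print(decode(0x00100000, ERR_STS_FIELDS))   # only the DMA bit is set
print(decode(0xC0000041, ERR_LOG0_FIELDS))  # QID 0x41
print(decode(0x1014, ERR_LOG1_FIELDS))      # CIDX 1, ERR_TYPE 0x14
print(decode(0x10001, ERR_LOG2_FIELDS))     # old and new PIDX both 1
print(to_signed32(0xC0000041))
```

This only reproduces the field split shown in the log; what ERR_TYPE 0x14 means has to come from the QDMA register documentation.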

Then I ran the DMA test on 192.100.52.1:

qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
/dev/reconic-mm, W off 0x0, 0x3e80000 failed -1.
write file: Input/output error
qianyich@pc167:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
/dev/reconic-mm, read off 0x0 + 0x3e80000 failed -1.
read file: Input/output error

dmesg:

[  795.624964] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x000000002c539dbf, W,128,0/128,0x0, done 0, err 0, tm 10000.
[  795.636427] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/1022, pidx 1, cidx 0
[  795.636443] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.

Note that DMA still works on 192.100.51.1.

qianyich commented 7 months ago

@zhguanw-amd Another problem: after I ran the read, write, and send_recv tests successfully, I went on to test DMA on both servers. DMA failed on 192.100.51.1 but worked on 192.100.52.1. This seems related to the network systolic array problem, as I saw a similar error in dmesg again. I think the QDMA part is still a bit problematic.

qianyich@pc166:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200
Write scenario
size=65536000 Average BW = 4.884794 GB/sec, average latency = 13416.328467 us
qianyich@pc166:~/RecoNIC/examples/dma_test$ sudo ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r
Read scenario
/dev/reconic-mm, read off 0x0 + 0x3e80000 failed -1.
read file: Input/output error
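As a sanity check on the numbers in this output, the sketch below confirms that the `0x3e80000` in the failure message is the requested `-s 65536000` transfer size, and that the reported write bandwidth is consistent with size divided by average latency, using decimal gigabytes. That formula is inferred from the printed figures, not taken from the dma_test source, and `bandwidth_gb_per_s` is a hypothetical helper name.

```python
SIZE_BYTES = 65536000            # value passed with -s
AVG_LATENCY_US = 13416.328467    # "average latency" printed by the write scenario
REPORTED_BW_GB_S = 4.884794      # "Average BW" printed by the write scenario


def bandwidth_gb_per_s(size_bytes, latency_us):
    """Bandwidth in decimal GB/s for one transfer of size_bytes taking latency_us."""
    seconds = latency_us * 1e-6
    return size_bytes / seconds / 1e9


# The hexadecimal length in the error message is the same transfer size.
print(hex(SIZE_BYTES))

computed = bandwidth_gb_per_s(SIZE_BYTES, AVG_LATENCY_US)
print(f"computed {computed:.6f} GB/s, reported {REPORTED_BW_GB_S} GB/s")
```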

dmesg:

[  492.857842] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[  492.857849] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[  492.857852] addr = 0x00000254 val = 0x00100000
[  492.857858] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[  492.857861] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[  492.857862] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[  492.857864] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[  492.857865] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[  492.857867] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[  492.857868] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[  492.857869] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[  492.857870] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[  492.857872] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[  492.857873] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[  492.857875] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[  492.857876] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[  492.857877] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[  492.857879] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[  492.857880] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[  492.857881] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[  492.857882] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[  492.857884] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[  492.857889] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000041 -1073741759
[  492.857890] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[  492.857892] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[  492.857893] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[  492.857895] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x41
[  492.857900] GLBL_DSC_ERR_LOG1                        0x260     0x1014     4116
[  492.857902] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[  492.857903] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x1
[  492.857905] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[  492.857906] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[  492.857908] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[  492.857912] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[  492.857914] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[  492.857915] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[  492.857917] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[  492.857918] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[  492.857920] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[  492.857921] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[  492.857922] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[  492.857924] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[  492.857926] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[  492.857927] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[  492.857928] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[  492.857933] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[  492.857934] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[  492.857936] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[  492.857937] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[  492.857939] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[  492.857940] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[  492.857945] GLBL_DSC_ERR_LOG2                        0x27c     0x3ff03ff  67044351
[  492.857946] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x3ff
[  492.857948] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x3ff
[  492.857949] eqdma_hw_error_process detected DMA engine error
[  503.005208] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-65: req 0x00000000bf355690, R,4194304,0/65536000,0x0, done 0, err 0, tm 10000.
[  503.017464] onic:qdma_descq_dump: qdma3b000-MM-65: 0x41/0x41, desc sz 1024/0, pidx 0, cidx 1
[  503.018028] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[  512.944103] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
qianyich commented 7 months ago

@zhguanw-amd Also, when I insert the onic.ko module, I sometimes get the following errors in dmesg. This happens randomly, and a reboot resolves it, but I am not sure why. I am just reporting it, as it could be related to the aforementioned QDMA error.

[  905.919063] qdma_is_config_bar: Invalid config bar, err:-4
[  905.924560] qdma_hw_access_init: config bar passed is INVALID, err:-1
[  905.931044] onic 0000:3b:00.0: onic_qdma_setup: qdma_device_open() failed: Error Code: -22
[  905.939304] onic 0000:3b:00.0: onic_pci_probe: onic_qdma_setup() failed with status -22
[  905.947330] onic: probe of 0000:3b:00.0 failed with error -22
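The `-22` in the probe failure is a negated Linux errno. A small Python sketch for looking it up is below; `describe_errno` is a hypothetical helper. The `-4` and `-1` printed by the `qdma_*` functions may be driver-specific codes rather than errno values, so the sketch is only applied to the probe status.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# Status reported by "probe of 0000:3b:00.0 failed with error -22".
name, message = describe_errno(-22)
print(name, "-", message)
```

So the kernel is reporting that the driver's probe returned "invalid argument", which matches the earlier complaint about an invalid config BAR.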
zhguanw-amd commented 7 months ago

@qianyich Did you apply the new onic.patch to RecoNIC/drivers/onic-driver? It looks like you may be working with an old driver. I can run network_systolic_mm without issue. There is a seg fault when freeing some buffers, but it has nothing to do with QDMA or RDMA. I will push a fix for this seg fault.

Can you take a screenshot of lines 439-453 in onic_main.c?

qianyich commented 7 months ago

@zhguanw-amd The new onic.patch has been applied.

[Screenshot attached]

The problem is not directly related to network_systolic_mm. After I ran the read, write, and send_recv tests with no issues, the DMA test failed on 192.100.51.1. This was after I rebooted both machines and reinserted the kernel module; I did not run the network systolic array example that time. I will run those tests again tomorrow.

qianyich commented 7 months ago

Tomorrow I will reprogram the board and run network_systolic_mm first to see if that works, then run the verb tests, and finally the DMA test.

zhguanw-amd commented 7 months ago

Yes, please reprogram the board and make sure it starts from a fresh environment.

zhguanw-amd commented 7 months ago

Could you comment out lines 12 and 14 in your screenshot above, leaving only line 13, and try again?

qianyich commented 7 months ago

@zhguanw-amd I commented out the lines you mentioned and ran network_systolic_mm first; it showed Test Passed. Then I ran the read, write, and send_recv tests, and they all worked with no issues. After these tests, I ran the network_systolic_mm example again, and it failed to send the RDMA read for Array B.

dmesg on 192.100.52.1 (maybe this is the seg fault you mentioned before?):

[  952.640278] network_systoli[15528]: segfault at 122000000de ip 00007f864e53d94d sp 00007ffffa01f3a0 error 4 in libc-2.27.so[7f864e4a6000+1e7000]
[ 1054.218446] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1140.048294] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1268.669567] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1306.199807] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1306.199813] network_systoli[15626]: segfault at 122000000de ip 00007fbda6a2b94d sp 00007ffe9986c260 error 4 in libc-2.27.so[7fbda6994000+1e7000]
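One thing that can be read off these two segfault lines: subtracting the libc mapping base from the faulting instruction pointer gives the same offset in both runs, which suggests both crashes hit the same libc code path. A small Python sketch of that arithmetic follows; the regex and the `segfault_offset` name are illustrative only.

```python
import re

# Matches the kernel's "segfault at ... ip ... in object[base+size]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def segfault_offset(line):
    """Return (object name, ip offset inside that object), or None if no match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return match["obj"], offset


LINES = [
    "network_systoli[15528]: segfault at 122000000de ip 00007f864e53d94d "
    "sp 00007ffffa01f3a0 error 4 in libc-2.27.so[7f864e4a6000+1e7000]",
    "network_systoli[15626]: segfault at 122000000de ip 00007fbda6a2b94d "
    "sp 00007ffe9986c260 error 4 in libc-2.27.so[7fbda6994000+1e7000]",
]

for line in LINES:
    obj, offset = segfault_offset(line)
    print(obj, hex(offset))
```

On x86, `error 4` is a user-mode read of an unmapped page, and the faulting address is the same in both runs. The offset could be looked up with a tool such as addr2line against the matching libc-2.27.so, assuming debug symbols are available.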

But this has somehow affected QDMA: the DMA tests failed as well, with an error similar to the one I had yesterday.

[ 1355.762811] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x00000000a25c3472, W,65536000,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1355.775149] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/895, pidx 640, cidx 512
[ 1355.775478] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
[ 1358.805712] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1358.805717] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1358.805719] addr = 0x00000254 val = 0x00100000
[ 1358.805723] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1358.805724] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1358.805725] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1358.805726] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1358.805726] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1358.805727] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1358.805727] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1358.805727] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1358.805728] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1358.805729] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1358.805729] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1358.805729] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1358.805730] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1358.805730] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1358.805731] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1358.805731] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1358.805732] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1358.805732] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1358.805732] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1358.805736] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1358.805737] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1358.805737] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1358.805738] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1358.805738] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1358.805742] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1358.805742] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1358.805743] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1358.805743] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1358.805744] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1358.805744] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1358.805748] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1358.805748] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1358.805749] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1358.805749] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1358.805749] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1358.805750] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1358.805750] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1358.805751] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1358.805752] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1358.805752] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1358.805753] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1358.805753] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1358.805757] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1358.805757] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1358.805758] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1358.805759] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1358.805759] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1358.805760] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1358.805763] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1358.805764] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1358.805764] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1358.805765] eqdma_hw_error_process detected DMA engine error
[ 1358.819350] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1358.819357] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1358.819359] addr = 0x00000254 val = 0x00100000
[ 1358.819366] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1358.819368] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1358.819370] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1358.819371] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1358.819373] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1358.819374] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1358.819375] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1358.819377] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1358.819378] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1358.819379] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1358.819381] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1358.819382] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1358.819383] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1358.819385] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1358.819386] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1358.819387] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1358.819389] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1358.819390] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1358.819391] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1358.819396] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1358.819398] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1358.819399] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1358.819401] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1358.819402] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1358.819407] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1358.819409] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1358.819410] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1358.819412] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1358.819413] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1358.819415] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1358.819419] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1358.819421] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1358.819422] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1358.819424] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1358.819425] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1358.819426] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1358.819428] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1358.819429] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1358.819431] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1358.819432] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1358.819434] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1358.819435] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1358.819440] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1358.819441] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1358.819443] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1358.819444] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1358.819446] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1358.819447] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1358.819452] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1358.819453] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1358.819455] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1358.819456] eqdma_hw_error_process detected DMA engine error
[ 1358.833053] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1358.833059] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1358.833062] addr = 0x00000254 val = 0x00100000
[ 1358.833068] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1358.833071] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1358.833072] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1358.833074] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1358.833075] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1358.833077] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1358.833078] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1358.833079] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1358.833080] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1358.833082] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1358.833084] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1358.833085] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1358.833086] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1358.833088] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1358.833089] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1358.833090] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1358.833091] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1358.833093] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1358.833094] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1358.833099] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1358.833101] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1358.833102] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1358.833104] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1358.833105] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1358.833110] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1358.833112] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1358.833113] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1358.833115] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1358.833116] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1358.833118] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1358.833122] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1358.833124] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1358.833125] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1358.833127] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1358.833128] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1358.833130] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1358.833131] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1358.833133] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1358.833134] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1358.833136] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1358.833137] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1358.833139] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1358.833143] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1358.833145] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1358.833146] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1358.833148] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1358.833149] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1358.833150] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1358.833155] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1358.833157] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1358.833158] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1358.833160] eqdma_hw_error_process detected DMA engine error
[ 1358.846799] onic:error_intr_handler: Error IRQ fired on Funtion#0: index=7, vector=272
[ 1358.846806] eqdma_hw_error_process: Global Err Reg(0x248) = 0x4
[ 1358.846809] addr = 0x00000254 val = 0x00100000
[ 1358.846815] GLBL_DSC_ERR_STS                         0x254     0x100000   1048576
[ 1358.846817] GLBL_DSC_ERR_STS_RSVD_1                  [31,26]   0x0
[ 1358.846819] GLBL_DSC_ERR_STS_PORT_ID                 [   25]   0x0
[ 1358.846821] GLBL_DSC_ERR_STS_SBE                     [   24]   0x0
[ 1358.846822] GLBL_DSC_ERR_STS_DBE                     [   23]   0x0
[ 1358.846823] GLBL_DSC_ERR_STS_RQ_CANCEL               [   22]   0x0
[ 1358.846825] GLBL_DSC_ERR_STS_DSC                     [   21]   0x0
[ 1358.846826] GLBL_DSC_ERR_STS_DMA                     [   20]   0x1
[ 1358.846827] GLBL_DSC_ERR_STS_FLR_CANCEL              [   19]   0x0
[ 1358.846829] GLBL_DSC_ERR_STS_RSVD_2                  [18,17]   0x0
[ 1358.846830] GLBL_DSC_ERR_STS_DAT_POISON              [   16]   0x0
[ 1358.846832] GLBL_DSC_ERR_STS_TIMEOUT                 [    9]   0x0
[ 1358.846833] GLBL_DSC_ERR_STS_FLR                     [    8]   0x0
[ 1358.846834] GLBL_DSC_ERR_STS_TAG                     [    6]   0x0
[ 1358.846859] GLBL_DSC_ERR_STS_ADDR                    [    5]   0x0
[ 1358.846860] GLBL_DSC_ERR_STS_PARAM                   [    4]   0x0
[ 1358.846860] GLBL_DSC_ERR_STS_BCNT                    [    3]   0x0
[ 1358.846861] GLBL_DSC_ERR_STS_UR_CA                   [    2]   0x0
[ 1358.846861] GLBL_DSC_ERR_STS_POISON                  [    1]   0x0
[ 1358.846865] GLBL_DSC_ERR_LOG0                        0x25c     0xc0000043 -1073741757
[ 1358.846866] GLBL_DSC_ERR_LOG0_VALID                  [   31]   0x1
[ 1358.846867] GLBL_DSC_ERR_LOG0_SEL                    [   30]   0x1
[ 1358.846867] GLBL_DSC_ERR_LOG0_RSVD_1                 [29,13]   0x0
[ 1358.846868] GLBL_DSC_ERR_LOG0_QID                    [12, 0]   0x43
[ 1358.846871] GLBL_DSC_ERR_LOG1                        0x260     0x280014   2621460
[ 1358.846872] GLBL_DSC_ERR_LOG1_RSVD_1                 [31,28]   0x0
[ 1358.846872] GLBL_DSC_ERR_LOG1_CIDX                   [27,12]   0x280
[ 1358.846873] GLBL_DSC_ERR_LOG1_RSVD_2                 [11, 9]   0x0
[ 1358.846873] GLBL_DSC_ERR_LOG1_SUB_TYPE               [ 8, 5]   0x0
[ 1358.846874] GLBL_DSC_ERR_LOG1_ERR_TYPE               [ 4, 0]   0x14
[ 1358.846878] GLBL_DSC_DBG_DAT0                        0x270     0x0        0
[ 1358.846878] GLBL_DSC_DAT0_RSVD_1                     [31,30]   0x0
[ 1358.846879] GLBL_DSC_DAT0_CTXT_ARB_DIR               [   29]   0x0
[ 1358.846879] GLBL_DSC_DAT0_CTXT_ARB_QID               [28,17]   0x0
[ 1358.846880] GLBL_DSC_DAT0_CTXT_ARB_REQ               [16,12]   0x0
[ 1358.846881] GLBL_DSC_DAT0_IRQ_FIFO_FL                [   11]   0x0
[ 1358.846881] GLBL_DSC_DAT0_TMSTALL                    [   10]   0x0
[ 1358.846881] GLBL_DSC_DAT0_RRQ_STALL                  [ 9, 8]   0x0
[ 1358.846882] GLBL_DSC_DAT0_RCP_FIFO_SPC_STALL         [ 7, 6]   0x0
[ 1358.846883] GLBL_DSC_DAT0_RRQ_FIFO_SPC_STALL         [ 5, 4]   0x0
[ 1358.846883] GLBL_DSC_DAT0_FAB_MRKR_RSP_STALL         [ 3, 2]   0x0
[ 1358.846884] GLBL_DSC_DAT0_DSC_OUT_STALL              [ 1, 0]   0x0
[ 1358.846887] GLBL_DSC_DBG_DAT1                        0x274     0x0        0
[ 1358.846888] GLBL_DSC_DAT1_RSVD_1                     [31,28]   0x0
[ 1358.846888] GLBL_DSC_DAT1_EVT_SPC_C2H                [27,22]   0x0
[ 1358.846889] GLBL_DSC_DAT1_EVT_SP_H2C                 [21,16]   0x0
[ 1358.846889] GLBL_DSC_DAT1_DSC_SPC_C2H                [15, 8]   0x0
[ 1358.846890] GLBL_DSC_DAT1_DSC_SPC_H2C                [ 7, 0]   0x0
[ 1358.846893] GLBL_DSC_ERR_LOG2                        0x27c     0x2800280  41943680
[ 1358.846894] GLBL_DSC_ERR_LOG2_OLD_PIDX               [31,16]   0x280
[ 1358.846894] GLBL_DSC_ERR_LOG2_NEW_PIDX               [15, 0]   0x280
[ 1358.846895] eqdma_hw_error_process detected DMA engine error
[ 1369.075104] onic:qdma_request_wait_for_cmpl: qdma3b000-MM-67: req 0x000000004a0a43c4, R,4190208,0/65536000,0x0, done 0, err 0, tm 10000.
[ 1369.087350] onic:qdma_descq_dump: qdma3b000-MM-67: 0x43/0x43, desc sz 1024/0, pidx 639, cidx 640
[ 1369.087674] onic 0000:3b:00.0: reconic-mm: Close onic_cdev.
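A side observation on the `qdma_descq_dump` lines in this thread: the second number in `desc sz 1024/N` matches a free-descriptor count of `(cidx - pidx - 1) mod 1024` in all four dumps. This is inferred purely from the logged values, not from the driver source, so treat the formula and the `free_descriptors` helper as assumptions.

```python
def free_descriptors(ring_size, pidx, cidx):
    """Free slots in a ring where, by assumption, one slot always stays unused."""
    return (cidx - pidx - 1) % ring_size


# (ring size, pidx, cidx, free count printed after the slash in "desc sz")
OBSERVED = [
    (1024, 1, 0, 1022),     # qdma3b000-MM-67, first write failure
    (1024, 0, 1, 0),        # qdma3b000-MM-65, read failure
    (1024, 640, 512, 895),  # qdma3b000-MM-67, write after network_systolic_mm
    (1024, 639, 640, 0),    # qdma3b000-MM-67, read after network_systolic_mm
]

for ring_size, pidx, cidx, logged in OBSERVED:
    computed = free_descriptors(ring_size, pidx, cidx)
    print(f"pidx={pidx} cidx={cidx} computed={computed} logged={logged}")
```

If that reading is right, the two read failures show a ring with no free descriptors and the consumer index one ahead of the producer index.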
zhguanw-amd commented 7 months ago

I pushed a fix for that seg fault. You can get a new version of network_systolic_mm