fpgasystems / Coyote

Framework providing operating system abstractions and a range of shared networking (RDMA, TCP/IP) and memory services to common modern heterogeneous platforms.
MIT License

rdma test failed #33

Closed crizy closed 1 year ago

crizy commented 1 year ago

@d-kor Hi, I tested RDMA_PERF, but it failed with an error indicating that it cannot connect.

I successfully built the build_perf_rdma_host_hw and build_perf_rdma_card_hw projects.
I successfully loaded the driver with insmod and compiled build_perf_rdma_sw.

My test setup is as follows:
host0:  fpga0 with build_perf_rdma_host_hw bit    IP:192.168.0.4
host1:  fpga1 with build_perf_rdma_card_hw bit    IP:192.168.0.5
Pinging both 192.168.0.4 and 192.168.0.5 works.

I run build_perf_rdma_sw as follows:
on host0:   sudo ./build_perf_rdma_sw/main  --reps 100  --mins 128  --maxs 2048
on host1:   sudo ./build_perf_rdma_sw/main  --tcpaddr 192.168.0.4  --reps 100  --mins 128  --maxs 2048

The output of build_perf_rdma_sw is as follows:
on host0:
                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...

on host1:

                -- PARAMS
                -----------------------------------------------
                TCP master IP address: 192.168.0.4
                IBV IP address: 192.168.0.5
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Slave side exchange started ...
                terminate called after throwing an instance of 'std::runtime_error'
                         what():  Could not connect to master: 192.168.0.4:18488
                Aborted

After I terminate build_perf_rdma_sw on host0 (Ctrl+C), the output is as follows:

                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...
                ^Cterminate called after throwing an instance of 'std::runtime_error'
                what():  Accept failed

mjasny commented 1 year ago

Hi @crizy ,

Try setting tcpaddr to the IP of the host interface where the SW process is running, not the IP of the FPGA interface. The TCP connection used to exchange the queue pair information does not run on the FPGA itself. Also, I think you need to use the same bitstream for both FPGAs.
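Before rerunning the benchmark, it can help to confirm that the master's exchange port is reachable over an ordinary host NIC address. The following is a minimal sketch, not part of Coyote: `can_connect` is a hypothetical helper, and the port 18488 is simply the one shown in the error message above. The host IP to pass is whatever address the host's regular network interface has, which is an assumption about your setup.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it only checks that something
    is listening and reachable, it does not speak the QP exchange protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python check_port.py <host-nic-ip> 18488
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    ok = can_connect(target_host, target_port)
    print(f"{target_host}:{target_port} reachable: {ok}")
```

Run it on host1 while the master is waiting on host0. If it reports the port as unreachable for the FPGA address but reachable for the host NIC address, that matches the explanation above.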

crizy commented 1 year ago

Hi @mjasny ,

Thank you very much for your answer. The RDMA perf test now passes on the first run, but there is still a problem.

After the first test passed, I immediately ran a second test using the same procedure, and it blocked. Throughout the process, pinging both IBV IPs worked. The dmesg output on host0 and host1 for the blocked test is as follows:

                [ 6393.549366] fpga_open():fpga device 0 acquired
                [ 6393.549375] fpga_ioctl():registration succeeded pid 16356, cpid 0
                [ 6393.549378] fpga_ioctl():reading config 0x1000100010023
                [ 6393.549385] fpga_mmap():fpga dev. 0, memory mapping config AVX region at 382fc1000000 of size 40000
                [ 6393.549400] fpga_mmap():fpga dev. 0, memory mapping user ctrl region at 382fc0120000 of size 10000
                [ 6393.549490] alloc_user_buffers():allocated 8 bytes for page pointer array for 1 user host buffers @0x000000009e45eb95.
                [ 6393.549500] alloc_user_buffers():user host buffer allocated @ a27a00000 device 0
                [ 6393.549503] alloc_user_buffers():allocated 8 bytes for page pointer array for 1 user card buffers @0x0000000048138c55.
                [ 6393.549505] card_alloc():user card buffer allocated @ 40000000 device 0
                [ 6393.549508] fpga_ioctl():buff_num 1, arg 7ffcad2427c0
                [ 6393.549512] fpga_mmap():fpga dev. 0, memory mapping buffer
                [ 6393.549784] tlb_create_map():creating new TLB entry, vaddr 7fe73ee00000, phost a27a00000, pcard 40000000, cpid 0, hugepage 1
                [ 6393.550776] fpga_ioctl():writing qp context ...
                [ 6393.550795] fpga_ioctl():writing qp connection ...

The dmesg output on host0 and host1 for the first, passing test is as follows:

                [ 6393.549366] fpga_open():fpga device 0 acquired
                [ 6393.549375] fpga_ioctl():registration succeeded pid 16356, cpid 0
                [ 6393.549378] fpga_ioctl():reading config 0x1000100010023
                [ 6393.549385] fpga_mmap():fpga dev. 0, memory mapping config AVX region at 382fc1000000 of size 40000
                [ 6393.549400] fpga_mmap():fpga dev. 0, memory mapping user ctrl region at 382fc0120000 of size 10000
                [ 6393.549490] alloc_user_buffers():allocated 8 bytes for page pointer array for 1 user host buffers @0x000000009e45eb95.
                [ 6393.549500] alloc_user_buffers():user host buffer allocated @ a27a00000 device 0
                [ 6393.549503] alloc_user_buffers():allocated 8 bytes for page pointer array for 1 user card buffers @0x0000000048138c55.
                [ 6393.549505] card_alloc():user card buffer allocated @ 40000000 device 0
                [ 6393.549508] fpga_ioctl():buff_num 1, arg 7ffcad2427c0
                [ 6393.549512] fpga_mmap():fpga dev. 0, memory mapping buffer
                [ 6393.549784] tlb_create_map():creating new TLB entry, vaddr 7fe73ee00000, phost a27a00000, pcard 40000000, cpid 0, hugepage 1
                [ 6393.550776] fpga_ioctl():writing qp context ...
                [ 6393.550795] fpga_ioctl():writing qp connection ...
                [ 6393.550799] fpga_ioctl():arp lookup qsfp0, target ip c0a80004/05
                [ 6393.556242] fpga_ioctl():unregistration succeeded cpid 0
                [ 6393.556292] tlb_create_unmap():unmapping TLB entry, vaddr 7fe73ee00000, cpid 0, hugepage 1
                [ 6393.556294] fpga_ioctl():user buffers freed
                [ 6393.556354] fpga_release():fpga device 0 released
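
For anyone reading these logs: the `arp lookup` entry in the passing run prints the target IP as hexadecimal, and that entry is the first one missing from the blocked run, which stops at `writing qp connection ...`. A small sketch for decoding such a value follows. `hex_to_ipv4` is a hypothetical helper, and reading the `/05` suffix as the last octet of the second address is my assumption, not something the driver output states.

```python
import ipaddress


def hex_to_ipv4(hex_str: str) -> str:
    """Convert a 32-bit hex string such as 'c0a80004' to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_str, 16)))


# Value taken from the dmesg line "arp lookup qsfp0, target ip c0a80004/05".
print(hex_to_ipv4("c0a80004"))  # 192.168.0.4
```

The decoded address matches the IBV IP of host0 reported in the application output.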