erpc-io / eRPC

Efficient RPCs for datacenter networks
https://erpc.io/

small_rpc_tput app, assert(pkthdr->req_num_ == sslot->cur_req_num_ + kSessionReqWindow) fails! #70

Closed AliAbyaneh closed 2 years ago

AliAbyaneh commented 2 years ago

I have been running the small_rpc_tput application in debug mode. However, this assertion fails: https://github.com/erpc-io/eRPC/blob/3387f8dcad742d01c7664182483b972d18d546f5/src/rpc_impl/rpc_req.cc#L111

Another question: is it OK to use an eRPC module as both a server and a client?

anujkaliaiitd commented 2 years ago

Thanks for raising the issue. Could you please describe your use case in more detail, or (even better) create a minimal example that reproduces the issue?

Yes, using an erpc::Rpc object as both a client and server is fine.

AliAbyaneh commented 2 years ago

Thank you for your response.

I'm running the small_rpc_app from the eRPC apps. I have changed the names to be able to compile that code.

I have changed the config file:

--test_ms 20000
--sm_verbose 0
--batch_size 1
--concurrency 60
--msg_size 256
--num_processes 2
--num_threads 4
--numa_0_ports 0
--numa_1_ports 1,3

And the autorun_process_file is:

192.168.0.223 31850 0
192.168.0.223 31851 0
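For readers unfamiliar with this file, each entry above appears to be three whitespace-separated fields. The sketch below assumes they mean hostname, UDP port, and NUMA node, which is how I read the eRPC scripts. `ProcessEntry` and `parse_process_line` are hypothetical names for illustration and are not part of eRPC.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record for one process entry (assumed field meanings).
struct ProcessEntry {
  std::string hostname;    // e.g. "192.168.0.223"
  std::uint16_t udp_port;  // e.g. 31850
  std::size_t numa_node;   // e.g. 0
};

// Parse one "<hostname> <UDP port> <NUMA node>" line.
// Returns false if a field is missing or the port is out of range.
inline bool parse_process_line(const std::string &line, ProcessEntry &out) {
  std::istringstream iss(line);
  unsigned long port = 0;
  if (!(iss >> out.hostname >> port >> out.numa_node)) return false;
  if (port > 65535) return false;
  out.udp_port = static_cast<std::uint16_t>(port);
  return true;
}
```

Under this reading, the two entries describe two processes on the same host, on different UDP ports, both on NUMA node 0.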

I'm using the InfiniBand transport layer. I have two-port 100G Mellanox NICs. The driver is MLNX_OFED_LINUX-5.4-3.0.3.0, and the OS is Ubuntu 20.04 LTS.

Here is the ibstat output:

CA 'mlx5_0'
    CA type: MT4119
    Number of ports: 1
    Firmware version: 16.31.2006
    Hardware version: 0
    Node GUID: xxxxxx
    System image GUID: xxxxxx
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 7
        LMC: 0
        SM lid: 4
        Capability mask: 0x2651e84a
        Port GUID: xxxxxx
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4119
    Number of ports: 1
    Firmware version: 16.31.2006
    Hardware version: 0
    Node GUID: xxxxx
    System image GUID: xxxxxx
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 8
        LMC: 0
        SM lid: 4
        Capability mask: 0x2651e848
        Port GUID: xxxxx
        Link layer: InfiniBand

Moreover, I have increased kSessionReqWindow to 64. The problem does not occur when kSessionReqWindow is 8. I only have two NUMA nodes, and I allocate 16348 huge pages per NUMA node.
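To make the failing condition concrete, here is a minimal sketch of the invariant as I understand it from the assertion text. It is a simplified model and not eRPC's code: the `SSlot` struct keeps only the one field named in the assertion, and `slot_index`, `is_next_request`, and `accept_request` are hypothetical helpers. The modulo mapping from request number to slot is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for eRPC's compile-time window size (64 in this report).
constexpr std::size_t kSessionReqWindow = 64;

// Simplified model of a session slot: only the field named in the assertion.
struct SSlot {
  std::uint64_t cur_req_num_;
};

// Assumption: a request number selects its slot by modulo.
constexpr std::size_t slot_index(std::uint64_t req_num) {
  return req_num % kSessionReqWindow;
}

// The condition from the failing assertion: a new request arriving on a slot
// must be exactly one window ahead of the request that slot last handled.
constexpr bool is_next_request(const SSlot &sslot, std::uint64_t pkt_req_num) {
  return pkt_req_num == sslot.cur_req_num_ + kSessionReqWindow;
}

// Accept a new request on the slot, checking the invariant first.
inline void accept_request(SSlot &sslot, std::uint64_t pkt_req_num) {
  assert(is_next_request(sslot, pkt_req_num));
  sslot.cur_req_num_ = pkt_req_num;
}
```

In this model, a slot that last handled request 5 can only accept request 69 next. A packet carrying any other request number for that slot, such as 133, would trip the assertion.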

I will try to create a simpler example that reproduces the error.

anujkaliaiitd commented 2 years ago

Thanks for the details. If it happens with small_rpc_tput, it seems like it could be a bug in eRPC. Could you perhaps try the following:

AliAbyaneh commented 2 years ago

Unfortunately, I cannot reproduce the error with traces on. With traces enabled, the throughput drops significantly. My guess is the application does not crash unless it is under heavy load.

anujkaliaiitd commented 2 years ago

Closing because it seems we can't reproduce this issue.