Open JackyWoo opened 1 year ago
There is already a timeout logic that calls when_done
with failure, so as to make peer.cxx
close the RPC client (hence the socket and connection). Do you mean this logic did not work as explained?
https://github.com/eBay/NuRaft/blob/728644d0e1daa5d918c569e8f5a9486c47b64a0d/src/asio_service.cxx#L1028-L1040
As my understanding the timer here is set to wait other thread connect and send data and when time is up it will retry send.
And I find socket connect has no timout:
I am not familar with NuRaft RPC and please correct me if I am wrong.
Ok got it, the existing timeout logic does not work for auto forwarding as there is no retry for auto forwarding. Need more time to think about it in detail, to unify it with existing timeout logic. Thanks.
In my project thread was stuck by NuRaft auto forwarding RPC connect method.
How to reproduce?
1.Thread node n1, n2, n3 and let's assume that n3 is leader. 2.Make network partition separating n3 with n1, n2 by iptables.
PS: you can increase OS option
net.ipv4.tcp_syn_retries
to increase system max connect timeout.details see https://github.com/JDRaftKeeper/RaftKeeper/issues/42
Solution
I think we should add connection timeout.