Derecho-Project / derecho

The main code repository for the Derecho project.
BSD 3-Clause "New" or "Revised" License

Implement timeout in SST and RDMC connection setup #24

Open sagarjha opened 6 years ago

sagarjha commented 6 years ago

Currently, nodes hang indefinitely while waiting to receive the out-of-band data needed to create the RDMA connections for SST and RDMC. If a node fails during this connection setup, the nodes waiting on it never make progress. We have discussed this in detail before but never got around to implementing timeouts.
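As a rough illustration of the kind of timeout meant here, the sketch below bounds a blocking read on a file descriptor with `poll` and a total deadline. It assumes the out-of-band data arrives over an ordinary POSIX descriptor such as a TCP socket; `read_exact_with_timeout` is a hypothetical helper name, not an existing Derecho function.

```cpp
#include <poll.h>
#include <unistd.h>

#include <algorithm>
#include <cerrno>
#include <chrono>
#include <climits>
#include <cstddef>

// Hypothetical helper (not part of Derecho): read exactly `size` bytes from
// `fd`, giving up once `timeout` has elapsed in total.
// Returns true if all bytes arrived; false on timeout, peer close, or error.
bool read_exact_with_timeout(int fd, void* buffer, std::size_t size,
                             std::chrono::milliseconds timeout) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + timeout;
    char* out = static_cast<char*>(buffer);
    std::size_t received = 0;

    while(received < size) {
        // Recompute the remaining budget so partial reads share one deadline.
        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
                deadline - clock::now());
        if(remaining.count() <= 0) {
            return false;  // deadline passed
        }
        const int wait_ms = static_cast<int>(
                std::min<long long>(remaining.count(), INT_MAX));

        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;
        const int ready = poll(&pfd, 1, wait_ms);
        if(ready == 0) {
            return false;  // nothing arrived before the deadline
        }
        if(ready < 0) {
            if(errno == EINTR) continue;  // interrupted by a signal, retry
            return false;
        }

        const ssize_t n = read(fd, out + received, size - received);
        if(n == 0) {
            return false;  // peer closed the connection
        }
        if(n < 0) {
            if(errno == EINTR || errno == EAGAIN) continue;
            return false;
        }
        received += static_cast<std::size_t>(n);
    }
    return true;
}
```

A caller that gets `false` back could treat the peer as failed and abort or retry the connection setup instead of blocking forever; how that failure should be reported to the rest of the system is left open here.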

etremel commented 6 years ago

As a matter of fact, I had to write connect-with-timeout methods for the TCP sockets in order to get the full-restart code working. You may find commit 5b5b3e5b5be6b1713f0d59cece163b98f915195b helpful.
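For readers unfamiliar with the technique, here is a minimal sketch of a connect-with-timeout on a TCP socket, using a non-blocking `connect` followed by `poll`. This is not the code from the commit above; the function name `connect_with_timeout` and its signature are assumptions made for illustration, and it handles IPv4 addresses only.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical helper (illustrative only): connect to ipv4_addr:port, waiting
// at most timeout_ms milliseconds. Returns a connected, blocking socket fd on
// success, or -1 on timeout or error.
int connect_with_timeout(const char* ipv4_addr, std::uint16_t port, int timeout_ms) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if(inet_pton(AF_INET, ipv4_addr, &addr.sin_addr) != 1) {
        return -1;  // not a valid dotted-quad IPv4 address
    }

    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if(fd < 0) {
        return -1;
    }

    // Switch to non-blocking mode so connect() returns immediately.
    const int flags = fcntl(fd, F_GETFL, 0);
    if(flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    int rc = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    if(rc < 0) {
        if(errno != EINPROGRESS) {
            close(fd);
            return -1;  // immediate failure
        }
        // Connection in progress: wait until the socket becomes writable.
        // Note: a retry after EINTR restarts the full timeout in this sketch.
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLOUT;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while(rc < 0 && errno == EINTR);
        if(rc <= 0) {
            close(fd);
            return -1;  // timed out (rc == 0) or poll failed
        }
        // Writable does not imply success; check the pending socket error.
        int so_error = 0;
        socklen_t len = sizeof(so_error);
        if(getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0 || so_error != 0) {
            close(fd);
            return -1;
        }
    }

    // Restore the original (blocking) flags for subsequent reads and writes.
    if(fcntl(fd, F_SETFL, flags) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same pattern could presumably bound the TCP exchange that precedes SST and RDMC connection setup, though the RDMA queue-pair setup itself would need its own handling.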