Currently, nodes hang while waiting to receive the out-of-band data needed to create RDMA connections for SST and RDMC. If a node fails during this connection setup, the nodes waiting on it block indefinitely, which violates progress.
This has been discussed in detail before but we never got around to implementing timeouts.
As a matter of fact, I had to write connect-with-timeout methods for the TCP sockets in order to get the full-restart code working. You may find commit 5b5b3e5b5be6b1713f0d59cece163b98f915195b helpful.
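
For reference, here is a minimal sketch of the usual connect-with-timeout pattern on POSIX TCP sockets: make the socket non-blocking, start the connect, and wait for it with `poll`. This is not the code from the commit above. The name `connect_with_timeout`, its signature, and the IPv4-only handling are assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of the existing code base).
// Returns a connected, blocking socket fd on success, or -1 on
// failure or if the connection is not established within timeout_ms.
int connect_with_timeout(const char* ip, uint16_t port, int timeout_ms) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    // Only dotted-quad IPv4 addresses are handled in this sketch.
    if(inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if(fd < 0) return -1;

    // Switch to non-blocking mode so connect() returns immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if(flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    if(connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        if(errno != EINPROGRESS) {
            close(fd);
            return -1;
        }
        // Wait until the socket becomes writable or the timeout expires.
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLOUT;
        int ready = poll(&pfd, 1, timeout_ms);
        if(ready <= 0) {  // 0 = timeout, <0 = poll error (including EINTR)
            close(fd);
            return -1;
        }
        // Writable does not imply success; check the pending socket error.
        int err = 0;
        socklen_t len = sizeof(err);
        if(getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
            close(fd);
            return -1;
        }
    }

    // Restore the original (blocking) flags for the caller.
    if(fcntl(fd, F_SETFL, flags) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same `poll`-based wait could presumably bound the receive of the out-of-band data as well, by polling for `POLLIN` before each read. Whether that fits the SST and RDMC setup path is something to verify against the actual code.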