Derecho-Project / derecho

The main code repository for the Derecho project.
BSD 3-Clause "New" or "Revised" License

Modify our external-client initial binding logic #212

Open KenBirman opened 3 years ago

KenBirman commented 3 years ago

Based on Weijia's experiments, we know that something about the initial binding is currently causing external clients to run very slowly -- perhaps they all currently bind to member 0 of the top-level view, for example, due to the way we advertise group "contacts" in the config file. But he found that if he simply connects, does some stuff, then disconnects and resumes his experiments, performance is far higher.

This has me thinking we could make that behavior a standard feature, because in some ways, an initial reconnect makes sense no matter what:

So, with these in mind, I want to propose a really simple API change that Weijia could implement in half an hour.

For example, in Alicia's code, an external member who is a long-lived IoT host, like the server on Julio's farm, would know about the affinity keys or object keys associated with those IoT devices. It anticipates doing put on those keys. So, it could map the affinity key to a specific shard, then pick a random shard member as the proxy. Her put and trigger operations would run a lot faster because they won't need to be relayed.
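
To make the idea concrete, here is a minimal sketch of the "map the affinity key to a shard, then pick a random shard member" step. Everything in it is hypothetical and for illustration only: `Shard`, `shard_for_key`, and `pick_proxy` are not Derecho or Cascade APIs, and the hash-mod placement is an assumption standing in for whatever key-to-shard mapping the real system uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not Derecho or Cascade APIs.
using node_id_t = std::uint32_t;
using Shard = std::vector<node_id_t>;  // node IDs of one shard's members

// Map an affinity key to a shard index.
// Assumption: simple hash-mod placement; the real policy may differ.
std::size_t shard_for_key(const std::string& affinity_key, std::size_t num_shards) {
    if (num_shards == 0) {
        throw std::invalid_argument("no shards");
    }
    return std::hash<std::string>{}(affinity_key) % num_shards;
}

// Pick a uniformly random member of the shard that owns the key.
template <typename Rng>
node_id_t pick_proxy(const std::string& affinity_key,
                     const std::vector<Shard>& shards, Rng& rng) {
    const Shard& shard = shards.at(shard_for_key(affinity_key, shards.size()));
    if (shard.empty()) {
        throw std::runtime_error("owning shard has no members");
    }
    std::uniform_int_distribution<std::size_t> dist(0, shard.size() - 1);
    return shard[dist(rng)];
}
```

An external client would call `pick_proxy` once per affinity key it expects to use and then send its put and trigger requests to the returned node, so that the request lands directly on a member of the owning shard.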

etremel commented 3 years ago

I didn't originally write the ExternalGroup code, but I'm confident that any member of the group can handle external clients equally well. If a non-leader node receives a join request that specifies it is an "external client request," it simply adds that client to its set of external P2P connections. The external client does need to initially contact a "well-known" node, typically the group leader, in order to request the current View and discover the other members, but once it does that it can choose any node to send its RPC requests to. In fact, it's up to the user-level code that constructs and uses the ExternalGroup to choose which group member it uses as its proxy, since the p2p_send method must be called with a "destination node ID" argument.
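
As a rough illustration of that flow (fetch the view from a well-known node once, then let user-level code choose the destination for every send), here is a self-contained sketch. The names `ViewSnapshot`, `ExternalClientSketch`, `next_proxy`, and `send_to` are hypothetical and do not reflect the actual ExternalGroup interface; the two callbacks stand in for the real bootstrap and transport code, and round-robin is just one possible policy for avoiding a pile-up on a single member.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using node_id_t = std::uint32_t;

// Hypothetical stand-in for the membership information in a View.
struct ViewSnapshot {
    std::vector<node_id_t> members;
};

class ExternalClientSketch {
public:
    using FetchView = std::function<ViewSnapshot()>;
    using SendFn = std::function<void(node_id_t, const std::string&)>;

    // fetch_view models the one-time request to the well-known contact node;
    // send models the transport (assumed, not the real p2p_send signature).
    ExternalClientSketch(FetchView fetch_view, SendFn send)
            : view_(fetch_view()), send_(std::move(send)) {
        if (view_.members.empty()) {
            throw std::runtime_error("view has no members");
        }
    }

    // Round-robin over the view so clients do not all bind to one member.
    node_id_t next_proxy() {
        node_id_t chosen = view_.members[next_ % view_.members.size()];
        ++next_;
        return chosen;
    }

    // The caller always names the destination explicitly.
    void send_to(node_id_t dest, const std::string& payload) {
        if (std::find(view_.members.begin(), view_.members.end(), dest)
            == view_.members.end()) {
            throw std::invalid_argument("destination is not in the current view");
        }
        send_(dest, payload);
    }

private:
    ViewSnapshot view_;
    SendFn send_;
    std::size_t next_ = 0;
};
```

The point of the sketch is that the contact node is only needed for the initial view; after that, proxy selection is entirely a client-side policy decision.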

KenBirman commented 3 years ago

Edward, do we end up with one TCP connection per node the external client interacts with? Or one connection to a proxy, who then relays the message?

etremel commented 3 years ago

The external client maintains one connection per node it interacts with, and in fact, they can be RDMA connections if possible. ExternalGroup uses the same P2PConnectionManager object as RPCManager, which sets up a mini-SST with each remote node to deliver messages. This SST will use either TCP or RDMA depending on how libfabric is configured on the external client. (The main difference between ExternalGroup and RPCManager is that ExternalGroup only sets up a P2PConnection with a node when it actually wants to send a message to it, whereas RPCManager sets up a P2PConnection with every member in the group as soon as it starts up.)
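
The lazy-versus-eager difference can be sketched with a toy connection table. This is only an illustration under assumed names: `ConnectionSketch` and `ConnectionTable` are hypothetical, no real TCP or RDMA setup happens, and the actual P2PConnectionManager is organized differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using node_id_t = std::uint32_t;

// Placeholder for a real TCP or RDMA connection (hypothetical).
struct ConnectionSketch {
    node_id_t peer;
};

class ConnectionTable {
public:
    // Eager style: connect to every member at startup.
    void connect_all(const std::vector<node_id_t>& members) {
        for (node_id_t m : members) {
            get_or_connect(m);
        }
    }

    // Lazy style: create the connection the first time a peer is used,
    // and reuse it on every later call.
    ConnectionSketch& get_or_connect(node_id_t peer) {
        auto it = connections_.find(peer);
        if (it == connections_.end()) {
            it = connections_.emplace(peer, ConnectionSketch{peer}).first;
        }
        return it->second;
    }

    std::size_t size() const { return connections_.size(); }

private:
    std::map<node_id_t, ConnectionSketch> connections_;
};
```

With the lazy style, an external client that only ever talks to one shard member holds a single connection, whereas the eager style pays the setup cost for the whole group up front.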