Open tclinken opened 4 years ago
This is particularly interesting in cloud environments, where fault domains/ availability zones are prevalent. And running FDB across availability zones can improve availability.
Since you can choose different host types in cloud, it will be better if DBA can specify a list of powerful hosts as the candidate to host those roles. When one host is down, CC can quickly choose the second one from the list.
Are there other potential optimizations enabled by this configuration, such as communicating via shared memory instead of sockets for processes on the all-in-one host?
@ryanworl Yes, I think that could likely cause a significant performance improvement in the future. Preliminary benchmarking shows that there's around a 40 microsecond minimum latency for IPC currently. In the cloud environment it's most crucial that we're avoiding network hops across separate AZs, because this will save us hundreds of microseconds, but communicating via shared memory within a single host could also yield significant improvements.
With clients and transaction logs distributed across multiple hosts, the only way to guarantee at most 2 network round trips in the synchronous commit path is to run all of the master, resolver, and proxy roles on the same host. However, if this host dies, the current role recruitment may recruit these roles on different hosts. To guarantee low commit latency at all times, it would be nice to have a knob to guarantee that all master, proxy, and resolver roles are recruited on a single host with each recovery.