Erlang/OTP
http://erlang.org
Apache License 2.0

ERL-515: Make built-in distributed applications set up cloud-ready (recover from netsplit) #3365

Open OTP-Maintainer opened 7 years ago

OTP-Maintainer commented 7 years ago

Original reporter: hubertlepicki
Affected version: Not Specified
Component: kernel
Migrated from: https://bugs.erlang.org/browse/ERL-515


I have been considering using the distributed application mechanism for a project, but it seems very unlikely that we will use this feature. I bet we are not alone, since it has a major flaw that prevents adoption on cloud-based setups: it does not recover at all from network splits.

This is, in fact, well-known behavior. See:
http://erlang.org/pipermail/erlang-questions/2014-November/081791.html
and
http://learnyousomeerlang.com/distributed-otp-applications states
"If you deem netsplits more likely than hardware failures, then you have to be aware of the possibility that the application is running both as a backup and main one, and that funny things could happen when the network issue is resolved. Maybe distributed OTP applications aren't the right mechanism for you in these cases."

I have run some tests, and that is precisely what happens. After the network split heals, the takeover does not take effect, unlike the situation where one node goes down and is then started again.
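For context, a setup of this kind could be described by a kernel configuration like the one below. This is only a minimal sketch: the application name `myapp`, the node names `a@host1` and `b@host2`, and the timeout values are assumptions for illustration and are not taken from my actual tests. The `distributed`, `sync_nodes_optional` and `sync_nodes_timeout` keys are the standard kernel parameters.

```erlang
%% sys.config for node a@host1 (all names and values here are hypothetical).
%% The same `distributed` entry is expected on every participating node.
[{kernel,
  [%% myapp prefers a@host1; if that node goes down, b@host2 starts the
   %% application after waiting 5000 ms (failover). When a@host1 is
   %% restarted, it takes the application back (takeover).
   {distributed, [{myapp, 5000, ['a@host1', 'b@host2']}]},
   %% At boot, wait for these nodes before starting distributed applications.
   {sync_nodes_optional, ['b@host2']},
   %% How long, in milliseconds, to wait for the nodes listed above.
   {sync_nodes_timeout, 30000}
  ]}].
```

With a configuration like this, stopping and restarting `a@host1` gives the expected failover and takeover, while cutting and restoring the network between the two nodes is the case that does not recover.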

While a netsplit may be rare when you run a cluster on co-located servers sitting in the same data center, it is more likely in a cloud deployment. Considering how popular cloud deployments are these days, this makes Erlang's built-in mechanism for distributed applications unsuitable for cloud-based scenarios.

I propose that either the existing dist_ac module be updated, or a second version of such a controller be added, so that people can set up fairly simple clusters of Erlang nodes with failover and takeover that recover from network disconnections.
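As a starting point for discussion, the condition such a controller would need to detect can be sketched as follows. This is a hypothetical diagnostic helper, not part of OTP and not a fix: the module name `netsplit_probe` is made up, and it assumes that `application:which_applications/0` lists a distributed application only on the node where it is actually running. It uses `net_kernel:monitor_nodes/1` to notice when a node reconnects and then checks whether the application is running on both sides.

```erlang
-module(netsplit_probe).
-export([start/1, init/1]).

%% Hypothetical helper, not part of OTP.
%% App is the name of the distributed application to watch.
start(App) ->
    spawn(?MODULE, init, [App]).

init(App) ->
    %% Subscribe to {nodeup, Node} and {nodedown, Node} messages.
    ok = net_kernel:monitor_nodes(true),
    loop(App).

loop(App) ->
    receive
        {nodeup, Node} ->
            %% A node (re)connected, possibly after a netsplit healed.
            report(App, Node),
            loop(App);
        {nodedown, Node} ->
            logger:warning("lost connection to ~p", [Node]),
            loop(App)
    end.

%% Warn if App is running both locally and on the reconnected node.
report(App, Node) ->
    Local = is_running(App, application:which_applications()),
    Remote = case rpc:call(Node, application, which_applications, []) of
                 {badrpc, _Reason} -> false;
                 Apps -> is_running(App, Apps)
             end,
    case Local andalso Remote of
        true ->
            logger:warning("~p is running on both ~p and ~p",
                           [App, node(), Node]);
        false ->
            ok
    end.

%% which_applications/0 returns a list of {Name, Description, Vsn} tuples.
is_running(App, Apps) ->
    lists:keymember(App, 1, Apps).
```

Calling `netsplit_probe:start(myapp)` on each node (with `myapp` standing in for the real application name) would only log the duplicate-instance situation. Deciding which node should give the application up, and doing so safely, is the part that would need to live in dist_ac or its replacement.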

OTP-Maintainer commented 6 years ago

kenneth said:

It would be good if we could get more specific suggestions, in terms of pointers to code or behavior, or, even better, code contributions.