YahooArchive / coname

An experimental cooperative keyserver based on ideas from dename.
Apache License 2.0

reconfiguration mess #64

Open xiang90 opened 8 years ago

xiang90 commented 8 years ago

The "reconfiguration mess" doc mentions a few things that are not exactly true:

> etcd/raft requires that a new replica being added must know the exact state of the cluster at the moment it is added. Similarly, replicas who are not yet aware of a recent reconfiguration are not able to receive commands from the new nodes: this means that a new node serving as a leader cannot help those replicas to catch up. Nodes not in the cluster cannot catch up with the cluster before being added -- and adding them would reduce availability.

etcd/raft does not have this requirement. You can add a node and start it with no configuration at all. Replicas can also receive commands from the leader even if they do not know the most recent configuration.

The truth is that in etcd we add additional, stricter checks that are necessary for our use case. You do not have to do any of that checking if you do not want to.
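A minimal sketch of the point above, assuming nothing about the real etcd/raft API: because membership changes are themselves ordinary committed log entries, a node that starts with an empty configuration can recover the current membership simply by replaying the log it receives from the leader. All type and function names here are illustrative, not taken from etcd/raft or coname:

```go
package main

import "fmt"

// EntryType distinguishes normal commands from membership changes.
type EntryType int

const (
	EntryNormal     EntryType = iota // a client command
	EntryConfChange                  // adds the named node to the cluster
)

// Entry is a simplified replicated-log entry.
type Entry struct {
	Type EntryType
	Node uint64 // for EntryConfChange: the node being added
	Data string // for EntryNormal: the command payload
}

// Replica derives its view of the membership entirely from the log,
// so it needs no out-of-band configuration to participate.
type Replica struct {
	Members map[uint64]bool
	Applied []string
}

func NewReplica() *Replica {
	return &Replica{Members: map[uint64]bool{}}
}

// Apply consumes one committed entry in log order.
func (r *Replica) Apply(e Entry) {
	switch e.Type {
	case EntryConfChange:
		r.Members[e.Node] = true
	case EntryNormal:
		r.Applied = append(r.Applied, e.Data)
	}
}

func main() {
	// Log replicated by the leader: config changes interleaved with commands.
	log := []Entry{
		{Type: EntryConfChange, Node: 1},
		{Type: EntryConfChange, Node: 2},
		{Type: EntryNormal, Data: "put k=v"},
		{Type: EntryConfChange, Node: 3},
		{Type: EntryNormal, Data: "put k2=v2"},
	}

	// A brand-new node starts with an empty configuration and simply
	// replays the committed log it receives from the leader.
	r := NewReplica()
	for _, e := range log {
		r.Apply(e)
	}
	fmt.Println(len(r.Members), len(r.Applied)) // 3 2
}
```

The stricter checks etcd layers on top (e.g. refusing changes that would lose quorum) are policy in the application, not a requirement of the replication library itself.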

xiang90 commented 8 years ago

Basically, this doc describes what etcd does, not what etcd/raft does. As far as I know, coname depends on etcd/raft, not etcd, so the decisions we made in etcd should not affect coname at all.

andres-erbsen commented 8 years ago

Okay. I agree that the "reconfiguration mess" document is inaccurate. My apologies for confusing limitations of my understanding of etcd/raft with limitations of the implementation itself. And the availability failure referenced in the doc was indeed fixed a while ago.

As for how to resolve this, I think the best solution would be for the etcd/raft documentation to include a precise specification of what one must ensure for cluster membership changes to be safe. I would particularly like to see explicit promises (or disclaimers) about the following scenarios: