Optimistic Randomness for ABA

amiller commented 6 years ago

tl;dr: we could cut down on the threshold signature cryptography Common Coins in the ABA protocol by using "forced" common coin values in the optimistic case.

The following suggestion comes from Christian Cachin:

one should take the approach somewhere described by Klaus Kursawe, trying some "optimistic" randomness first: every nodes has a seed programmed in and the first 10 random bits are taken from a PRG on this -- unless the network knows how to interpret the PRG, this will do the job! It's an assumption a bit like the random oracle model for cryptography.

In other words, the common coin cryptography is intended to handle a very extreme attacker scenario, where the attacker learns the common coin value, and equivocates at least once, and manipulates the network delivery to selectively delay convergence. The suggested coin schedule is basically: [prg(), prg(), ... prg(), coin(), coin(), ...], where prg() means a coin that is random but predictable, like the hash of the current round number, and coin() is a real threshold common coin. Note that although this is optimism, it is optimism about adversarial network manipulation rather than optimism about timing or about a leader node like in PBFT, hence still within HoneyBadger ethos :)

Another related suggestion comes from Micali's "Byzantine Agreement, Made Trivial" paper. Although this protocol is intended for the Synchronous setting, it contains a nice insight about how to handle termination. Recall that in ABA, it can be the case that some node observes agreement on v in round r, but then we need to wait until the next round r'>r where coin(r')=v for other honest nodes to reach agreement. This is a bit inelegant since ABA instances may have to remaining lingering around even after terminating. We could "fire and forget" our shares of a few of the remaining common coins, although we wouldn't know exactly how many it will take! Micali's suggestion is to draw coins in the order: [0, 1, coin(), 0, 1, coin(), 0, 1, coin(), ...] Another suggestion is to have a special message, v*, which is basically an abbreviation for BVAL(v) and AUX({v}) for all subsequent rounds. This guarantees at most one more common coin needs to be sent along with v*.

Another observation on the choice of coin schedule: Most of the ABA instances need to settle on 1, in fact N-f of them must do so before any 0s can be input. So we can optimize for the typical case by starting with [1,1,...]. Furthermore, in the case of a benign crash of some nodes, the corresponding ABA instances will receive input 0 for everyone. So [1,1,0,0,...] should resolve most cases in 4 iterations.

So, I'd propose a coin schedule that looks like the following: Option 1: [1,1,0,0,prg(),prg(),prg(),prg(),1,0,coin(),1,0,coin(),...] Option 2: [1,1,0,0,coin(),1,0,coin(),...]

Both of these have the qualities:

the most common case finishes as fast as possible,
the first few coins are predictable but have a variety of 0 and 1
If any node observes agreement during the first few, then it's safe to terminate after sending v* without even computing a common coin share
In any case, after observing agreement it's safe to terminate after sending at most one additional common coin share Overall I'd prefer option 2 here since it makes it easier to create test scenarios that exercise the common coin.

Also note: the fix of adding an additional CONF message per #59 is only necessary for those rounds involving coin(), since the purpose of CONF is to prevent a network manipulation adversary from predicting the coin value before influencing the estimates of the nodes, so it's redundant when the value is predictable anyway.

Vagabond commented 6 years ago

When you say 'coin schedule' do you mean (for option 2) instead of calling get_coin() and modulus the result by 2, we'd act like round 1 had a coin flip modulus 2 result of 1, round 2 also had a coin flip modulus 2 result of 1, round three had a coin flip modulus 2 result of 0... and round 5 we actually ran the common coin protocol?

In addition, when you have those trailing ...s in the options, are the elided values coin() or a repetition of the pattern?

amiller commented 6 years ago

@Vagabond yes your description is exactly what I mean by "coin schedule". The trailing ...s in the options mean repetitions of the pattern 1,0,coin()

Vagabond commented 6 years ago

Yeah, this seems really interesting. I like it.

Just to confirm: the idea here is that in the usual case, when an attacker is not actively attacking ABA, a fixed coin schedule allows us to reach agreement faster with the fallback to the heavyweight common coin if we exceed 4 rounds (which is the expected number needed to converge)?

Does this weaken security or just allow an adversary, if they are present, to cause ABA to take more rounds to complete?

ChronusZ commented 6 years ago

I believe you only need to do the very first round deterministic 1, since if every honest node begins by voting 1 then every node is guaranteed to always have bin_vals={1} and hence terminate on the first round. Similarly you should only need one round of deterministic 0 to handle the benign crashed nodes.

I personally quite like the prf idea. You could even adjust the number of prf rounds dynamically to minimize the average number of message rounds until termination, so that in the presence of an active network adversary the prf rounds decrease, and in absence they increase.

amiller commented 6 years ago

@ChronusZ even if every node starts by inputting 1, and therefore every honest node sees bin_values={1}, it is still necessary to continue participating in the protocol until another coin returns 1. The reason is that given the local view of an honest party bin_values={1} in this scenario, there is a different possible world where other honest parties see bin_values={0,1} and hence they need to see another 1 coin before deciding.

ChronusZ commented 6 years ago

@amiller It doesn't really matter if the actual process needs to continue longer after you've already produced an output right? Continuing longer only parallelizes with the rest of the protocol with no cost to latency.

By the way, another point worth mentioning is that a simple modification to the CONF round turns it into an RBC READY-type round which can also be used to fix this. More specifically, you add the logic:

Upon receiving t+1 CONF(r,{v}) messages, broadcast CONF(r,{v}).
Upon receiving n-t CONF(r,{v}) messages, terminate and ouput v if v equals the common coin value, even if we've already progressed past round r.

This way if you terminate in round r, by the reliability of the CONF message you know everyone else can eventually terminate on round r, and by the ABBA properties you know that no one will terminate on a different value even in a future round.

Also, one more optimization that occurs to me is that by making the first round deterministically flip to 1, for any round r>1 all instances can get to the end of round r independently of any other other instance. Thus you can share the coins for each round between all instances by waiting until every single instance has gotten to the end of round r before flipping the coin for round r, and this won't introduce any issues for liveness.

ChronusZ commented 6 years ago

Alternatively, you could just add a READY-type message at the very end, and after outputting a value in some round only continue to the next round if t+1 nodes send a message for the next round. Then if t+1 honest nodes output in round r everyone can eventually receiving enough READY's to terminate, and otherwise everyone receives enough messages from the next round to participate in the next round themselves.

amiller commented 6 years ago

@amiller It doesn't really matter if the actual process needs to continue longer after you've already produced an output right? Continuing longer only parallelizes with the rest of the protocol with no cost to latency.

Right, it's OK if the actual process continues longer even after output is delivered, it is just a little bit inelegant. In our Go implementation we've found it's easier to code to be able to clean up memory by only having one active at a time.

Thus you can share the coins for each round between all instances by waiting until every single instance has gotten to the end of round r before flipping the coin for round r, and this won't introduce any issues for liveness.

This is a really interesting optimization. If we could share common coins between instances it could cut down on the worst-case threshold crypto bottleneck.

Indeed the main reason why we did not try to share the common coins between instances is because of a deadlock problem.. we need to be able to proceed with some ABA's that receive input 1 and complete before providing input 0 to any other ABAs. So if we wait until all instances are ready to proceed with the coin, then we would get stuck. Your observation is that by forcing the first coin to be 1 we are already removing that deadlock. We are guaranteeing that at least N-t of the ABA instances reach agreement on 1 in the first round. After this point, honest parties will be able to provide input to all the remaining ABA instances as well. This is cool, I think reduces the worst-case expected communications per node of N instances of ABA from O(N^2 lambda) to O(N^2 + N lambda)

afck commented 6 years ago

I'm probably missing something, but I was wondering whether some kind of special v*/termination message is necessary anyway, to guarantee that all instances terminate (not for output, of course, but for termination)?

Let's say the f faulty nodes just never send any messages, one node, Alice, decides and outputs :zero: in round 1, and the coin shows :zero: in round 2 again. Then Alice will terminate in round 2 and honest Bob will output in round 2. But Bob must continue until the coin comes up :zero: again. He will be stuck in round 3, where he won't receive more than N - f - 1 Aux messages.

amiller / HoneyBadgerBFT

Optimistic Randomness for ABA #63