Chain reorg dangers and best practices

AdamISZ / CoinSwapCS

Simple implementation of Bitcoin CoinSwap, client-server

GNU General Public License v3.0

Chain reorg dangers and best practices #22

Open fivepiece opened 7 years ago

fivepiece commented 7 years ago

I tried to go over the protocol and insert bullets in between phases where a chain reorg might happen. The naming and numbering can be seen here for Alice and here for Carol

Alice:

stages 0..6 - no reorg danger

stages 7..9

if TX1 is reorged at this point, Alice must wait until L0 passes to back out from TX2 (key_2_2_CB_0)

stage 10

if TX1 is reorged after the secret was revealed, then Alice stands to lose all coins in TX2 which is redeemable by Carol with no lock time.
if L1 is approaching and TX5sig was not received, then Alice must back out from TX3 (key_2_2_AC_1) before L1 and using the secret X or else Carol might redeem both TX2 and TX3

stage 11..12

if TX5 is reorged and L1 is close, then Alice must rebroadcast TX5 as soon as possible, and probably add fees using CPFP

Carol:

stages 0..4 - no reorg danger

stages 5..6

if TX0 is reorged, Carol must rebroadcast TX0 and TX2 as soon as possible, else Alice can build TX3 and redeem from it without consequence

stages 7..9

no reorg danger(?)

stage 10

if TX4 is reorged, Carol must rebroadcast it as soon as possible. if L0 is approaching and TX0 is unspent, Alice can build TX2 and redeem from it.

This more or less brings me back to wanting a long wait between the time TX0 and TX1 are confirmed and the time X is revealed. Thoughts? I might be missing something on Carol's stages 7 to 9; I couldn't immediately see a reorg issue there.

AdamISZ commented 7 years ago

Comments for Alice side:

stage 10: if TX1 is reorged after the secret was revealed, then Alice stands to lose all coins in TX2 which is redeemable by Carol with no lock time.

Yep. TX1 must be "final" to whatever extent Alice deems appropriate before proceeding. So it's a matter of what's set in this config var. It's on 2 for now, for testing, and is negotiated between client and server (they use the same value). I think a healthy balance might be anywhere from 2-6 for people wanting to do "somewhat realtime coinswaps". I don't think that is an unreasonable risk (i.e. it's negligible) for the likely usage scenarios.

if L1 is approaching and TX5sig was not received, then Alice must back out from TX3 ( key_2_2_AC_1 ) before L1 and using the secret X or else Carol might redeem both TX2 and TX3

Currently we have timeouts in seconds on response from the other side. Backout is aggressive in that sense: any failure to update state (e.g. network dropout) is assumed to be malicious and we go into backout mode. I think the current default is 20 seconds. Since the cooperative flow would involve very short delays of this magnitude + the TX0/1 confirmation wait (2-6 blocks) and the L1 might be ~ 50 blocks, the intent is to stay well ahead of L1. If a user leaves it late to backout/recover and ends up near L1, it's a similar (but not nearly as bad) situation to if they left it until after L1, when they would no longer be safe. Basically it means the danger zone "smears" backwards to a few blocks before L1.
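To make the timing argument concrete, here is a rough sketch of the block budget. The 20-second timeout, the 2-6 block confirmation wait and the ~50 block L1 are taken from the comment above. Everything else is an assumption for illustration and is not CoinSwapCS code: the 10-minute average block interval, treating L1 as a block count measured from the start of the swap, the function names, and the `margin_blocks` width of the "smeared" danger zone.

```python
import math

AVG_BLOCK_SECONDS = 600  # assumption: ~10 minute average block interval


def blocks_used_cooperative(confirm_wait_blocks, message_rounds, timeout_seconds=20):
    """Upper-bound estimate of blocks elapsed in a cooperative run.

    Assumes every message round takes at most one timeout.
    """
    messaging_seconds = message_rounds * timeout_seconds
    messaging_blocks = math.ceil(messaging_seconds / AVG_BLOCK_SECONDS)
    return confirm_wait_blocks + messaging_blocks


def blocks_left_before_l1(l1_blocks, blocks_elapsed):
    """Blocks remaining until L1, with L1 counted from the start of the swap."""
    return l1_blocks - blocks_elapsed


def in_danger_zone(l1_blocks, blocks_elapsed, margin_blocks=6):
    """True once we are within margin_blocks of L1 (hypothetical margin)."""
    return blocks_left_before_l1(l1_blocks, blocks_elapsed) <= margin_blocks


if __name__ == "__main__":
    used = blocks_used_cooperative(confirm_wait_blocks=6, message_rounds=12)
    print(used, blocks_left_before_l1(50, used), in_danger_zone(50, used))
```

With a 6-block confirmation wait and a dozen message rounds, the cooperative flow uses only a handful of blocks, which is the sense in which it stays well ahead of L1; a user who delays backout until only `margin_blocks` remain is in the smeared danger zone.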

Carol side:

I think largely the same comments apply. But for sure it's useful to keep looking at this.

fivepiece commented 7 years ago

One interesting point brought up on irc is : <belcher_> a malicious miner can steal from everyone in the coinswap order book at the same time for the same cost

So wrt 2-6 blocks before X reveal, and if a miner decides to defraud the whole (or most of the) orderbook at once, 6 blocks might still be worthwhile to try and revert. If L1 is 100 blocks, and we agree to only reveal X 50 blocks before L1, we will still have a lot of time to complete the coinswap and not worry about the initial funds being reorged.

100 is just a guess for L1, maybe 50/25 is also okay. It depends partly on the total amount of coins in the order book.
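As a small sketch of what this rule implies: if X is only revealed a fixed number of blocks before L1, the funding transactions are already buried fairly deep at reveal time. The function names are hypothetical, and the sketch assumes L1 is counted in blocks from the height at which TX0/TX1 confirmed.

```python
def earliest_reveal_height(funding_height, l1_blocks=100, reveal_before_l1=50):
    """First block height at which X may be revealed under the proposed rule."""
    l1_height = funding_height + l1_blocks
    return l1_height - reveal_before_l1


def funding_depth_at_reveal(l1_blocks=100, reveal_before_l1=50):
    """Confirmations on TX0/TX1 when X is revealed.

    A transaction has 1 confirmation in the block that includes it, hence +1.
    """
    return l1_blocks - reveal_before_l1 + 1


def blocks_left_after_reveal(reveal_before_l1=50):
    """Blocks available to finish the swap between revealing X and L1."""
    return reveal_before_l1


if __name__ == "__main__":
    for l1, window in [(100, 50), (50, 25)]:
        print(l1, window, funding_depth_at_reveal(l1, window),
              blocks_left_after_reveal(window))
```

Under these assumptions the 100/50 choice gives 51 confirmations on the funding transactions at reveal time with 50 blocks left to finish, and 50/25 gives 26 confirmations with 25 blocks left.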

AdamISZ commented 7 years ago

Yes, it certainly was a good point, assuming the postulated malicious miner can arrange to do lots of simultaneous swaps at the same time (should be easy) and amass enough hashpower (should be incredibly hard), their stealing effort could be effective via amplification. But tbh in that scenario you could postulate a lot of other similar attacks they could do (any kind of double spend attack against any service offering X for bitcoins, for example - an altcoin exchange?).

If Coinswap ever reached a stage where this attack was even remotely likely, then yeah just making the confirmation and timelock parameters more conservative would be in order. Like I've said before, I think it makes sense to make them conservative mostly for practicality reasons - so a user can recover even if away from the computer for a short while (edit: and also to allow them to use lower tx fees) - with these extremely difficult attacks being a secondary reason.

fivepiece commented 7 years ago

Understood. In that case I would feel a bit safer with a number closer to 6 than to 2 (wrt TX0/TX1 depth), 6 being the "recommended" acceptable depth for a normal transaction (even by most exchanges I think?). That might be enough to mitigate such an attack.

AdamISZ commented 7 years ago

Yeah, as for defaults, I think 6 might be reasonable. People seem to argue about this (is there any data on random re-orgs?).

fivepiece commented 7 years ago

I don't have a source, but guessing that it's very uncommon for anything larger than 1 block. Recently bitfury had a block orphaned, iirc. Would be interesting to look at real data

chris-belcher commented 7 years ago

Something I've mentioned on IRC and is worth writing down here too. Today's ASICs are typically underclocked to result in the most power efficiency per unit hashpower. But they can be temporarily overclocked to give them more hash power. So a mining cluster with 20% of hash power could become 45% for a short while. This means that many more confirmations are required to be safe from a miner attack. (I've heard 30-40, remember the probability of success goes down exponentially)

The old "6 confirmations = safe" rule is not valid today. It was based on (I don't remember the exact numbers) <0.1% probability of re-org given that no miner coalition has >1% hash power. These assumptions are completely wrong today even without the overclocking thing. I think it's best we just completely remove the "6 confirm" number from our minds. Stick with 2 confirms for testing yes, no need to make it large when there's no money involved.

Random re-orgs are not the issue here. We're worried about re-orgs that miners do intentionally.

We could use the maths from Satoshi's whitepaper (or the corrected maths from here) to figure out the probability of success of a malicious miner, and how many confirms we need to wait.
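For reference, a minimal sketch of the whitepaper calculation (section 11, "Calculations"), ported to Python. It covers only the original whitepaper formula, not the corrected maths mentioned above, and it treats the attacker's hashpower share `q` as constant, so a temporary overclocking boost would have to be modelled by plugging in the boosted share. The function names and the `limit` cap are my own choices.

```python
import math


def attacker_success_probability(q, z):
    """Probability that an attacker with hashpower share q ever catches up
    from z blocks behind, per the Bitcoin whitepaper's formula."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a share between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q
    if q >= p or z == 0:
        return 1.0  # a majority attacker always catches up; no lead to overcome
    if q == 0.0:
        return 0.0
    lam = z * (q / p)  # expected attacker progress while honest chain mines z
    total = 1.0
    for k in range(z + 1):
        # Poisson pmf computed in log space to avoid overflow for large z
        log_poisson = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total -= math.exp(log_poisson) * (1.0 - (q / p) ** (z - k))
    return total


def confirmations_needed(q, max_risk, limit=1000):
    """Smallest z with success probability below max_risk, or None if the
    (arbitrary) search limit is reached first."""
    for z in range(limit + 1):
        if attacker_success_probability(q, z) < max_risk:
            return z
    return None


if __name__ == "__main__":
    print(round(attacker_success_probability(0.10, 6), 7))  # 0.0002428
    print(confirmations_needed(0.10, 0.001))                # 5
    for share in (0.20, 0.30, 0.45):
        print(share, confirmations_needed(share, 0.001))
```

The loop at the end shows how quickly the required depth grows as the attacker's share approaches 50%, which is the point about overclocking: the share that matters is the one the miner can reach during the attack, not their usual one.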

Large miners have in the past used their power to defraud. The main reason I became interested in coinswap lately is the current political situation of miners blocking Segwit and Lightning Network and all the fungibility they would bring. From my point of view miners can't be trusted; they'll do what's in their rational self-interest, and that could easily involve stealing from coinswaps.

AdamISZ commented 7 years ago

Random re-orgs are not the issue here. We're worried about re-orgs that miners do intentionally.

Yes, I do understand, I was mentioning it for practical context; I wasn't aware of cases of deliberate re org to refer to as data points.

I think you're being very impractical in this point of view you're taking on it; if a miner is powerful enough and malicious enough to do this kind of attack, then not only is coinswap mostly broken anyway, but so is bitcoin (see my argument above re: altcoin exchanges, any service, not just coinswap). Debatable how much you're helped with larger confirmation values, if they can overclock 20-45 then they can overclock 30-55 and it won't matter how many confirms you wait.

If coinswap became so common that there were tens of millions on the line, I still think it's way out there to imagine this scenario, but at least it's possible. The "steal from lots of different parties at once" argument is valid, but then, as I mentioned above, they could do tons of other simpler high-value attacks.

But otoh there's no need to argue about it, since people can agree different values for "how many confirms to wait for TX0/1" and the timeouts, which are the main parameters for this.