AdamISZ / CoinSwapCS

Simple implementation of Bitcoin CoinSwap, client-server
GNU General Public License v3.0
31 stars 14 forks source link

Multi-transaction coinswaps #47

Open chris-belcher opened 6 years ago

chris-belcher commented 6 years ago

The original coinswap has issues with amount correlation, because Alice and Carol swap the same amount of bitcoin. In this project that has been solved using the added amount Δ.

Another way of doing it is to have several transactions instead of one on each side of the coinswap. I have this written on some paper I wrote a few months ago and now I'm copying it onto github.

Right now this is the situation for coinswapping 10btc. This is just a recap of https://github.com/AdamISZ/CoinSwapCS/blob/master/docs/coinswap_new.pdf

TX0 (Alice funding) Alice ---> Alice + Carol multisig [10btc]

TX2 (Alice refund) TX0 ---> Carol + H(X) OR Alice + CLTV L0 [10btc]

TX4 (Alice cooperate) TX0 ---> Carol [10btc]

And on Carol's side we have the same thing but mirrored, except Carol's timeout L1 is closer to the present than Alice's L0. (L1 < L0) This is because Alice is the one who generated the secret X.

TX1 (Carol funding) Carol ---> Alice + Carol multisig [10btc]

TX3 (Carol refund) TX1 ---> Alice + H(X) OR Carol + CLTV L1 [10btc]

TX5 (Carol cooperate) TX1 ---> Alice [10btc]


For the multi-transaction scheme, there are still TX0, TX1, TX2 and so on but they are made of more than one transaction, which I'll call TX0_0, TX0_1, TX1_0, TX1_1 and so on. The hashlock secret H(X) and timelock times L1, L0 are the same for everything. This example uses two transactions but it could work for three or four transactions.

The privacy for amount correlation comes from the subset-sum problem. If the adversary sees Alice mixing 10btc they need to scan the entire blockchain in the relevant period looking for transaction outputs which together add up to 10btc. The subset sum problem is NP-complete and increasing the number of transactions N generally increases the difficulty of the problem exponentially.

Here is a scheme for mixing 10btc, Alice creates a coinswap with 9btc + 1btc, Carol creates a coinswap with 8btc + 2btc

(Alice funding) TX0_0 Alice ---> Alice + Carol multisig [9btc] TX0_1 Alice ---> Alice + Carol multisig [1btc]

(Alice refund) TX2_0 TX0_0 ---> Carol + H(X) OR Alice + CLTV L0 [9btc] TX2_1 TX0_1 ---> Carol + H(X) OR Alice + CLTV L0 [1btc]

(Alice cooperate) TX4_0 TX0_0 ---> Carol [9btc] TX4_1 TX0_1 ---> Carol [1btc]

Carol's transactions are:

(Carol funding) TX1_0 Carol ---> Alice + Carol multisig [8btc] TX1_1 Carol ---> Alice + Carol multisig [2btc]

(Carol refund) TX3_0 TX1_0 ---> Alice + H(X) OR Carol + CLTV L1 [8btc] TX3_1 TX1_1 ---> Alice + H(X) OR Carol + CLTV L1 [2btc]

(Carol cooperate) TX5_0 TX1_0 ---> Alice [8btc] TX5_1 TX1_1 ---> Alice [2btc]

The order of events is similar to single-transaction coinswap.

  1. Both sides swap public keys and parameters (H(X), L0, L1) with each other
  2. Both sides create the funding transactions TX0_0, TX0_1, TX1_0, TX1_1 (but dont reveal yet)
  3. They create refund transactions TX2_0, TX2_1, TX3_0, TX3_1, they reveal them to each other and get the other's signature.
  4. They broadcast the funding transactions and wait for confirmations.
    1. If the other side doesn't broadcast, the first side can use the refund transactions to get their money back, either after a delay or using the X secret if they know it.
  5. They create the cooperative transactions TX4_0, TX4_1, TX5_0, TX5_1 and send the unsigned transaction to the other party which sends back its signature.
    1. Only sending the unsigned transaction means the other side can't broadcast just one transaction, which avoids an attack. Only one party knows the final fully-signed transactions TX4 and TX5 and they broadcast them together (or with some delay for privacy)
    2. If one side broadcasts just one refund transaction (e.g. TX4_0) then that counts as non-cooperation and the other side broadcasts the other (TX4_1) and the whole setup goes into the non-cooperative state where people either reveal the secret X or wait for the timeouts L0/L1.

This also works with different numbers of transactions on each side. So for example Alice could have just bought bitcoins from a very un-private exchange and wants to mix them. She can create just one TX0 but receive 5 or 6 TX5s from Carol. Probably in practice Carol would choose the number of TX5s to fit with how many UTXOs she has so that she don't have to create too many change addresses.

AdamISZ commented 6 years ago

Thanks for writing this out.

I'm definitely going to take the time to look into it once I switch attention back to coinswap from JM, which I hope to do in the next few weeks.

AdamISZ commented 6 years ago

Wanted to record a few thoughts as aide-memoire from a long conversation session yesterday.

Basically it feels very close to being valid to say "if coinswap is secure, then this multi-tx scheme is also secure", but annoyingly not close enough to be able to dismiss any concerns as academic.

What hurts the analysis a bit is that we don't have a formal proof along the lines of "assuming Bitcoin's blockchain has properties X, Y, Z then this Coinswap construction delivers its promise that "Alice receives N coins iff Carol receives N coins" " or similar. In the absence of that, we don't have a formal framework to try to fit this into.

I share @chris-belcher 's observation (after being confused for quite a while..) that a crucial element is: if one side goes into backout mode on Coinswap 0 (assuming the two parallel non-equal amounts are called Coinswap 0 and Coinswap 1), they must go into the same backout mode on Coinswap 1. Expansion:

Remember that, in the basic Coinswap design (as in this repo), there are two different backout paths that can be executed by each of Alice and Carol; they are called TX2 and TX3. In this new design, TX2 and TX3 within one coinswap have different amounts. So if Alice has to backout via TX2_0 for Coinswap 0, she must also backout via TX2_1 (and not e.g. TX3_1), to ensure her total receipt is the expected amount. In the above example, that'd be 9 coins from TX2_0 and 1 coin from TX2_1 for the expected total of 10.

I'd like to see point 5 in the above example rewritten and expanded on, since there are a couple of errors in it, and also I haven't really understood it, but for sure it does look like that's the detail that needs paying attention to.

ZmnSCPxj commented 6 years ago

Good morning,

Would it not be possible to just split one set of transactions, specifically, the one with the later lock time (L0)?

TX1, TX3, and TX5 TX0, TX2, and TX4 are split into _0 and _1.

This way, if either TX2 TX3 is spent by Carol Alice, or TX4TX5 is published on-chain in exchange for the secret before the L1 timeout, then both of the TX3_0 and TX3_1 TX2_0 and TX2_1 are now spendable by Alice Carol before the L0 timeout.

This still provides some value unlinkability, while ensuring atomicity.

One might compare it to how, on Lightning, a single publication by the ultimate receiver enables an entire route of transfers.

In this case, by publishing the single TX2 TX3 (or sharing the secret X) Carol Alice enables both of the transfers to Alice Carol.


This is confusing to me: Below is no longer confusing to me. I was incorrectly interpreting "Carol's timeout" to mean "the timeout for Carol to publish the secret X", not "the timeout for Carol to recover her funds".

And on Carol's side we have the same thing but mirrored, except Carol's timeout L1 is closer to the present than Alice's L0. (L1 < L0) This is because Alice is the one who generated the secret X.

Should it not be that Carol is the one who generates the secret X? Carol has the tighter timeout, and thus must publish the secret to Alice before L1. Then Alice, on receiving the secret, has the time from L1 -> L0 to redeem her money.

Regards, ZmnSCPxj

AdamISZ commented 6 years ago

@ZmnSCPxj thanks for the comments, before any more substantial response (at first glance it certainly looks like a good idea to me), I wanted to mention, you wrote:

In this case, by publishing the single TX2 (or sharing the secret X) Carol enables both of the transfers to Alice.

It's Alice who starts with the secret and transfers it to Carol in this setup. Just in case this is (still?) causing any confusion.

ZmnSCPxj commented 6 years ago

Yes, I amended my previous comment. I was confused further by L1 < L0, as to me, the lower index should be the earlier lock time. Below is a cleaned-up comment without strikeouts:

Would it not be possible to just split one set of transactions, specifically, the one with the later lock time (L0)?

TX0, TX2, and TX4 are split into _0 and _1.

This way, if either TX3 is spent by Alice, or TX5 is published on-chain in exchange for the secret before the L1 timeout, then both of the TX2_0 and TX2_1 are now spendable by Carol before the L0 timeout.

This still provides some value unlinkability, while ensuring atomicity.

One might compare it to how, on Lightning, a single publication by the ultimate receiver enables an entire route of transfers.

In this case, by publishing the single TX3 (or sharing the secret X) Alice enables both of the transfers to Carol.

ZmnSCPxj commented 6 years ago

Note that publication order of TX0_0, TX0_1, and TX1 needs to be defined here. Carol should only sign and publish TX1 if and only if both TX0_0 and TX0_1 exist on the chain with sufficient confirmations. Although I suppose the current protocol already ensures this.

chris-belcher commented 6 years ago

I've been drawing some flow charts for how multi-tx coinswap should work. The flow charts aren't very helpful for showing things, but they are great for convincing yourself that it works.

Both sides know all the refund transactions (TX2_0, TX2_1, TX3_0 and TX3_1). If one side broadcasts one refund transaction, they can never stop the other side broadcasting the others. Everyone always gets their money back if just one refund TX is broadcasted. So then it's always true that if one coinswap goes into backout mode, the other coinswap will also go into backout mode.

Once X is revealed to both sides, they could both broadcast refund transactions and immediately get all the money using X without waiting for any CLTVs. That's true for multi-tx coinswaps in the same way as single-tx coinswaps, that's why both parties agree to create the co-operative transactions in the first place.

Everything else is the same a single-tx coinswaps. For example the other side can troll by also broadcasting the refund transactions when the co-operative transactions are broadcast. But regardless of transaction gets mined, the rightful owner always get the money because they know X. The same is true for single-tx coinswaps.

ZmnSCPxj commented 6 years ago

Thinking about it, @chris-belcher is correct. There is an edge case where a race condition has a cooperative tx of one swap and the backout tx of the other swap gets into the miner mempool at the shorter timeout (for instance if the secret-maker gets shut down and then comes back up near the shorter timeout), but this can be mitigated by the secret-maker not broadcasting the cooperative tx if the shorter timeout is too near.

Indeed, there are long-term plans in Lightning to enable splitting payments across multiple routes which basically involve multiple coinswaps in parallel similar to multi-transaction coinswaps.

chris-belcher commented 6 years ago

Can you explain that a bit more @ZmnSCPxj ?

Are you saying that if (for example) TX4_1 gets broadcasted, and soon after TX2_0 the backout transaction gets broadcasted. If so, then after TX2_0 confirms Carol can use her knowledge of X to take the money immediately. The same thing happens if TX4_0 got broadcasted but with worse privacy.

ZmnSCPxj commented 6 years ago

I confuse the tx names easily, but in any case I imagine that all pairs publish the HTLC smart contracts on all funding transactions, then near the shorter timeout, the hash is published onchain --- or rather on-mempool. There is race where the other side can see the hash-publish of one pair on-mempool, then itself publishes the timeout transaction on-mempool of the other pair. This is an extreme edge case and can be mitigated by simply not publishing onchain if the timeout is too near already. Indeed it is likely that actual implementations will simply let timeouts lapse if the HTLC smart contracts get published.

chris-belcher commented 6 years ago

Pasting a conversation here. It looks like multi-transaction coinswaps don't solve the amount correlation problem.

\<belcher> iv been reading a little about the subset sum problem and realized it doesnt actually describe the what happens in multi-transaction coinswap \<waxwing> oh. specifically only multi-tx coinswap though? \<waxwing> yes with subset sum you can for sure get false positives, but i see that as sort of asymptotically approaching the case of just equal sized outputs. but anyway, sounds interesting. \<belcher> any privacy tech that uses multiple outputs to stop amount correlation (coinswap, joinmarket's tumbler) \<belcher> the realization that the subset sum problem doesnt describe whats happening just means we cant apply the same theorems from a textbook \<belcher> it might turn out that the blinding factor method that's in CoinswapCS today of stopping amount correlation is actually better \<waxwing> you mean, because it's an attempt to match subsets across two sets, rather than just subset == x? \<gmaxwell> Why is it not exactly subset sum (assuming you hold that users aren't paying each other in it) \<belcher> the way its normally formulated, subset sum is about finding a subset that adds up to a target value, in multitx coinswap there'd only be two or three coinswap outputs... so the problem is finding two or three integers in a big set that add up to the target \<belcher> so its similar to the subset sum problem but you only need to consider subsets which have two or three elements \<waxwing> oh that, i see \<gmaxwell> "Sparse subset" \<gmaxwell> yes, thats a much simpler problem, computationally. \<belcher> yes, the exponential behaviour seems to come from having to consider all possible subsets \<gmaxwell> right, you can think of n choose m as being near polynomial when m is near 0 or near n, and exponential when it's near n/2. \<belcher> in any privacy tech like coinswap, n would be the number of outputs in a block * how many blocks of delay, m would be two or three... so amount correlation could be done near-polynomial \<waxwing> yeah i always assumed that it's basically solvable unless you can go below level of noise, but it does depend on assuming a specific (simple) pattern. \<waxwing> good to think it through in detail though, cool

chris-belcher commented 4 years ago

Actually multi-tx coinswaps can be made to genuinely improve amount privacy. One way is by creating a situation where an adversary would find a huge amount of false positives which are very close the amount being sent. So even if the adversary can iterate all the amounts computationally it won't help them much due to the huge number of false positives.

Another way to make multi-tx coinswaps work is to wait a long enough time so that enough regular bitcoin transactions are built up to provide enough cover traffic which increases the computational complexity. For example, if each block has 2500 outputs, there are 144 blocks per day, 5 output coinswap transactions are created and one day is waited between creating and closing the coinswap then the number of operations required is above 2^80. Because comb(2500*144, 5) =~ 2^85. If you want to use 4 coinswap transactions instead of 5 then you must wait 7 days instead as comb(2500*144*7, 4) =~ 2^80.

I did further research on this back in 2019 but didn't get round to writing it on github until now.