JoinMarket-Org / joinmarket

CoinJoin implementation with incentive structure to convince people to take part

tumbler.py and general privacy #28

Open chris-belcher opened 9 years ago

chris-belcher commented 9 years ago

tl;dr If you were designing a hard-to-unmix tumbler using joinmarket primitives, how would you do it?

One popular application of this project will probably be completely breaking the link between coins. The advantages over centralized tumblers are cost and no counterparty risk.

Examples of users might be people who simply bought bitcoins with a very privacy-invading method, such as passing AML/KYC at an exchange, and wish to have privacy in all their purchases again. Some bitcoin users also just need it as a simple medium of exchange, buying bitcoins with traceable fiat and immediately spending them on goods and services. An example would be an anonymous buyer of a domain name, VPS hosting, email or VPN service. Users also might be those who engage in capital flight or want to store bitcoins without anyone knowing. These are the kind of people who would use tumbler.py.

Another kind of user treats bitcoin like a bank account and does most of their daily spending and earning with it. They would best be served by an electrum plugin that coinjoins every transaction they do, instead of tumbler.py.

We need a discussion on what exactly a tumbler bot should do. Repeatedly doing coinjoins will be easy to unmix by looking for a similar sized output as in https://www.reddit.com/r/DarkNetMarkets/comments/2rhaqc/deanonimyzing_bitcoinfog_and_other_tumblers/

So clearly the coins being mixed need to be split up into many different sizes. They should reach the user's clean wallet in several different addresses, although the user should take care not to recombine the outputs. Ideally the service provider (exchange, payment processor, marketplace, etc) should offer up two or three bitcoin addresses to a user, who can feed those addresses into the tumbler bot. We might need a drive of awareness-raising to convince admins to provide multiple addresses to deposit.

It's not clear to me if repeated coinjoin sweeps are worth it, since an observer on the blockchain can easily see an amount that has been coinjoin swept several times. It might be better to do a single coinjoin but with many other participants.

Also we should think carefully about the time interval between different coinjoins. If the blockchain observer sees a ton of coinjoins one very soon after another, it will be more likely they belong to the same person. Random time intervals are needed; the trouble is that usd/btc volatility means you can't take too long to tumble.

Perhaps we need to think about how to describe configuration to users. Perhaps as an anonymity set size: if you set it to 10, there are 10 other 'people' or 'wallets' who might be confused with you.

I mean in terms of the transaction sizes, output amounts, timings, number of participants and so on.

chris-belcher commented 9 years ago

The tumbler algorithm could try to make its transactions look like ones by yield-generator. The only difference is that tumbler pays and yield-generator gets paid. If we coded a patient-tumbler by analogy with patientsendpayment then even this difference would be removed.

chris-belcher commented 9 years ago

Copypaste from gmaxwell's original coinjoin OP that might be relevant.

FAQ: "Isn't the anonymity set size limited by how many parties you can get in a single transaction?"

"Not quite. The anonymity set size of a single transaction is limited by the number of parties in it, obviously. And transaction size limits as well as failure (retry) risk mean that really huge joint transactions would not be wise. But because these transactions are cheap, there is no limit to the number of transactions you can cascade.

In particular, if you can build transactions with m participants per transaction you can create a sequence of m*3 transactions which form a three-stage switching network that permits any of m^2 final outputs to have come from any of m^2 original inputs (e.g. using three stages of 32 transactions with 32 inputs each 1024 users can be joined with a total of 96 transactions). This allows the anonymity set to be any size, limited only by participation." https://en.wikipedia.org/wiki/Clos_network
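The arithmetic in the quoted example can be checked with a few lines of Python. This is only a sketch of the counting argument; the function name `clos_cascade` is made up for illustration and nothing like it exists in joinmarket.

```python
def clos_cascade(m: int) -> dict:
    """Size of a three-stage switching network built from m-party coinjoins."""
    stages = 3
    return {
        "transactions": stages * m,  # m transactions in each of the 3 stages
        "users": m * m,              # any of m^2 outputs may come from any of m^2 inputs
    }

# The example from the quote: 32 participants per transaction.
print(clos_cascade(32))
```

With m = 32 this gives 96 transactions joining 1024 users, matching the figures in the quote.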

I will try to read it again a few more times to entirely understand it and imagine how it would work in this liquidity maker/taker system

chris-belcher commented 9 years ago

If I'm not wrong, gmaxwell's idea for a switching network cannot be used here because there is no way to force your counterparty to do another coinjoin for the same amount.

chris-belcher commented 9 years ago

How does this sound for the tumbler? The user deposits coins in mixing depth=0, tumbler.py does coinjoins of random amounts, occasionally doing sweeps (i.e. coinjoin amount = maximum possible) so the coins work their way up to a higher mixing depth, maybe 5 or so by default.

To defeat the transaction-size-searching used by /u/impost_r in the reddit thread, some of the UTXOs will be combined into a coinjoin transaction going to the destination address. The tumbler will then stop and ask the user for a new destination address. The user will go to his service, like bitstamp, click Generate new deposit address and give that to the tumbler, where more coins will be sent. And so on until all the coins are used up. Or ideally, the service gives the user several addresses which the user gives to tumbler.py, which sends to them without the user constantly needing to come back. BitcoinFog's way of sending multiple outputs to the same address is subpar; it is what led to /u/impost_r learning that a given address was a BitcoinFog address. So we need a new address each time.

In this way, the tumbler.py transactions pretend to be multiple sendpayment.py calls.

A patient-tumbler.py would be the same except it acts as a maker for a little while, before giving up.

The time between coinjoins should be exponentially distributed, i.e. the coinjoins should form a Poisson process, since that describes uncorrelated events. After all, tumbler.py pretends each of these coinjoins is done by a different entity, so why should they be correlated? Indeed, maybe we should change the lambda parameter for each mixing depth; then it's like different people spending their money at different average rates.
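Waiting times between uncorrelated events (a Poisson process) are exponentially distributed, so one way to sketch the timing idea is with `random.expovariate` from the standard library. The function name `wait_times` and the per-depth mean waits below are made-up values for illustration, not anything tumbler.py defines.

```python
import random

def wait_times(count, mean_wait_minutes, rng):
    """Draw `count` waits (in minutes) between coinjoins with the given mean."""
    rate = 1.0 / mean_wait_minutes  # the lambda parameter of the process
    return [rng.expovariate(rate) for _ in range(count)]

rng = random.Random()
# Hypothetical: a different average rate for each mixing depth.
mean_wait_by_depth = {0: 20.0, 1: 45.0, 2: 30.0}
schedule = {depth: wait_times(3, mean, rng)
            for depth, mean in mean_wait_by_depth.items()}
print(schedule)
```

A user-facing "average time to mix" setting would map directly onto `mean_wait_minutes` here.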

The coinjoin amounts should follow Benford's Law along with some kind of power law? Also sometimes we might want to truncate the number of decimal places, as humans typing numbers into boxes are apt to do that and we want to blend in with them.
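One simple way to get amounts whose leading digits roughly follow Benford's Law is to sample log-uniformly over a range spanning several powers of ten; the resulting density is proportional to 1/x, which is itself a power law. The sketch below assumes amounts in satoshis and an arbitrary range; `benford_like_amount` is a hypothetical helper, not part of joinmarket.

```python
import math
import random
from collections import Counter

def benford_like_amount(min_sat, max_sat, rng):
    """Sample an amount in satoshis, uniform in log10 between the two bounds."""
    exponent = rng.uniform(math.log10(min_sat), math.log10(max_sat))
    # Clamp to guard against floating-point rounding at the edges.
    return min(max_sat, max(min_sat, int(10 ** exponent)))

rng = random.Random(0)
samples = [benford_like_amount(10**5, 10**9, rng) for _ in range(20000)]
leading = Counter(str(s)[0] for s in samples)

# Compare the observed leading-digit frequency with Benford's log10(1 + 1/d).
for digit in "123456789":
    observed = leading[digit] / len(samples)
    expected = math.log10(1 + 1 / int(digit))
    print(digit, round(observed, 3), round(expected, 3))
```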

The timings could be configured by the user with a parameter "average time to mix" which is probably the easiest way to explain it.

All the timings and coinjoin amounts could be generated at the start then outputted to show the user, who would then know exactly when their tumbling will be finished and when they need to come back to their computer to generate a new deposit address. The progress of the tumble should be saved to a file, so if there is a power cut the user can restart the tumble from where they stopped. The time the user will choose could be related to the average number of transactions on the entire bitcoin network.
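Saving progress could be as simple as writing the pre-generated schedule and a position marker to a JSON file. The sketch below is one possible shape under that assumption; the file layout and the names `save_progress` and `load_progress` are invented for illustration.

```python
import json
import os

def save_progress(path, schedule, next_index):
    """Write the full schedule plus the index of the next step to perform."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"schedule": schedule, "next_index": next_index}, f)
    # Rename over the old file so a crash mid-write does not corrupt the saved state.
    os.replace(tmp_path, path)

def load_progress(path):
    """Return (schedule, next_index), or None if no tumble is in progress."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = json.load(f)
    return state["schedule"], state["next_index"]
```

On restart, the bot would call `load_progress` and resume from `next_index` instead of generating a fresh schedule.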

An interesting question is how privacy increases with the number of destination addresses used. To deanonymize them, a blockchain observer like /u/impost_r needs to find combinations of addresses where the outputs going to those addresses match the input value. The number of combinations is nCk (n choose k), where k is the number of addresses tumbler.py uses and n is the number of possible addresses (i.e. all the coinjoined addresses). Look up the formula on wikipedia; for a large n it goes up very quickly as k increases. Even just going from 2 destination addresses to 3 gives a huge improvement in privacy.
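To put numbers on the n-choose-k growth, Python's `math.comb` can be used directly. The value n = 1000 below is an arbitrary assumption for the number of candidate coinjoined addresses, chosen only to show the scale.

```python
import math

n = 1000  # assumed number of candidate addresses, for illustration only

for k in (1, 2, 3, 4):
    # Number of k-address subsets an observer would have to consider.
    print(k, math.comb(n, k))
```

For this n, going from k = 2 to k = 3 multiplies the search space by (n - 2) / 3, which is roughly 333.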

chris-belcher commented 9 years ago

I've been checking some popular bitcoin services for their handling for deposit addresses. Services allowing easy access to deposit addresses are great for privacy.

Localbitcoins gives you a new address when the current one receives a transaction. You can also reuse old addresses and localbitcoins will still honour them.

Bitstamp by contrast only allows you to request a new address every 24 hours, which is very annoying for privacy. It would be possible for the user of tumbler.py to wait 24 or 36 hours between deposits but it's still very inconvenient.

Bitfinex gives you not one but three deposit addresses, one each for the trading, deposit and exchange accounts. And in the user control panel it's very easy to move funds between the internal accounts. Also it seems you can instantly and repeatedly request a new address. Bitfinex seems the best for privacy so far.

Of course another possibility is to use tumbler.py to send to multiple addresses in another wallet of your own; then as long as you don't recombine all the outputs when you send them off to bitstamp you should get more privacy.

Yet another way to help is to use politics to convince the admins of these services to display several deposit addresses. This may work for some, especially given the ideology of many bitcoin users. It probably won't work for someone like bitstamp because they take anti-money laundering very seriously and probably won't want to be seen as aiding financial privacy. Nonetheless, politics is beyond the scope of this project. Part of the point of cypherpunks is to write code that works around politics.

sundance30203 commented 9 years ago

Chris, I like where you are going with this question. Have you seen this https://bitcointalk.org/index.php?topic=752260.0? Bitcoin mixing on unequal inputs.

chris-belcher commented 9 years ago

Thanks for the message sundance. Unless I'm misunderstanding, the problem that BCM solves does not exist in joinmarket. In joinmarket the makers announce the range of sizes they are willing to coinjoin, and the takers choose a coinjoin amount when they start the transaction.

chris-belcher commented 9 years ago

Relevant reading maybe? https://medium.com/@lopp/the-challenges-of-optimizing-unspent-output-selection-a3e5d05d13ef

sundance30203 commented 9 years ago

Great post and an interesting problem, that optimal UTXO selection. UTXO selection has huge implications for privacy. I have a combinatorial solution that solves this and I am working on integrating it into a wallet. This computational problem is APX-complete (i.e. it cannot be approximated arbitrarily well in polynomial time unless P=NP), so it is hard even to approximate, not just to solve exactly. Thus, there are bound to be dozens of interesting heuristic solutions aimed at different use cases, privacy being one of them.

adlai commented 9 years ago

random idea from IRC:

once we have the ability for participants to specify multiple outputs, the tumbler can create multiple joins at later levels to better induce forensic teeth-gnashing.

chris-belcher commented 9 years ago

"Random time intervals are needed, trouble is the usd/btc volatility means you cant take too long to tumble."

Someone could hedge their usd/btc exposure. Either by opening short positions on exchanges or better yet using specialized derivatives like on bitmex. This way they could tumble for months without fear of the price dropping.

chris-belcher commented 8 years ago

Right now the tumbler generates random coinjoin amounts. It is slightly implausible that a human would write out all the digits of the amount down to one satoshi. So a useful addition to improve privacy might be to round off the coinjoin amounts.
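Rounding off could be done by keeping only the first few significant digits of the amount in satoshis. This is a sketch of the idea, not the tumbler's actual code; `round_to_significant` is a hypothetical helper, and it rounds down so the result never exceeds the original amount.

```python
def round_to_significant(amount_sat: int, sig_digits: int) -> int:
    """Keep the first `sig_digits` digits of a satoshi amount, zeroing the rest."""
    if amount_sat <= 0 or sig_digits <= 0:
        raise ValueError("amount and digit count must be positive")
    digits = len(str(amount_sat))
    if sig_digits >= digits:
        return amount_sat  # already short enough, nothing to round
    unit = 10 ** (digits - sig_digits)
    return amount_sat // unit * unit  # round down, never above the input

# 1.23456789 BTC becomes 1.23 BTC when three significant digits are kept.
print(round_to_significant(123456789, 3))
```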

tailsjoin commented 8 years ago

A couple of additional ideas to increase tumbler privacy:

  1. Change nick after each tx.
  2. Randomize miner fee between txs. Maybe a spread can be set by the user with default being 10000-12000 or something.
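The miner fee idea could be sketched as a uniform draw from a user-configurable spread. The defaults below come from the 10000-12000 suggestion above and are assumed to be in satoshis; the function name `random_miner_fee` is made up for illustration.

```python
import random

def random_miner_fee(low_sat=10000, high_sat=12000, rng=None):
    """Pick a miner fee uniformly from the inclusive range [low_sat, high_sat]."""
    rng = rng or random.SystemRandom()
    return rng.randint(low_sat, high_sat)

# A different fee for each transaction in the tumble.
print([random_miner_fee() for _ in range(5)])
```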

Ideas to increase usability by avoiding liquidity/flooding issues:

  1. --maxcjamount
  2. --maxmakercount

chris-belcher commented 7 years ago

One method that coin tracking startups use which hasn't been written about yet is looking for round numbers.

If they see an output for 1btc or another round number they can reasonably assume that this is the "real" payment and other ones are randomly-generated intermediates. Also it helps them distinguish between change and output addresses (which is easy in our coinjoin anyway). Other round numbers can be found by converting the bitcoin amount to USD or other currencies.

So one thing the tumbler could do is make some of the intermediate coinjoins use round-numbered outputs.
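The round-number heuristic an observer might apply can be sketched as a divisibility check on the satoshi amount. This covers only round amounts in BTC; the fiat-conversion variant would need historical price data and is left out. `looks_round` and its `max_decimals` parameter are hypothetical names.

```python
SATOSHI_PER_BTC = 100_000_000

def looks_round(amount_sat, max_decimals=2):
    """True if the amount has at most `max_decimals` decimal places in BTC."""
    unit = SATOSHI_PER_BTC // 10 ** max_decimals
    return amount_sat > 0 and amount_sat % unit == 0

print(looks_round(100_000_000))  # 1 BTC
print(looks_round(123_456_789))  # 1.23456789 BTC
```

A tumbler that wants its intermediate coinjoins to pass this check could pick amounts that are multiples of `unit`.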

chris-belcher commented 7 years ago

Another heuristic is that Core right now aims to gather enough UTXOs for desired-amount + 0.01btc, so the change output is close to 0.01btc. Apparently new releases of Core should fix this.

See these transactions for examples

265431bd05a13ca62ffa38f4b1739ef2b3389adaa1daf1b8cb6bfa995a83d198
debf2065adae6a5f376842b50dd6125d5e332e7d5d25ec0919bc40dc12f4b1d3
8a2eb7a8294ba50e867a7547a0a0dbb5a994a71841fa8a0de30115c6b70b02d6
8f957f0cb5728c73fc86e0518f95e6038a053563ad6a9ef631cf72417f681e32
46cdbbdec3c5928259f13bfd49d50a34d34263ab7cc6b73929b2aeddfabda215
2dc95059189be0b163358c50ad731d474f984cf4e8ca61481ba181f7d21cb9bb
24218291d32eaf6a42d461f0cb1f461263d495b196ba18608b1bd582addfedab
54e682b41a0ba0a6cc76785d5219e5b99f33bbb618e7b60ab3ae1d548dbb629a

Now obviously for joinmarket the change output is obvious, but I'm writing this down because it should be useful for coinswap.