Blockstream / esplora

Explorer for Bitcoin and Liquid
MIT License

Tone down the privacy analysis #51

Closed: RHavar closed 5 years ago

RHavar commented 5 years ago

While I like the idea of privacy analysis, done badly it's pretty counterproductive. Let's take this:

[Screenshot: screen shot 2019-03-06 at 4 09 37 pm]

The alarming "Round payment amount" red warning is dubious at best (we're talking about the difference of 1 significant figure). And the orange "Unnecessary input heuristic" tooltip incorrectly claims that an input is there that is not typically added by wallets (but in reality almost all wallets actually do).

Not to mention, several of the heuristics are counterproductive. Obviously context-sensitive, but in general it's better for several reasons that wallets are oblivious to creating UIH or not. And likewise creating transactions without change is a good thing (for both fees and privacy).

But probably worst of all, the analysis is missing some of the most important privacy leaking checks.

I think the responsible thing to do would be to move "Privacy Analysis" into its own page that disclaims its limitations. Or at least only have prominent warnings for things that are definitely a problem (e.g. address reuse).

adam3us commented 5 years ago

ideally IMO improve the heuristic if these cases can be distinguished.

maybe also the linked page could explain the false positive cases or risks, so the user can have an indication if what they specifically were doing is good.

shesek commented 5 years ago

The alarming "Round payment amount" red warning is dubious at best (we're talking about the difference of 1 significant figure).

The "round payment amount" message is displayed when one output is rounder than the other by 3 digits or more. What I'm considering is the number of trailing zeros, so 0.22377000 vs 0.00865493 is a difference of 3, not 1.

This should happen by chance in 1/1000 of payments, or a false positive rate of 0.1%. Which seems reasonable to me.
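The trailing-zeros comparison described above can be sketched roughly as follows. This is a minimal illustration, not the explorer's actual code: the function names and the `min_diff` parameter are hypothetical, and amounts are assumed to be converted to integer satoshis before comparing.

```python
from decimal import Decimal

SATS_PER_BTC = 100_000_000


def to_sats(btc: str) -> int:
    """Convert a decimal BTC string to an integer number of satoshis."""
    return int(Decimal(btc) * SATS_PER_BTC)


def trailing_zeros(sats: int) -> int:
    """Count the trailing decimal zeros of a positive satoshi amount."""
    if sats <= 0:
        raise ValueError("amount must be positive")
    count = 0
    while sats % 10 == 0:
        sats //= 10
        count += 1
    return count


def is_round_payment(out_a: int, out_b: int, min_diff: int = 3) -> bool:
    """True if one output has at least `min_diff` more trailing zeros than the other.

    `min_diff=3` mirrors the "3 digits or more" rule stated in the comment;
    the exact threshold used by the explorer is an assumption here.
    """
    return abs(trailing_zeros(out_a) - trailing_zeros(out_b)) >= min_diff
```

With the amounts from the example, `0.22377000` has 3 trailing zeros in satoshis and `0.00865493` has none, so `is_round_payment(to_sats("0.22377000"), to_sats("0.00865493"))` is true.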

And the orange "Unnecessary input heuristic" tooltip incorrectly claims that an input is there that is not typically added by wallets (but in reality almost all wallets actually do).

A more accurate way to phrase this is that the unnecessary inputs are not typically added by fee-minimizing wallet software, which attempts to find the smallest set of inputs sufficient for the amount being sent.

As far as I'm aware, nearly all consumer wallet software does that (edit: they might not minimize the number of inputs to the bare minimum, but will normally at least avoid adding unnecessary ones). Transactions that don't minimize inputs are usually an indication that some kind of non-consumer-wallet software created the transactions.

This is also known as UIH-2, which is described here.
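For readers unfamiliar with the heuristic, here is a minimal sketch of one common formulation of UIH-1/UIH-2 for two-output transactions. The formulation, the function name, and the decision to reuse the original fee when testing a smaller input set are all assumptions for illustration; the thread does not spell out the exact rule the explorer applies.

```python
def classify_uih(input_values, output_values, fee):
    """Classify a transaction under a simple unnecessary-input heuristic.

    Returns 2 if the larger output could still be paid after dropping the
    smallest input, 1 if only the smaller output could, and None otherwise.
    All values are integer satoshis. Only transactions with at least two
    inputs and exactly two outputs are considered (assumed scope).
    """
    if len(input_values) < 2 or len(output_values) != 2:
        return None

    smaller_out, larger_out = sorted(output_values)
    # Funds available if the smallest input had been left out.
    remaining = sum(input_values) - min(input_values)

    # Simplification: the original fee is reused, although a transaction
    # with one input fewer would be slightly smaller and cheaper.
    if remaining >= larger_out + fee:
        return 2
    if remaining >= smaller_out + fee:
        return 1
    return None
```

For example, `classify_uih([50_000, 30_000, 5_000], [60_000, 24_000], 1_000)` returns 2: even without the 5,000-satoshi input, the remaining 80,000 would cover the larger output plus the fee.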

Obviously context-sensitive, but in general it's better for several reasons that wallets are oblivious to creating UIH or not.

It might be better if all wallets implemented behavior that didn't attempt to minimize the number of inputs, to make UIH less effective (though this would also mean that you're linking more of your addresses together, which might have a worse overall effect on privacy). But as far as I know, this is just not the case today, which makes UIH an effective analysis technique.

And likewise creating transactions without change is a good thing (for both fees and privacy).

I agree, if your wallet is able to find a combination of inputs that let you pay without change, this is indeed the best for both fees and privacy. And another possible reason for no-change transactions is donations made by advanced users using manual coin-selection to explicitly avoid change.

However, I think it's far more common for transactions to have no change because they're self-transfers (into an exchange, another wallet, lightning channel, etc) than for any other reason. This usually happens in one of two ways: users using the "send max" feature of their wallet, or users doing manual coin selection when moving funds between their own wallets (the first probably being far more common, as the latter is only done by more advanced tech-savvy users).

Going over my transaction history in two of my wallets (cold & hot), I can't seem to find a single transaction that had no change because the coin-selection algorithm managed to find an exact match for a real payment, and plenty of transactions where I was sending coins between my own wallets.

Is your experience different? Perhaps I'm not aware of other wallets that are smarter at doing this? Though it seems that even with a very smart and highly optimized coin selection algorithm, finding exact inputs still requires some non-negligible amount of luck and would still be relatively rare.

But probably worst of all, the analysis is missing some of the most important privacy leaking checks.

I would love to add them! Which ones would you like to see?

shesek commented 5 years ago

maybe also the linked page could explain the false positive cases or risks, so the user can have an indication if what they specifically were doing is good.

Added some explanation for other reasons to have no change at https://en.bitcoin.it/w/index.php?title=Privacy&diff=66261&oldid=66255.

The "sending to a different script type" section (added by myself to have something for esplora to link to) already does attempt to explain that this is a necessary evil for adopters of new wallet technology, and that there's nothing much to do other than waiting for wider adoption.

Also, generally speaking, I tried to divide the messages into red and orange based on whether it's easy to do something about it, either by changing user habits or by adapting the wallet software. Orange is the stuff that we kinda have to live with (different script types, UIH1 & UIH2, no change), red are the ones that are relatively straightforward to fix (currently only address reuse and round amounts).

RHavar commented 5 years ago

As far as I'm aware, nearly all consumer wallet software does that. Transactions that don't minimize inputs are usually an indication that some kind of non-consumer-wallet software created the transactions.

What wallets are you aware of that do this? o.0 I know that bitcoin core, and all the wallets I use certainly don't.

RHavar commented 5 years ago

Copying my reply from reddit:

Though it seems that even with a very smart and highly optimized coin selection algorithm, finding exact inputs still requires some non-negligible amount of luck and would still be relatively rare.

Yeah, I've achieved rates of >80% no-change transactions with coinsayer.com on a commercial basis. It's worth noting that there's two things that contribute to this:

a) The wallet typically will have over a couple thousand utxo's to spend from. Which in end-user settings is not very ideal

b) In addition to instant payments, whenever making payments, the wallet has a queue of pending payments that can be made at the same time (partial batching). Although from a block-explorer point of view, it'll look like the transaction has change (but in reality is actually another payment)

shesek commented 5 years ago

What wallets are you aware of that do this?

If I understand correctly, it looks like Electrum will stop adding new "buckets" (groups of utxos belonging to the same address. without address reuse, a bucket is a single utxo) the moment it reaches the target value, which should mean that it won't produce transactions matching UIH-2. See sufficient_funds and its usage in coinchooser.py.
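The stopping rule described above can be illustrated with a small sketch. This is not Electrum's code: `select_buckets` is a hypothetical helper, buckets are modeled as plain lists of UTXO values in satoshis, and the order in which buckets are tried is simply the order given.

```python
def select_buckets(buckets, target):
    """Add buckets in the given order, stopping once the target is covered.

    `buckets` is a list of lists of UTXO values (one list per address);
    `target` is the amount to cover, fees included. Returns the selected
    buckets and their total value.
    """
    selected = []
    total = 0
    for bucket in buckets:
        if total >= target:
            break  # enough funds already; do not add further buckets
        selected.append(bucket)
        total += sum(bucket)
    if total < target:
        raise ValueError("insufficient funds")
    return selected, total
```

For instance, `select_buckets([[30_000], [20_000, 5_000], [40_000]], 50_000)` selects the first two buckets (55,000 in total) and never touches the third.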

I tried analyzing a few dozen non-manual-coin-selection transactions from my Electrum wallet; none of them appear to match UIH-2.

I'm less sure about Bitcoin Core's behavior (and don't have handy transactions produced by it to check), but it looks like at least some consideration is given to avoid adding unnecessary inputs once you reach the target amount + MIN_CHANGE. The fact that it's targeting a minimum size for the change output could potentially trigger UIH-2, but I think the false-positive rate would still be relatively low. To make it even lower, we could check that the unnecessary inputs are larger than some minimum threshold.

Yeah, I've achieved rates of >80% no-change transactions with coinsayer.com on a commercial basis.

Was not aware of coinsayer, looks very nice! A rate of >80% no-change transactions is impressive.

However, I wonder, wouldn't a commercial transaction producer that deals with thousands of utxos and high volumes of payments usually have at least some payments to batch together? It seems to me that usage in a commercial setting like you're describing would almost always have at least 2 payments per 10 minutes, which they should be able to batch together into a multi-output transaction, in which case the no-change heuristic is no longer relevant.

Also, would you say that achieving high no-change rates is a typical behavior by transaction producers outside of coinsayer?

Although from a block-explorer point of view, it'll look like the transaction has change (but in reality is actually another payment)

We currently only consider transactions of exactly two outputs for the heuristics relating to change detection, which would rule out most batched transactions (except for ones that batch exactly two payouts and are able to do this without change -- should be quite rare).

shesek commented 5 years ago

However, I wonder, wouldn't a commercial transaction producer that deals with thousands of utxos and high volumes of payments usually have at least some payments to batch together?

A followup question: does the 80% figure for no-change transactions also include transactions with multiple payout recipients (so >=2 outputs, but no change), or just the transactions making a single payout (exactly 1 output)? If it is for both combined, could you provide the breakdown between the two?

RHavar commented 5 years ago

However, I wonder, wouldn't a commercial transaction producer that deals with thousands of utxos and high volumes of payments usually have at least some payments to batch together?

Well it depends on the commercial service, and the expectations of your users. The business I used to own (bustabit) just gave users two options: a) Have your money immediately and b) Have your money in the next ~24 hours.

People in category a) paid a premium, and expected their money right away (as in, immediately get linked to a txid). This is important, as they frequently rely on it to be able to pay a time-sensitive invoice and shit like that. People in category b) would get put on a wait list, and attached to transactions whenever convenient.

There's obviously a lot more advanced stuff you can do (e.g. RBF existing unconfirmed sends) so you rarely need to send more than 1 transaction per block, but the operational complexity goes through the roof.

Was not aware of coinsayer, looks very nice! A rate of >80% no-change transactions is impressive.

Thanks, although all the credit really goes to the solvers. coinsayer is actually pretty dumb, it really just compiles the problem.json into an integer linear program, and then fires a bunch of leading solvers at it (then collects the results and presents them as json). i.e. it's just some polished glue (although I'm actually not taking any more new customers at the moment, as I have too much on my plate).

A followup question: does the 80% figure for no-change transactions also include transactions with multiple payout recipients (so >=2 outputs, but no change),

Just that 80% of the transactions had no change output at all. I don't have a further breakdown, but most would have 1 or 2 outputs (most users are impatient, and were happy to pay a premium for instant transactions).

I can't actually provide you with any further breakdown, as that stat is a bit old (it's from before I sold my casino ~a year ago). It's also worth noting that the share of "no change" transactions will drastically increase during high-fee periods (basically because it would calculate the total cost of creating and later consolidating change, and be willing to "sacrifice" that to miners). So for optimized coin-selection, you don't actually need the inputs to match exactly (e.g. it's worth overpaying 5 cents of fees, if the total cost of creating change would've been 10 cents...). So as the cost of creating change goes up (i.e. fee rates, and off-peak rates go up) the number of no-change transaction possibilities is going to go up hugely.
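The trade-off described above (give the excess to miners when that is cheaper than creating and later spending a change output) can be sketched like this. It is a rough illustration, not coinsayer's logic: the function name, the parameters, and the default sizes (roughly those of a P2WPKH output and input) are assumptions.

```python
def should_drop_change(
    excess_sats: int,
    fee_rate_now: int,
    fee_rate_later: int,
    change_output_vbytes: int = 31,
    change_input_vbytes: int = 68,
) -> bool:
    """Decide whether to overpay the fee instead of creating change.

    `excess_sats` is what the selected inputs leave over after the payment
    and the base fee. Fee rates are in sat/vB: `fee_rate_now` prices the
    change output in this transaction, and `fee_rate_later` is a guess at
    the rate paid when the change is eventually spent or consolidated.
    """
    cost_of_change = (
        change_output_vbytes * fee_rate_now
        + change_input_vbytes * fee_rate_later
    )
    # Cheaper to "sacrifice" the excess than to create and later spend change.
    return excess_sats <= cost_of_change
```

With these assumed sizes, a 500-satoshi excess is worth dropping at 10 sat/vB (change would cost 990 satoshis overall) but not at 2 sat/vB (198 satoshis), which matches the point that higher fee rates widen the set of acceptable no-change selections.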


But I think my conclusion would be that it's fine to point out the transaction is "likely a self-send". But it's a bit absurd to say it's a problem, without knowing more.

shesek commented 5 years ago

@RHavar The texts have been toned down, and I changed some "warning" messages (in orange) to be "info" messages instead (in blue). See https://github.com/Blockstream/esplora/commit/f926ee2f55cbce3a1717fec79e4218c10530a44b.