JoinMarket-Org / joinmarket

CoinJoin implementation with incentive structure to convince people to take part
400 stars 119 forks source link

Full block SPV sync #653

Open chris-belcher opened 7 years ago

chris-belcher commented 7 years ago

Issue for https://github.com/JoinMarket-Org/joinmarket/issues/55#issuecomment-184165679

Yet another way is to create a client with SPV security (trusting the miners) which does not use bloom filters. In other words it downloads every block from the wallet creation date onwards and searches it for its own addresses. This would give SPV security but almost-full privacy (only leaking the approximate wallet creation date)

The Core developers are currently talking about adding a 'purespv' mode which is essentially this to the Core software. If so then we could maybe skip coding it for joinmarket and just piggyback off of Core. https://botbot.me/freenode/bitcoin-core-dev/2016-11-10/?msg=76281739&page=2 Although this would still require a separate application instead of being built into JoinMarket.

chris-belcher commented 7 years ago

This would be a great default blockchain interface. If your wallet creation date is now then no blocks have to be downloaded. Also it doesn't require any configuration by the user.

chris-belcher commented 7 years ago

This idea had the problem that takers wouldn't have access to the UTXO set, so makers could send them fake UTXOs to DOS.

Here is a possible solution to that from gmaxwell.

\<gmaxwell> belcher: exactly what is needed in these queries? can we make a public API that gives the required data without trashing privacy? \<belcher> not really, it needs a mapping from address to utxo and history \<belcher> im planning to write something where the user points joinmarket to their own full node and it sync's from there, which would be private \<gmaxwell> What does history mean precisely? \<belcher> whether an address was used on the blockchain before \<gmaxwell> and it needs this only as of the current height, right? \<belcher> yes \<gmaxwell> so, if I make a table of 128 bit hashes of txid|vout|scriptpubkey|amount|used_flag for the whole UTXO set the database would be 738 MB, maybe half that size if dust addresses were filtered. and maybe 10x smaller if you just instead eliminate all reuse (which I suspect you might be able to ban?).
\<gmaxwell> I think it would not be completely unreasonable to have a thing where users just download the whole thing, check its hash against varrious observers and get diffs-- especially if you could eliminate dust and reuse. Alternatively, it wouldn't be completely unrealistic for people to have PIR servers that can query this data. \<belcher> so that i understand, is that hash(txid|vout|scriptpubkey|amount|used_flag), and then if you know a UTXO you can check whether it exists? \<gmaxwell> Yes, so someone you're trying to join with would need to already know their UTXO, and you could query this to tell if it exists. If you can't do that then you need a real database and not a set-- which could be done but would be a couple times bigger-- not out of the question, but getting less attractive vs a full node. \<belcher> yes this scheme would help certainly \<gmaxwell> For the PIR approach the server must scan the entire database for every query (though some batching can be done) and do non-trivial computation. All in all computational PIR is more expensve than just transfering the data under most cost models... but perhaps a cost model that says user inconvience is roughly infinitely bad might be a more fair reflection of reality because users will reliably do some \<gmaxwell> thing stupid instead if you impose any cost on them. \<belcher> there is another idea for sync'ing wallets where people connect to any peers out there on the network and download full blocks starting from the wallet creation date, that gives almost full privacy but spv security, and issue with that for joinmarket is market makers send their UTXOs and takers can check whether those UTXOs are real or not, so this scheme with hashing everything would allow takers to check(!) \<belcher> also what does PIR mean? \<gmaxwell> belcher: there are some "full block SPV" mode patches to bitcoin core that do somethings like that. \<gmaxwell> belcher: private information retrevial. Basically there are a set of crypto techniques which let you query a remote database without the database learning your query. They're implemented and in the same kind of efficiency class as things like confidential transactions. \<belcher> that sounds interesting for sure, reading a bit now \<gmaxwell> There are two broad kinds of PIR, Information-theoretic PIR, which is much more efficient but depends on there being M of N servers which do not collaborate to violate your privacy. And there is computational PIR which can work with a single untrusted server and makes a cryptographic assumption and is cpu heavy (like doing an ellpitic curve operation for every record, for every query). \<gmaxwell> Percy++ is a library that implements both kinds, there are several other such libraries. \<gmaxwell> They've had very little use in practice because "just transfer the whole damn database" is often cheaper under reasonable assumptions of cpu vs bandwidth costs. (or at least close enough that the complexity of dealing with PIR isn't worth it) \<gmaxwell> But we know the whole world wide number of CJs can't be more than a couple per second. \<kanzure> also pester bramc for PIR things \<gmaxwell> Which basically means that like a couple servers at blockstream using their idle capacity could probably satisify the worldwide need for JM queries. \<gmaxwell> and since it's cryptographically private it probably doesn't matter if there is a bit of centeralization in the servers for it. \<belcher> yes that could work \<belcher> iv been wondering about ways to solve the maker-might-send-fake-utxos problem for a while now, and looks like this would be a solution :) \<gmaxwell> PIR natively gives you a private array access. So if what you need is a map you have a couple options: one is that you order the data and construct a small index for your map which you just send to ever user, (and since it's small thats no big deal) then they use private array access to get the entrie(s) they need. Another option is you build a n-ary search tree, and put each level of the tree in a \<gmaxwell> seperate PIR database, and users just query each level to find their records. \<gmaxwell> To prevent the server from silently giving bad results you merkelize the database, and in each leaf record include the membership proof, and the server publishes and signs the root... if the server starts putting out bad roots everyone will see. \<gmaxwell> (like the (not-yet)commited bloom filter proposals)