Open bitjson opened 2 years ago
An easy first step here is to implement some basic analysis of CashFusion usage. Some ideas/discussion: https://github.com/Rucknium/CashFusionStats/issues/2
@bitjson Sounds good. As a long-term goal, I will try to incorporate detection of these transaction-level privacy defects into rbch, which could eventually trickle down into CashFusionStats.
My sense is that clustering can be tricky and highly dependent on judgement calls. For example, FATF recently asked Chainalysis, CipherTrace, Coinfirm, Elliptic, Merkle Science, Scorechain, and TRM Labs to estimate the prevalence of illicit activity on the BTC blockchain. The top-line conclusion was:
As set out below, there are significant challenges in the development and interpretation of these market metrics. The blockchain analytic companies often reached starkly divergent results for the same questions, so caution must be exercised in drawing conclusions.
See paragraphs 76 - 102 of their report.
The ecosystem needs better (public, open source) visibility into privacy leaks to continue improving privacy for average users. And in the non-custodial world of cryptocurrency, privacy is protection from theft and physical violence, particularly for less wealthy users and those living under failing regimes.
Particularly when claiming funds after chain splits, transactions from multiple chains often reveal far more about a user's activity than the user realizes. Chaingraph is uniquely suited for clustering and privacy analysis because we can easily operate across multiple chains. Privacy analysis features need not take node acceptance into account at all: clustering should be performed on all transactions in the database, regardless of chain.
Blockchair's Privacy-o-meter documentation is probably the best summary of available clustering heuristics. (See also – this excellent thread about privacy leak via address types.) Chaingraph should implement and display some of these heuristics by default, and make it easy to enable the rest for block explorer-type applications.
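To make the idea of a clustering heuristic concrete, here is a minimal sketch of one widely used example, common-input ownership: addresses spent together as inputs of one transaction are assumed to share an owner. The transaction shape (`input_addresses`) and the function names are hypothetical and are not Chaingraph's schema or API. Collaborative transactions such as CashFusion deliberately break this assumption, so a real implementation would need to detect and exclude them first.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge addresses into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root, then walk to the root
        # while halving the path to keep later lookups fast.
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of any transaction.

    `transactions` is an iterable of dicts with a hypothetical
    `input_addresses` list; this is an illustrative shape only.
    """
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["input_addresses"]
        for address in inputs:
            uf.find(address)  # make sure single-input addresses are tracked
        for other in inputs[1:]:
            uf.union(inputs[0], other)

    clusters = defaultdict(set)
    for address in list(uf.parent):
        clusters[uf.find(address)].add(address)
    # Sorted output keeps results deterministic for comparison.
    return sorted(sorted(members) for members in clusters.values())


example = [
    {"input_addresses": ["a", "b"]},
    {"input_addresses": ["b", "c"]},
    {"input_addresses": ["d"]},
]
clusters = cluster_by_common_inputs(example)
```

In this example, `a`, `b`, and `c` end up in one cluster because `b` links the first two transactions, while `d` stays alone. Because the input is just a list of transactions, the same pass could run over transactions from every chain in the database without regard to which nodes accepted them.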
In addition to those heuristics, we should try to support clustering by timing information. (E.g. merge avoidance isn't very useful if several chains of otherwise disconnected transactions are inactive for months but always move in the same hour.)
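As a rough illustration of timing-based clustering, the sketch below buckets each transaction chain's activity into one-hour windows and flags pairs of chains whose active hours overlap almost entirely. Everything here is an assumption made for illustration: the chain identifiers, the input shape (chain ID mapped to Unix timestamps), the function names, and the thresholds. Fixed hour buckets also miss activity that straddles a bucket boundary, so a real implementation would likely want a sliding window.

```python
from itertools import combinations

SECONDS_PER_HOUR = 3600


def active_hours(timestamps):
    """Return the set of hour buckets in which a chain had any activity."""
    return {ts // SECONDS_PER_HOUR for ts in timestamps}


def timing_linked_pairs(chains, min_shared=3, min_jaccard=0.8):
    """Find pairs of chains that are repeatedly active in the same hours.

    `chains` maps a hypothetical chain identifier to a list of Unix
    timestamps. A pair is flagged when it shares at least `min_shared`
    hour buckets and the Jaccard similarity of the two bucket sets is at
    least `min_jaccard`. Both thresholds are illustrative guesses.
    """
    hours = {chain_id: active_hours(ts) for chain_id, ts in chains.items()}
    linked = []
    for a, b in combinations(sorted(hours), 2):
        shared = hours[a] & hours[b]
        combined = hours[a] | hours[b]
        if len(shared) >= min_shared and len(shared) / len(combined) >= min_jaccard:
            linked.append((a, b))
    return linked


example_chains = {
    # x and y are otherwise unconnected but always move in the same hour.
    "x": [0, 100 * 3600 + 5, 2000 * 3600 + 10],
    "y": [30, 100 * 3600 + 900, 2000 * 3600 + 3000],
    # z is active at unrelated times.
    "z": [50 * 3600, 700 * 3600, 900 * 3600],
}
linked = timing_linked_pairs(example_chains)
```

Here `x` and `y` are flagged as linked and `z` is not. Requiring several shared hours rather than one is meant to reduce coincidental matches on busy chains; the right thresholds would need to be tuned against real data.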
We should also add opt-in support for tracking and querying the actual address clusters. (#29 will probably be valuable for performance; I imagine we'll want to do most of the computation on the agent before saving transactions to the database.) In addition to being able to query the full list of clustered transactions, it would be fantastic if we supported materializing columns for:
Finally it would be nice to support aggregated statistics (depends on #32) for:
(Keywords for searchers: coinjoin, coinshuffle, cashshuffle, cashfusion, taint analysis)