Neptune-Crypto / neptune-core

anonymous peer-to-peer cash
Apache License 2.0
25 stars 7 forks source link

Mutator Sets: Merkleize the entire SWBF #71

Closed aszepieniec closed 3 months ago

aszepieniec commented 10 months ago

The current Mutator Set design conflates two things that are conceptually distinct. Both of them are referred to by the name "active window".

  1. The sliding window, which is the interval in the infinite bit array in which a given addition record defines $k$ locations. The sliding window is different for each batch of $b$ addition records.
  2. The plain, which is the tailing portion of the sliding window Bloom filter (SWBF) that is represented explicitly and not compressed with a Merkle mountain range.

The reason why the the current sliding window was represented explicitly by the MutatorSetAccumulator is the implicit assumption that most transactions spend relatively new UTXOs. As a consequence, a lot of work related to the Merkle mountain range can be saved by not Merkleizing the tail where most of the activity is expected to happen. This is an assumption though; future data might or might not bear it out. And even if it does, it is by no means clear that the optimal portion of the SWBF to be represented explicitly matches exactly with the current sliding window.

This issue entails rewriting the Mutator Set such that:

The realization that the tailing part of the SWBF might as well be Merkleized opens up intriguing lines of thought:

dan-da commented 10 months ago

From the point of view of the passive verifier, it reduces the cost. This allows for a larger active window, which (is necessary to support a larger batch size, which) increases anonymity.

Can you quantify how much it increases anonymity vs status quo?

aszepieniec commented 10 months ago

Every removal record (transaction input) gives the outside observer a fuzzy timestamp on the matching addition record (transaction output). This fuzzy timestamp identifies a range of plausible addition records. One way to quantify the anonymity is to look at the information entropy of the uniform distribution over this range. By this metric, an anonymity of 4 bits corresponds to a ring size of 16.

The current parameter set ($w = 2^{20}, s = 2^{12}, b = 2^3$, for window size, step size, and batch size, respectively) generates this graph, which captures the frequency with which a given entropy is observed. (Note that every removal record generates a probability distribution, and that two such probability distributions may have distinct entropies.)

image

It says that the minimum entropy is 3 (corresponding to batch size 8). However, on (geometric) average the entropy is slightly more than 5, corresponding to ring size 32.

If we multiply $w$, $s$, and $b$ by $2^t$, then the shape of the graph is identical and the only difference is that the horizontal axis has shifted to the left by $t$ steps.

If we multiply $w$ and $s$ by $2^t$ but not $b$ then the shape is similar but

aszepieniec commented 8 months ago

I'm starting on this task momentarily. This post tracks todos and status.

aszepieniec commented 8 months ago

I started implementing this on branch aszepieniec/71_merkleize_entire_swbf but am now reconsidering this refactor. The branch compiles, but the tests don't work. Unless there is a compelling argument to continue, I will be abandoning this branch.

The reason for second-guessing this refactor is that the interface of the Mutator Set needs to be amended. Previously, add took a mutator set accumulator and an addition record; now it takes a mutator set accumulator, a dictionary of chunks of the SWBF along with their membership proofs, and an addition record. But this dictionary is part of the information needed to apply the addition record, so at least conceptually it is part of the mutator set accumulator.

Phrased differently: we might as well replace the entire mutator set accumulator with its hash. In order to add or remove an item we would then need to supply its preimage, in addition to the addition record or removal record. Having to add include this preimage defeats the purpose of reducing it to a single hash, because you always need it.

As for the original motivation that led to this effort --

The reason why the the current sliding window was represented explicitly by the MutatorSetAccumulator is the implicit assumption that most transactions spend relatively new UTXOs. As a consequence, a lot of work related to the Merkle mountain range can be saved by not Merkleizing the tail where most of the activity is expected to happen. This is an assumption though; future data might or might not bear it out. And even if it does, it is by no means clear that the optimal portion of the SWBF to be represented explicitly matches exactly with the current sliding window.

-- I have to disagree: the reason why the current sliding window is part of the MutatorSetAccumulator is because you need this data to produce membership proofs. Recall that a mutator set membership proof contains a membership proof for every index showing that it lives in SWBF. If the index lives in the active window and the active window is represented explicitly, then this SWBF-membership-proof is empty. If the index lives in the inactive part, then this SWBF-membership-proof is an MMR membership proof. But the point is that currently, whenever an addition record is applied, all indices are guaranteed to live in the active window so the production of SWBF-membership-proofs is guaranteed to succeed without supplying supplementary witnesses. In the proposed refactor, you need to supply supplementary witnesses that defeat the purpose of Merkleization.

On top of that, managing these supplementary witnesses in a blockchain client, when they are not part of the block, is quite a hassle.