Neptune-Crypto / neptune-core

anonymous peer-to-peer cash
Apache License 2.0
25 stars 7 forks source link

fix: Flaky mutator set bug #106

Closed aszepieniec closed 7 months ago

aszepieniec commented 7 months ago

The problem was located in Block::accumulate_transaction, which is invoked repeatedly by a miner as he selects which mempool transactions to include in his block. Recall that the block has only one transaction, which represents the merger of all transactions of the miner's choice.

The problem is that the block-in-preparation contains the updated mutator set accumulator. When a new transaction is aggregated, the mutator set accumulator is updated from this intermediate state, with the new transaction's addition and removal records. It is not obvious why this leads to a problem, so let's use an example:

So how do we get them in sync? We can't use the archival mutator set because we might not be running an archival node. We can't look at the new mutator set accumulator because we might need the chunk of the SWBF that was slid away and is now hidden by Merkleization. We can't revert the removal of inputs_1 because we still need that slid chunk to restore the mutator set accumulator.

The solution in this PR is to modify the type signature of the function accumulate_transaction, which now takes an additional argument previous_mutator_set_accumulator. The new mutator set is computed by starting from this value and applying all addition records (outputs_1 || outputs_2) first and all removal records (inputs_1 || inputs_2) second. The halfway-in-between mutator set accumulator is ignored.

At some point we should consider a prettier solution at some point -- one that does not generate objects that are destined to be ignored. However, that sounds to me like an optimization with low priority and low payoff. It affects only miners (who need way more memory anyway) and it's not clear whether it can run any faster if it is implemented in this way.

This PR also:

  1. Drops the generic type argument H from all structs and functions associated with the mutator set. Throughout this code base we only ever use Tip5 which itself is already hidden behind type alias Hash. The genericity dates to the time where we needed an alternative to the relatively slow Rescue-Prime hash functino.
  2. Removes the return values of type Result<(),_> from functions where there is no control path leading to error. Accordingly catch-and-report-error clauses where these functions are called, are removed as well.
Sword-Smith commented 7 months ago

Very nice find. Well done! This bug highlights an underlying problem: we have tested the underlying MMR and mutator set implementations very thoroughly but not how we interact with those datastructures. A few solid property-based tests interacting with a mutator set would probably have revealed this bug.