aszepieniec opened 3 months ago
Hey all, could I take a look at this for my 2nd PR?
Absolutely!
Flagging @Sword-Smith because he knows more about efficient MMR arithmetic than I do.
I'm not a fan of the call to `get_peaks_with_heights_async`, as it reads from disk needlessly. It's only used in the `get_peaks` method on `ArchivalMmr`, so let's get rid of `get_peaks_with_heights_async` completely.

It can be replaced by a call to `node_index_to_leaf_index` in twenty-first, followed by a call to `get_peak_heights_and_peak_node_indices`, also in twenty-first. That gives you the node indices of the MMR peaks. The actual peak values can then be fetched with a single `get_many` call on the `digests` field of `ArchivalMmr`.
This reduces IO significantly, and it gives us index arithmetic that runs in $O(\log N)$ time, where $N$ is the number of leaves in the MMR. It does not quite give us $O(1)$ running time, but I don't think the difference will be measurable, so let's just focus on minimizing IO for now. I don't actually know whether constant-time index arithmetic is even possible here. Constant-time index arithmetic is really only something we need when the algorithms run on Triton VM, and that setting only deals with MMR accumulators, never with archival MMRs, which are what this issue is about.
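For intuition, here is a self-contained sketch of the kind of $O(\log N)$ peak arithmetic described above. The function name `peak_heights_and_node_indices` is hypothetical, and the 0-based, in-order node-index convention is an assumption on my part; twenty-first's actual `get_peak_heights_and_peak_node_indices` may use a different convention (e.g. 1-based indices). The idea is just that each set bit $2^h$ in the leaf count contributes one perfect subtree of height $h$, and the peaks can be located without touching any node data:

```rust
/// Compute (height, node_index) for each peak of an MMR with `num_leaves`
/// leaves, highest peak first, in O(log N) time.
///
/// Node indices are 0-based and counted in MMR insertion order, which is an
/// assumption for this sketch; the library's convention may differ.
fn peak_heights_and_node_indices(num_leaves: u64) -> Vec<(u32, u64)> {
    let mut peaks = Vec::new();
    let mut node_offset: u64 = 0; // nodes consumed by subtrees to the left

    // Each set bit 2^h in `num_leaves` contributes one perfect subtree with
    // 2^h leaves and 2^(h+1) - 1 nodes; the subtree's root is a peak, and it
    // is the last node of that subtree in insertion order.
    for h in (0..63u32).rev() {
        if num_leaves & (1 << h) != 0 {
            let subtree_nodes = (1u64 << (h + 1)) - 1;
            peaks.push((h, node_offset + subtree_nodes - 1));
            node_offset += subtree_nodes;
        }
    }
    peaks
}

fn main() {
    // 7 leaves = 0b111: subtrees of 4, 2, and 1 leaves.
    // Nodes 0..=6 form the first tree (peak 6), nodes 7..=9 the second
    // (peak 9), and node 10 is the lone leaf (peak 10).
    println!("{:?}", peak_heights_and_node_indices(7));
}
```

Since this only does bit arithmetic on the leaf count, the disk reads are confined to the one `get_many` fetch of the peak digests themselves.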
I added a benchmark so that you can easily track your progress. It's already on master and can be run with `cargo bench get_peaks`.
In particular, look at `get_peaks_with_heights_async`, which currently uses the (outdated) node-index and tree-traversal logic.