Closed: adlerjohn closed this issue 3 years ago
It would be good if those benchmarks were summarized using: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
BTW, I tried the 2nd option and it does not yield any significant improvement. That makes sense, since the slices are already allocated with the correct capacity up front: https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/smt.go#L317 and the reversal happens in place: https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/utils.go#L45-L47
The 3rd approach makes things significantly worse:
```
$ benchstat old.txt new.txt
name                        old time/op    new time/op    delta
SparseMerkleTree_Update-12    24.6µs ±17%    30.6µs ± 6%   +24.34%  (p=0.000 n=10+10)
SparseMerkleTree_Delete-12    28.9µs ±34%    30.9µs ±13%      ~     (p=0.447 n=10+9)

name                        old alloc/op   new alloc/op   delta
SparseMerkleTree_Update-12    15.8kB ± 0%    22.1kB ± 2%   +39.68%  (p=0.000 n=8+10)
SparseMerkleTree_Delete-12    15.0kB ± 0%    24.5kB ± 0%   +62.93%  (p=0.000 n=10+10)

name                        old allocs/op  new allocs/op  delta
SparseMerkleTree_Update-12      58.7 ± 1%     112.2 ± 3%   +91.25%  (p=0.000 n=9+10)
SparseMerkleTree_Delete-12      53.0 ± 0%     122.4 ± 0%  +130.94%  (p=0.000 n=10+10)
```
tldr: we should probably just close this issue.
For the 3rd approach see: https://dashboard.github.orijtech.com/benchmark/2b73c6e366194dab831ec08c84ea8043
For the 2nd see: https://dashboard.github.orijtech.com/benchmark/745bbb8971f746c4a48f3acbcf68e7a6
Maybe our benchmarks are not clever enough, and we should benchmark a bulk update rather than repeatedly updating a single key:
https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/bench_test.go#L17
Nah, tried that as well.
Currently, the SMT appends new side nodes to a slice at the end:
This requires two allocations (one for the initial slice and one for the reversed copy) and one copy (into the reversed slice). Alternatives would be to insert each new node at the front,
or to create a new slice and append the old one to the new one.
Both of these alternatives require a new allocation and copy on every insertion, so they should result in an equal number of allocations and copies. However, this SO answer suggests the second alternative would result in fewer allocations than the first; this is most likely an erroneous interpretation, with the intended comparison being against another, less efficient, alternative (not shown here).
The current scheme and the two alternatives should be benchmarked, since generating the list of side nodes is on the hot path of every write and proof operation.