Closed: adlerjohn closed this issue 3 years ago
It would be good if those benchmarks were summarized using: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
BTW, I tried the 2nd option and it does not yield any significant improvement. That makes sense, since the slices are already allocated with the correct capacity up front: https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/smt.go#L317 and the reversal happens in place: https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/utils.go#L45-L47
The 3rd approach makes things significantly worse:
```
$ benchstat old.txt new.txt
name                        old time/op    new time/op    delta
SparseMerkleTree_Update-12    24.6µs ±17%    30.6µs ± 6%   +24.34%  (p=0.000 n=10+10)
SparseMerkleTree_Delete-12    28.9µs ±34%    30.9µs ±13%      ~     (p=0.447 n=10+9)

name                        old alloc/op   new alloc/op   delta
SparseMerkleTree_Update-12    15.8kB ± 0%    22.1kB ± 2%   +39.68%  (p=0.000 n=8+10)
SparseMerkleTree_Delete-12    15.0kB ± 0%    24.5kB ± 0%   +62.93%  (p=0.000 n=10+10)

name                        old allocs/op  new allocs/op  delta
SparseMerkleTree_Update-12      58.7 ± 1%     112.2 ± 3%   +91.25%  (p=0.000 n=9+10)
SparseMerkleTree_Delete-12      53.0 ± 0%     122.4 ± 0%  +130.94%  (p=0.000 n=10+10)
```
tldr: we should probably just close this issue.
For the 3rd approach see: https://dashboard.github.orijtech.com/benchmark/2b73c6e366194dab831ec08c84ea8043
For the 2nd see: https://dashboard.github.orijtech.com/benchmark/745bbb8971f746c4a48f3acbcf68e7a6
Maybe our benchmarks are not clever enough, and we should benchmark a bulk update rather than repeatedly updating a single key:
https://github.com/celestiaorg/smt/blob/d95f5f42e254b8e79e09de7da97e08795a003a94/bench_test.go#L17
Nah, tried that as well.
Currently, the SMT appends new side nodes to a slice at the end:
This requires two allocations (one for the initial slice and one for the reversed copy) and one copy (into the reversed slice). Alternatives would be to insert each new node at the front,
or to create a new slice and append the old one to the new one.
Both of these alternatives require a new allocation and copy on every insertion, so they should result in an equal number of allocations and copies. However, this SO answer suggests the second alternative would result in fewer allocations than the first; this is most likely an erroneous interpretation, with the intended comparison being against another, less efficient, alternative (not shown here).
The current scheme and the two alternatives should be benchmarked, since generating the list of side nodes is on the hot path of every write and proof operation.