Closed: opened by roysc, closed 2 years ago
Both reads and updates are overall significantly slower for the SMT-based stores vs. IAVL. Using benchstat we can compare the overall performance under different scenarios. In all the following tests, `v1` refers to the IAVL-based KV store and multistore, and `v2` to the SMT-based KV store and multistore.
In this suite we test pure `get` and `set` operations in two cases: with keys that all exist in the store (`present`), and with keys that all do not (`absent`). Note that no deletes or commits are performed. `getpast` performs reads at an earlier version.
Comparison of benchmarks run on v1 ("old time") and v2 ("new time") using benchstat:

```
name                             old time/op   new time/op    delta
KVStore/memdb/get-present-4      1.52ms ±12%    2.75ms ± 8%     +81.36%  (p=0.000 n=18+18)
KVStore/memdb/get-absent-4       1.65ms ± 4%    4.15ms ±67%    +151.91%  (p=0.000 n=16+20)
KVStore/memdb/set-present-4      3.41ms ± 3%  174.08ms ±43%   +5001.40%  (p=0.000 n=18+20)
KVStore/memdb/set-absent-4       4.54ms ±24%  169.74ms ±14%   +3638.43%  (p=0.000 n=18+20)
MultiStore/memdb-getpast-4       14.6ms ± 6%     5.2ms ±63%     -64.44%  (p=0.000 n=17+19)
MultiStore/memdb/get-present-4   1.55ms ± 3%    2.66ms ± 6%     +71.80%  (p=0.000 n=18+18)
MultiStore/memdb/get-absent-4    1.73ms ± 4%    2.79ms ±19%     +61.17%  (p=0.000 n=18+20)
MultiStore/memdb/set-present-4   3.69ms ±12%  201.79ms ±21%   +5362.11%  (p=0.000 n=17+20)
MultiStore/memdb/set-absent-4    4.56ms ± 7%  203.86ms ±12%   +4372.38%  (p=0.000 n=18+20)
```
One of the motivations of ADR-40 was to improve performance by introducing a store design that requires no tree traversal on reads. The `v2` store maps values directly in the backing KV database, so reads do not have to go through the SMT structure. However, reads are still slower in `v2` (mostly; see below).
Factors to note:

- IAVL caches nodes in memory (the `nodedb` node cache), so repeated reads can avoid hitting the DB
- `memdb` (the fastest DB backend) is implemented with a btree, which requires traversal
The effect of this design difference on performance is made clear in cases where IAVL does not cache nodes, which includes the `getpast` case. This is the only case above in which `v2` outperforms `v1`, which is explained by the fact that IAVL must read from the DB on each node access.
We can highlight the difference even more by reloading the store before testing: now `v2` is faster for all `get` operations.
```
name                             old time/op   new time/op    delta
KVStore/memdb/get-present-4      38.2ms ± 7%     2.7ms ±13%     -92.93%  (p=0.000 n=15+13)
KVStore/memdb/get-absent-4       37.3ms ± 7%     3.0ms ± 4%     -92.05%  (p=0.000 n=14+14)
KVStore/memdb/set-present-4      12.0ms ±45%   159.6ms ±16%   +1234.56%  (p=0.000 n=15+15)
KVStore/memdb/set-absent-4       8.39ms ±26%  165.88ms ±17%   +1878.30%  (p=0.000 n=15+15)
MultiStore/memdb-getpast-4       15.4ms ±11%     4.0ms ±48%     -73.71%  (p=0.000 n=12+15)
MultiStore/memdb/get-present-4   12.3ms ±45%     2.6ms ± 7%     -79.21%  (p=0.000 n=14+14)
MultiStore/memdb/get-absent-4    6.65ms ±23%    2.72ms ±22%     -59.10%  (p=0.000 n=15+14)
MultiStore/memdb/set-present-4   7.16ms ±24%  203.57ms ±19%   +2742.92%  (p=0.000 n=15+15)
MultiStore/memdb/set-absent-4    6.21ms ±24%  205.42ms ±14%   +3210.47%  (p=0.000 n=15+15)
```
The difference in performance for updates in `v2` is much more dramatic, with a slowdown of more than an order of magnitude (with `memdb`).
I also tested a cache wrap for the `MapStore` used within the SMT, which provided a small speedup (~10%).

These benchmarks were run on this branch, which is rebased on the latest v2/Multistore code: https://github.com/vulcanize/cosmos-sdk/tree/roysc/store_benchmarks/store. Run using this script: https://gist.github.com/roysc/45c0ba7631b9ba15d2838ece282d0183
I can provide more data or info as needed. We can also begin discussing solutions here.
The aforementioned cache wrap for the SMT: https://github.com/vulcanize/smt/blob/cache_listener_wrap/cache_wrapped_mapstore.go (note: the branch at https://github.com/roysc/smt/tree/cache_wrap is patched to work with the benchmarks - Roy)

Note that we need this for ADR-052, so performance was not the main motivation for the cache wrap (but it is nice to see that it not only didn't have a negative impact, but a slightly positive one :D). The reason the impact is not more significant is that we are already doing caching at the DB transaction level (since, in the case of the SDK, the `MapStore` backing the SMT is a DB transaction object).
It should also be noted that the SMT implementation maintains its own mapping of `hash(key) => value` that enables it to support key lookups without having to perform a tree traversal (the `values` mapstore here: https://github.com/celestiaorg/smt/blob/master/smt.go#L21). This is of course a necessary feature for a general SMT implementation, but it is unused in the context of the SDK, since the SDK maintains its own `key => value` mapping at the state store level (the B1 bucket). So, another optimization would be to remove this mapping maintained at the SMT level. We don't yet know how significant an impact this might have on time/op, but it would remove one DB operation for every KVStore Set/Delete, and would remove the duplication of every value on disk (considering how large values can be, this should be a significant disk usage optimization).
Summary

Analyze the performance of the `store/v2` `MultiStore` and `smt.Store`, identify the causes of inefficiencies and bottlenecks, and resolve them.

Problem Definition
According to the store benchmarks, the v2 MultiStore is performing slower than the existing IAVL-based MultiStore. Performance was one of the main concerns motivating ADR-40, so it's essential that we find a way to improve on this.
Proposal