cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0

Analyze and improve store/v2 performance #11328

Closed roysc closed 2 years ago

roysc commented 2 years ago

Summary

Analyze the performance of the store/v2 MultiStore and smt.Store, identify the causes of inefficiencies and bottlenecks, and resolve them.

Problem Definition

According to the store benchmarks, the v2 MultiStore is performing slower than the existing IAVL-based MultiStore. Performance was one of the main concerns motivating ADR-40, so it's essential that we find a way to improve on this.

Proposal



roysc commented 2 years ago

Added references to child issues:

roysc commented 2 years ago

Analysis

Both reads and updates are overall significantly slower for the SMT-based stores vs. IAVL.

Using benchstat we can compare the overall performance under different scenarios. In all the following tests, v1 refers to the IAVL-based KV store and multistore, and v2 to the SMT-based KV store and multistore.

In this suite we test pure get and set operations in two cases: with keys that all exist in the store (present), and with keys that all do not (absent). Note that no deletes or commits are performed. getpast performs reads at an earlier version.
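The present/absent split above can be sketched as follows. This is a hedged, simplified harness: the `store` type, `genKeys` helper, and key sizes are illustrative stand-ins, not the SDK's actual benchmark code (which runs these loops under `testing.B`).

```go
package main

import (
	"fmt"
	"math/rand"
)

// store is a stand-in for the KVStore interface under test (hypothetical
// simplification; the real benchmarks exercise the SDK's store types).
type store map[string][]byte

func (s store) Get(key []byte) []byte { return s[string(key)] }
func (s store) Set(key, value []byte) { s[string(key)] = value }

// genKeys deterministically generates n random 16-byte keys from a seed,
// so "present" and "absent" key sets are reproducible and disjoint.
func genKeys(n int, seed int64) [][]byte {
	rng := rand.New(rand.NewSource(seed))
	keys := make([][]byte, n)
	for i := range keys {
		k := make([]byte, 16)
		rng.Read(k)
		keys[i] = k
	}
	return keys
}

func main() {
	s := store{}
	present := genKeys(1000, 1) // inserted before measuring
	absent := genKeys(1000, 2)  // never inserted
	for _, k := range present {
		s.Set(k, []byte("value"))
	}
	// In the real suite these loops run under testing.B and are timed;
	// here we just verify the present/absent split behaves as described.
	hits, misses := 0, 0
	for _, k := range present {
		if s.Get(k) != nil {
			hits++
		}
	}
	for _, k := range absent {
		if s.Get(k) == nil {
			misses++
		}
	}
	fmt.Println(hits, misses)
}
```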

Comparison of benchmarks run on v1 ("old time") and v2 ("new time") using benchstat

name                            old time/op  new time/op    delta
KVStore/memdb/get-present-4     1.52ms ±12%    2.75ms ± 8%    +81.36%  (p=0.000 n=18+18)
KVStore/memdb/get-absent-4      1.65ms ± 4%    4.15ms ±67%   +151.91%  (p=0.000 n=16+20)
KVStore/memdb/set-present-4     3.41ms ± 3%  174.08ms ±43%  +5001.40%  (p=0.000 n=18+20)
KVStore/memdb/set-absent-4      4.54ms ±24%  169.74ms ±14%  +3638.43%  (p=0.000 n=18+20)
MultiStore/memdb-getpast-4      14.6ms ± 6%     5.2ms ±63%    -64.44%  (p=0.000 n=17+19)
MultiStore/memdb/get-present-4  1.55ms ± 3%    2.66ms ± 6%    +71.80%  (p=0.000 n=18+18)
MultiStore/memdb/get-absent-4   1.73ms ± 4%    2.79ms ±19%    +61.17%  (p=0.000 n=18+20)
MultiStore/memdb/set-present-4  3.69ms ±12%  201.79ms ±21%  +5362.11%  (p=0.000 n=17+20)
MultiStore/memdb/set-absent-4   4.56ms ± 7%  203.86ms ±12%  +4372.38%  (p=0.000 n=18+20)

Reads

One of the motivations of ADR-40 was to improve performance by introducing a store design that requires no tree traversal on reads. The v2 store maps values directly in the backing KV database, so reads do not have to go through the SMT structure. However, reads are still slower in v2 in most cases (see below).

A factor to note: the effect of this design difference on performance becomes clear in cases where IAVL does not cache nodes, which includes the getpast case. This is the only case above in which v2 outperforms v1, because IAVL must read from the DB on each node access.

We can highlight the difference even more by reloading the store before testing - now, v2 is faster for all get operations.

KVStore/memdb/get-present-4     38.2ms ± 7%     2.7ms ±13%    -92.93%  (p=0.000 n=15+13)
KVStore/memdb/get-absent-4      37.3ms ± 7%     3.0ms ± 4%    -92.05%  (p=0.000 n=14+14)
KVStore/memdb/set-present-4     12.0ms ±45%   159.6ms ±16%  +1234.56%  (p=0.000 n=15+15)
KVStore/memdb/set-absent-4      8.39ms ±26%  165.88ms ±17%  +1878.30%  (p=0.000 n=15+15)
MultiStore/memdb-getpast-4      15.4ms ±11%     4.0ms ±48%    -73.71%  (p=0.000 n=12+15)
MultiStore/memdb/get-present-4  12.3ms ±45%     2.6ms ± 7%    -79.21%  (p=0.000 n=14+14)
MultiStore/memdb/get-absent-4   6.65ms ±23%    2.72ms ±22%    -59.10%  (p=0.000 n=15+14)
MultiStore/memdb/set-present-4  7.16ms ±24%  203.57ms ±19%  +2742.92%  (p=0.000 n=15+15)
MultiStore/memdb/set-absent-4   6.21ms ±24%  205.42ms ±14%  +3210.47%  (p=0.000 n=15+15)

Writes

The difference in performance for updates in v2 is much more dramatic, with a slowdown of more than an order of magnitude (with memdb).
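One plausible contributor to this write amplification can be sketched as follows: every SMT update must recompute and persist the hashes of all nodes on the path from the updated leaf to the root. This is a naive per-level illustration under assumed costs, not a measurement of the actual implementation (real SMTs, including celestiaorg/smt, shortcut subtrees of default values, so the effective path is closer to log2(n) than the full 256 bits).

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pathHashCost counts the hash operations needed to update one leaf and
// rehash its path to the root: one leaf hash plus one combine per level.
// In a naive implementation each of these also implies a DB write.
func pathHashCost(depth int, leaf []byte, siblings [][]byte) ([32]byte, int) {
	hashes := 0
	cur := sha256.Sum256(leaf)
	hashes++
	for i := 0; i < depth; i++ {
		// Combine with the sibling hash at each level up to the root.
		cur = sha256.Sum256(append(cur[:], siblings[i]...))
		hashes++
	}
	return cur, hashes
}

func main() {
	siblings := make([][]byte, 256)
	for i := range siblings {
		siblings[i] = make([]byte, 32)
	}
	// Worst-case full 256-bit SMT path vs. a balanced tree of ~1M keys.
	_, smtHashes := pathHashCost(256, []byte("v"), siblings)
	_, iavlHashes := pathHashCost(20, []byte("v"), siblings)
	fmt.Println(smtHashes, iavlHashes)
}
```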

Notes

These benchmarks were run on this branch, which is rebased on the latest v2/Multistore code: https://github.com/vulcanize/cosmos-sdk/tree/roysc/store_benchmarks/store. Run using this script: https://gist.github.com/roysc/45c0ba7631b9ba15d2838ece282d0183

I can provide more data or info as needed. We can also begin discussing solutions here.

i-norden commented 2 years ago

The aforementioned cache wrap for the SMT: https://github.com/vulcanize/smt/blob/cache_listener_wrap/cache_wrapped_mapstore.go (note - branch at https://github.com/roysc/smt/tree/cache_wrap patched to work with benchmarks - Roy)

Note that we need this for ADR-052, so performance was not the main motivation for the cache wrap (but it is nice to see that it not only didn't have a negative impact, but a slightly positive one :D). The reason the impact is not more significant is that we are already caching at the DB transaction level (since, in the case of the SDK, the MapStore backing the SMT is a DB transaction object).
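The shape of such a cache wrap can be sketched as follows. All names here are illustrative, not the actual API from cache_wrapped_mapstore.go: reads check an in-memory cache before falling through to the parent, and writes are buffered until an explicit flush. When the parent MapStore is itself a DB transaction that already buffers in memory, the extra layer saves little, which matches the observation above that the gain was only slight.

```go
package main

import "fmt"

// MapStore is a minimal stand-in for the smt.MapStore interface.
type MapStore interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

type mapStore map[string][]byte

func (m mapStore) Get(k string) ([]byte, bool) { v, ok := m[k]; return v, ok }
func (m mapStore) Set(k string, v []byte)      { m[k] = v }

// cacheWrap buffers writes in memory and serves reads cache-first.
type cacheWrap struct {
	parent MapStore
	cache  map[string][]byte
}

func newCacheWrap(parent MapStore) *cacheWrap {
	return &cacheWrap{parent: parent, cache: map[string][]byte{}}
}

func (c *cacheWrap) Get(k string) ([]byte, bool) {
	if v, ok := c.cache[k]; ok {
		return v, true
	}
	return c.parent.Get(k)
}

func (c *cacheWrap) Set(k string, v []byte) { c.cache[k] = v }

// Write flushes buffered writes to the parent store.
func (c *cacheWrap) Write() {
	for k, v := range c.cache {
		c.parent.Set(k, v)
	}
	c.cache = map[string][]byte{}
}

func main() {
	parent := mapStore{}
	cw := newCacheWrap(parent)
	cw.Set("a", []byte("1"))
	_, inParent := parent.Get("a")
	fmt.Println("before flush, in parent:", inParent)
	cw.Write()
	v, _ := parent.Get("a")
	fmt.Println("after flush:", string(v))
}
```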

It should also be noted that the SMT implementation maintains its own mapping of hash(key) => value that enables it to support key lookups without having to perform a tree traversals (the values mapstore here: https://github.com/celestiaorg/smt/blob/master/smt.go#L21). This is ofc a necessary feature for a general SMT implementation, but it is unused in the context of the SDK since the SDK maintains its own mapping of key => value at the state store level (B1 bucket). So, another optimization would be to remove this mapping maintained at the SMT level. We don't know yet how significant of an impact this might have on time/op but it would remove one DB operation for every KVStore Set/Delete and would remove the duplication of every value on disk (considering how large values can be this should be a significant disk usage optimization).