dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0
21 stars 8 forks source link

A more optimized Stable B-Tree (main, unoptimized branch) #94

Open crusso opened 11 months ago

crusso commented 11 months ago

like #90 but using an optimized implementation

github-actions[bot] commented 11 months ago

Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.

Map

Note Same as main branch, skipping.

Priority queue

Note Same as main branch, skipping.

Growable array

Note Same as main branch, skipping.

Warning Skip table 3 ## Stable structures from _out/collections/README.md, due to table shape mismatches from main branch.

Statistics

github-actions[bot] commented 11 months ago

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with _rs suffix are written in Rust; the rest are written in Motoko. The _stable and _stable_rs suffix represents that the library directly writes the state to stable memory using Region in Motoko and ic-stable-stuctures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.
  • Different library has different ways for persisting data during upgrades, there are mainly three categories:
    • Use stable variable directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vector_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterpart, because the stable memory API is much more expensive. The benefit is that they get fast upgrade. The upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
  • hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • btree_stable comes from github.com/sardariuss.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compare with buffer, put has better worst case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
  • imrc_hashmap_rs uses the im-rc crate, which is the immutable version hashmap in Rust.

Map

binary_size generate 1m max mem batch_get 50 batch_put 50 batch_remove 50 upgrade
hashmap 160_221 6_984_044_999 61_987_852 288_670 5_536_856_410 310_195 9_128_784_003
triemap 163_474 11_463_655_150 74_216_172 222_926 549_435 540_205 13_075_158_546
rbtree 158_149 5_979_229_900 57_996_060 88_905 268_573 278_352 5_771_880_608
splay 159_956 11_568_250_103 53_995_996 552_014 581_765 810_321 3_722_474_749
btree 187_897 8_224_242_789 31_104_012 277_542 384_171 429_041 2_517_941_583
zhenya_hashmap 160_509 2_201_622_562 22_773_100 48_627 61_839 70_872 2_695_448_620
btreemap_rs 477_612 1_651_590_463 27_590_656 66_862 112_477 76_234 2_660_975_747
imrc_hashmap_rs 479_773 2_392_906_831 244_973_568 32_763 163_245 98_394 5_191_575_323
hashmap_rs 467_997 403_296_648 73_138_176 16_851 21_680 20_263 1_144_828_025

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50 pop_min 50 upgrade
heap 147_638 4_684_519_403 29_995_956 511_505 186_471 487_225 2_655_609_909
heap_rs 463_840 121_602_221 18_284_544 51_661 18_245 51_802 440_739_988

Growable array

binary_size generate 5k max mem batch_get 500 batch_put 500 batch_remove 500 upgrade
buffer 151_004 2_082_623 65_644 73_092 671_517 127_592 2_474_639
vector 152_551 1_588_260 24_580 105_191 149_932 148_094 3_844_445
vec_rs 459_655 265_683 1_310_720 13_014 25_363 21_247 2_743_831

Stable structures

binary_size generate 50k max mem batch_get 50 batch_put 50 batch_remove 50 upgrade
btree 187_897 351_889_189 1_554_152 219_328 337_463 368_143 125_813_601
btree_stable 205_629 13_063_523_594 2_621_440 10_582_502 14_457_388 28_489_728 31_540
btreemap_rs 477_612 70_026_986 2_555_904 57_181 86_494 75_309 113_837_931
btreemap_stable_rs 478_668 4_224_209_849 2_621_440 2_528_769 4_605_548 7_817_380 653_359
heap_rs 463_840 6_139_838 2_293_760 44_362 18_477 44_345 23_149_372
heap_stable_rs 451_018 279_422_369 458_752 2_346_843 241_158 2_329_183 653_433
vec_rs 459_655 2_866_886 2_228_224 13_014 14_113 13_710 21_249_908
vec_stable_rs 446_031 65_186_210 458_752 58_992 77_387 79_383 653_447

Environment

  • dfx 0.15.1
  • Motoko compiler 0.10.0 (source a3ywvw0a-p5a03qy6-vscbl9j8-qxszbxa6)
  • rustc 1.73.0 (cc66ad468 2023-10-03)
  • ic-repl 0.5.1
  • ic-wasm 0.6.0