dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0
21 stars 8 forks source link

A more optimized stable B-Tree (with new metering) #97

Open chenyan-dfinity opened 1 year ago

github-actions[bot] commented 1 year ago

Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.

Map

binary_size generate 1m max mem batch_get 50 batch_put 50 batch_remove 50 upgrade
hashmap 160_146 ($\textcolor{green}{-0.05\%}$) 8_415_853_637 ($\textcolor{red}{20.50\%}$) 61_987_852 344_668 ($\textcolor{red}{19.40\%}$) 6_554_314_393 ($\textcolor{red}{18.38\%}$) 372_027 ($\textcolor{red}{19.93\%}$) 11_086_889_804 ($\textcolor{red}{21.45\%}$)
triemap 163_317 ($\textcolor{green}{-0.10\%}$) 14_814_581_233 ($\textcolor{red}{29.23\%}$) 74_216_172 291_744 ($\textcolor{red}{30.87\%}$) 714_312 ($\textcolor{red}{30.01\%}$) 700_163 ($\textcolor{red}{29.61\%}$) 16_814_045_622 ($\textcolor{red}{28.60\%}$)
rbtree 157_922 ($\textcolor{green}{-0.14\%}$) 7_517_002_376 ($\textcolor{red}{25.72\%}$) 57_996_060 113_955 ($\textcolor{red}{28.18\%}$) 339_364 ($\textcolor{red}{26.36\%}$) 362_212 ($\textcolor{red}{30.13\%}$) 7_125_606_793 ($\textcolor{red}{23.45\%}$)
splay 159_759 ($\textcolor{green}{-0.12\%}$) 15_011_765_466 ($\textcolor{red}{29.77\%}$) 53_995_996 719_957 ($\textcolor{red}{30.42\%}$) 758_215 ($\textcolor{red}{30.33\%}$) 1_051_199 ($\textcolor{red}{29.73\%}$) 4_732_822_678 ($\textcolor{red}{27.14\%}$)
btree 187_666 ($\textcolor{green}{-0.12\%}$) 10_705_101_825 ($\textcolor{red}{30.17\%}$) 31_104_012 365_187 ($\textcolor{red}{31.58\%}$) 503_327 ($\textcolor{red}{31.02\%}$) 559_436 ($\textcolor{red}{30.39\%}$) 3_120_410_218 ($\textcolor{red}{23.93\%}$)
zhenya_hashmap 160_516 ($\textcolor{red}{0.00\%}$) 2_776_814_274 ($\textcolor{red}{26.13\%}$) 22_773_100 64_709 ($\textcolor{red}{33.07\%}$) 82_032 ($\textcolor{red}{32.65\%}$) 94_090 ($\textcolor{red}{32.76\%}$) 3_319_787_035 ($\textcolor{red}{23.16\%}$)
btreemap_rs 477_037 ($\textcolor{green}{-0.12\%}$) 1_793_694_183 ($\textcolor{red}{8.60\%}$) 27_590_656 75_427 ($\textcolor{red}{12.81\%}$) 125_068 ($\textcolor{red}{11.19\%}$) 86_003 ($\textcolor{red}{12.81\%}$) 2_927_540_279 ($\textcolor{red}{10.02\%}$)
imrc_hashmap_rs 479_206 ($\textcolor{green}{-0.12\%}$) 2_580_421_602 ($\textcolor{red}{7.84\%}$) 244_973_568 38_236 ($\textcolor{red}{16.70\%}$) 178_903 ($\textcolor{red}{9.59\%}$) 114_251 ($\textcolor{red}{16.12\%}$) 5_754_488_547 ($\textcolor{red}{10.84\%}$)
hashmap_rs 467_420 ($\textcolor{green}{-0.12\%}$) 432_765_549 ($\textcolor{red}{7.31\%}$) 73_138_176 21_704 ($\textcolor{red}{28.80\%}$) 26_758 ($\textcolor{red}{23.42\%}$) 25_073 ($\textcolor{red}{23.74\%}$) 1_280_477_959 ($\textcolor{red}{11.85\%}$)

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50 pop_min 50.1 upgrade
heap 147_633 ($\textcolor{green}{-0.00\%}$) 6_206_169_472 ($\textcolor{red}{32.48\%}$) 29_995_956 680_841 ($\textcolor{red}{33.11\%}$) 250_925 ($\textcolor{red}{34.57\%}$) 648_731 ($\textcolor{red}{33.15\%}$) 3_293_282_643 ($\textcolor{red}{24.01\%}$)
heap_rs 463_292 ($\textcolor{green}{-0.12\%}$) 138_669_631 ($\textcolor{red}{14.04\%}$) 18_284_544 57_362 ($\textcolor{red}{11.04\%}$) 23_196 ($\textcolor{red}{27.14\%}$) 57_496 ($\textcolor{red}{10.99\%}$) 505_960_753 ($\textcolor{red}{14.80\%}$)

Growable array

binary_size generate 5k max mem batch_get 500 batch_put 500 batch_remove 500 upgrade
buffer 150_978 ($\textcolor{green}{-0.02\%}$) 2_718_410 ($\textcolor{red}{30.53\%}$) 65_644 104_997 ($\textcolor{red}{43.65\%}$) 858_376 ($\textcolor{red}{27.83\%}$) 172_497 ($\textcolor{red}{35.19\%}$) 3_141_451 ($\textcolor{red}{26.95\%}$)
vector 152_598 ($\textcolor{red}{0.03\%}$) 2_131_660 ($\textcolor{red}{34.21\%}$) 24_580 141_137 ($\textcolor{red}{34.17\%}$) 203_207 ($\textcolor{red}{35.53\%}$) 195_769 ($\textcolor{red}{32.19\%}$) 4_756_048 ($\textcolor{red}{23.71\%}$)
vec_rs 459_107 ($\textcolor{green}{-0.12\%}$) 288_898 ($\textcolor{red}{8.74\%}$) 1_310_720 17_148 ($\textcolor{red}{31.77\%}$) 30_473 ($\textcolor{red}{20.15\%}$) 25_609 ($\textcolor{red}{20.53\%}$) 3_141_188 ($\textcolor{red}{14.48\%}$)

Warning Skip table 3 ## Stable structures from _out/collections/README.md, due to table shape mismatches from main branch.

Statistics

github-actions[bot] commented 1 year ago

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with _rs suffix are written in Rust; the rest are written in Motoko. The _stable and _stable_rs suffix represents that the library directly writes the state to stable memory using Region in Motoko and ic-stable-stuctures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.
  • Different library has different ways for persisting data during upgrades, there are mainly three categories:
    • Use stable variable directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vector_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterpart, because the stable memory API is much more expensive. The benefit is that they get fast upgrade. The upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
  • hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • btree_stable comes from github.com/sardariuss.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compare with buffer, put has better worst case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
  • imrc_hashmap_rs uses the im-rc crate, which is the immutable version hashmap in Rust.

Map

binary_size generate 1m max mem batch_get 50 batch_put 50 batch_remove 50 upgrade
hashmap 160_146 8_415_853_637 61_987_852 344_668 6_554_314_393 372_027 11_086_889_804
triemap 163_317 14_814_581_233 74_216_172 291_744 714_312 700_163 16_814_045_622
rbtree 157_922 7_517_002_376 57_996_060 113_955 339_364 362_212 7_125_606_793
splay 159_759 15_011_765_466 53_995_996 719_957 758_215 1_051_199 4_732_822_678
btree 187_666 10_705_101_825 31_104_012 365_187 503_327 559_436 3_120_410_218
zhenya_hashmap 160_516 2_776_814_274 22_773_100 64_709 82_032 94_090 3_319_787_035
btreemap_rs 477_037 1_793_694_183 27_590_656 75_427 125_068 86_003 2_927_540_279
imrc_hashmap_rs 479_206 2_580_421_602 244_973_568 38_236 178_903 114_251 5_754_488_547
hashmap_rs 467_420 432_765_549 73_138_176 21_704 26_758 25_073 1_280_477_959

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50 pop_min 50 upgrade
heap 147_633 6_206_169_472 29_995_956 680_841 250_925 648_731 3_293_282_643
heap_rs 463_292 138_669_631 18_284_544 57_362 23_196 57_496 505_960_753

Growable array

binary_size generate 5k max mem batch_get 500 batch_put 500 batch_remove 500 upgrade
buffer 150_978 2_718_410 65_644 104_997 858_376 172_497 3_141_451
vector 152_598 2_131_660 24_580 141_137 203_207 195_769 4_756_048
vec_rs 459_107 288_898 1_310_720 17_148 30_473 25_609 3_141_188

Stable structures

binary_size generate 50k max mem batch_get 50 batch_put 50 batch_remove 50 upgrade
btree 187_666 457_750_863 1_554_152 289_410 442_003 479_799 155_920_331
btree_stable 205_154 9_248_256_252 2_621_440 6_661_072 10_173_720 19_236_694 31_845
btreemap_rs 477_037 76_218_029 2_555_904 64_985 97_003 84_994 125_790_335
btreemap_stable_rs 478_103 4_751_143_101 2_621_440 2_844_949 5_181_254 8_809_593 731_069
heap_rs 463_292 7_001_559 2_293_760 49_871 23_444 49_845 26_519_303
heap_stable_rs 450_480 317_932_953 458_752 2_667_588 276_814 2_647_531 731_182
vec_rs 459_107 3_079_223 2_228_224 17_148 18_323 17_850 24_471_685
vec_stable_rs 445_471 74_445_158 458_752 69_489 90_685 92_681 731_195

Environment

  • dfx 0.15.1
  • Motoko compiler 0.10.0 (source a3ywvw0a-p5a03qy6-vscbl9j8-qxszbxa6)
  • rustc 1.73.0 (cc66ad468 2023-10-03)
  • ic-repl 0.5.1
  • ic-wasm 0.6.0