dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0
21 stars 8 forks source link

switch gc for collection benchmarks #71

Closed chenyan-dfinity closed 11 months ago

github-actions[bot] commented 1 year ago

Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.

Map

binary_size generate 1m max mem batch_get 50 batch_put 50 batch_remove 50
hashmap 136_413 ($\textcolor{red}{1.93\%}$) 7_606_067_149 ($\textcolor{red}{9.28\%}$) 61_987_708 288_199 ($\textcolor{red}{0.25\%}$) 6_162_750_199 ($\textcolor{red}{11.73\%}$) 309_738 ($\textcolor{red}{0.25\%}$)
triemap 137_903 ($\textcolor{red}{1.91\%}$) 12_189_110_689 ($\textcolor{red}{6.63\%}$) 74_215_992 222_822 ($\textcolor{red}{0.02\%}$) 547_698 ($\textcolor{red}{0.01\%}$) 539_062 ($\textcolor{red}{0.01\%}$)
rbtree 138_702 ($\textcolor{red}{1.90\%}$) 6_610_828_306 ($\textcolor{red}{10.56\%}$) 57_995_880 88_902 ($\textcolor{red}{0.00\%}$) 268_570 ($\textcolor{red}{0.00\%}$) 278_336 ($\textcolor{red}{0.00\%}$)
splay 134_438 ($\textcolor{red}{1.95\%}$) 12_359_483_042 ($\textcolor{red}{6.84\%}$) 53_995_816 551_923 ($\textcolor{red}{0.00\%}$) 581_661 ($\textcolor{red}{0.00\%}$) 810_217 ($\textcolor{red}{0.00\%}$)
btree 178_981 ($\textcolor{red}{1.43\%}$) 8_510_551_526 ($\textcolor{red}{3.48\%}$) 31_103_868 277_539 ($\textcolor{red}{0.00\%}$) 384_168 ($\textcolor{red}{0.00\%}$) 429_038 ($\textcolor{red}{0.00\%}$)
zhenya_hashmap 144_414 ($\textcolor{red}{1.80\%}$) 3_460_663_105 ($\textcolor{red}{25.56\%}$) 65_987_456 68_446 ($\textcolor{red}{0.08\%}$) 83_148 ($\textcolor{red}{0.06\%}$) 93_334 ($\textcolor{red}{0.07\%}$)
btreemap_rs 413_478 1_649_709_879 13_762_560 66_814 112_263 81_263
imrc_hashmap_rs 413_588 2_385_702_121 122_454_016 32_846 162_715 98_494
hashmap_rs 406_096 392_593_368 36_536_320 16_498 20_863 19_973

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50
heap 130_340 ($\textcolor{red}{2.03\%}$) 4_972_973_442 ($\textcolor{red}{6.16\%}$) 29_995_812 511_496 ($\textcolor{red}{0.00\%}$) 186_462 ($\textcolor{red}{0.00\%}$)
heap_rs 403_925 123_102_482 9_109_504 53_320 18_138

Statistics

Basic DAO

binary_size init transfer_token submit_proposal vote_proposal
Motoko 225_805 37_493 ($\textcolor{red}{0.06\%}$) 16_270 ($\textcolor{red}{0.26\%}$) 12_654 ($\textcolor{green}{-0.38\%}$) 14_126 ($\textcolor{green}{-0.21\%}$)
Rust 704_886 471_865 86_470 104_617 115_765

DIP721 NFT

Note Same as main branch, skipping.

Statistics

Heartbeat

binary_size heartbeat
Motoko 118_909 7_392
Rust 23_699 474 ($\textcolor{green}{-40.08\%}$)

Timer

Note Same as main branch, skipping.

Statistics

github-actions[bot] commented 1 year ago

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.
  • hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
  • imrc_hashmap_rs uses the im-rc crate, which is the immutable version hashmap in Rust.
  • The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.

Map

binary_size generate 1m max mem batch_get 50 batch_put 50 batch_remove 50
hashmap 136_413 7_606_067_149 61_987_708 288_199 6_162_750_199 309_738
triemap 137_903 12_189_110_689 74_215_992 222_822 547_698 539_062
rbtree 138_702 6_610_828_306 57_995_880 88_902 268_570 278_336
splay 134_438 12_359_483_042 53_995_816 551_923 581_661 810_217
btree 178_981 8_510_551_526 31_103_868 277_539 384_168 429_038
zhenya_hashmap 144_414 3_460_663_105 65_987_456 68_446 83_148 93_334
btreemap_rs 413_478 1_649_709_879 13_762_560 66_814 112_263 81_263
imrc_hashmap_rs 413_588 2_385_702_121 122_454_016 32_846 162_715 98_494
hashmap_rs 406_096 392_593_368 36_536_320 16_498 20_863 19_973

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50
heap 130_340 4_972_973_442 29_995_812 511_496 186_462 487_203
heap_rs 403_925 123_102_482 9_109_504 53_320 18_138 53_543

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

binary_size init transfer_token submit_proposal vote_proposal
Motoko 225_805 37_493 16_270 12_654 14_126
Rust 704_886 471_865 86_470 104_617 115_765

DIP721 NFT

binary_size init mint_token transfer_token
Motoko 183_882 12_181 22_319 4_710
Rust 766_710 125_034 324_482 77_116

Heartbeat / Timer

Measure the cost of empty heartbeat and timer job.

Heartbeat

binary_size heartbeat
Motoko 118_909 7_392
Rust 23_699 474

Timer

binary_size setTimer cancelTimer
Motoko 125_168 15_208 1_679
Rust 434_848 43_540 7_683