dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

scale collection to 1M #70

Closed chenyan-dfinity closed 1 year ago

chenyan-dfinity commented 1 year ago
github-actions[bot] commented 1 year ago

> **Note**
> Diffing the performance result against the published result from the main branch. Unchanged benchmarks are omitted.

> **Warning**
> Skipping `_out/collections/README.md`, because its number of tables does not match the main branch.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 225_805 | 37_469 ($\textcolor{green}{-0.06\%}$) | 16_274 ($\textcolor{red}{0.02\%}$) | 12_658 ($\textcolor{red}{0.02\%}$) | 14_106 ($\textcolor{red}{0.01\%}$) |
| Rust | 704_886 | 471_865 | 86_470 | 104_617 | 115_765 |

DIP721 NFT

> **Note** Same as the main branch, skipping.

Statistics

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 118_909 | 7_392 |
| Rust | 23_699 | 474 ($\textcolor{green}{-40.08\%}$) |

Timer

> **Note** Same as the main branch, skipping.

Statistics

Actor class

> **Note** Same as the main branch, skipping.

Statistics

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 139_886 | 126_827 | 14_641 ($\textcolor{red}{0.06\%}$) | 8_451 | 10_530 | 3_662 |
| Rust | 472_135 | 519_916 | 51_591 | 34_661 | 74_169 | 41_615 |

Statistics

github-actions[bot] commented 1 year ago

> **Note**
> The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the `_rs` suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. The measurements of each column are explained below.
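As a minimal sketch of the fixed-seed setup (using a hypothetical linear congruential generator, not the generator the benchmarks actually use), seeding with the same value makes every run produce the same element sequence, so each collection is filled with identical data:

```rust
// Hypothetical fixed-seed LCG for illustration only; the benchmark's
// actual RNG lives in the repository, not here.
fn lcg(seed: u64) -> impl FnMut() -> u64 {
    let mut state = seed;
    move || {
        // Constants from Knuth's MMIX LCG.
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        state
    }
}

fn main() {
    // Two generators with the same seed yield identical elements,
    // so every collection under test sees the same workload.
    let mut a = lcg(42);
    let mut b = lcg(42);
    let xs: Vec<u64> = (0..5).map(|_| a()).collect();
    let ys: Vec<u64> = (0..5).map(|_| b()).collect();
    assert_eq!(xs, ys);
}
```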

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance on larger memory footprints.
  • `hashmap` uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of `batch_put 50` is much higher than for the other data structures.
  • `btree` comes from Byron Becker's stable BTreeMap library.
  • `zhenya_hashmap` comes from Zhenya Usenko's stable HashMap library.
  • `hashmap_rs` uses the `fxhash` crate, which is the same as `std::collections::HashMap` but with a deterministic hasher. This ensures reproducible results.
  • `imrc_hashmap_rs` uses the `im-rc` crate, the immutable version of HashMap in Rust.
  • The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
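The deterministic-hasher point can be sketched with the standard library alone (using a zero-keyed `DefaultHasher` as a stand-in for `fxhash`, which is not used here): `std::collections::HashMap` normally seeds its hasher randomly per process, so pinning the hasher makes the map's layout, and therefore the instruction counts, reproducible across runs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Stand-in for fxhash: DefaultHasher::default() is SipHash with fixed
// (zero) keys, so the map's layout no longer depends on process-random
// state.
type DetMap = HashMap<u32, u32, BuildHasherDefault<DefaultHasher>>;

fn build() -> Vec<(u32, u32)> {
    let mut m: DetMap = DetMap::default();
    for i in 0..8 {
        m.insert(i, i * i);
    }
    // No sort: with a fixed hasher even the raw iteration order is stable.
    m.into_iter().collect()
}

fn main() {
    // Two identical builds iterate in exactly the same order.
    assert_eq!(build(), build());
}
```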

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 133_828 | 6_960_077_358 | 61_987_708 | 287_469 | 5_515_887_135 | 308_972 |
| triemap | 135_316 | 11_431_084_368 | 74_215_992 | 222_768 | 547_650 | 538_998 |
| rbtree | 136_114 | 5_979_229_531 | 57_995_880 | 88_900 | 268_568 | 278_334 |
| splay | 131_868 | 11_568_250_397 | 53_995_816 | 551_921 | 581_659 | 810_215 |
| btree | 176_459 | 8_224_241_532 | 31_103_868 | 277_537 | 384_166 | 429_036 |
| zhenya_hashmap | 141_855 | 2_756_209_728 | 65_987_456 | 68_392 | 83_100 | 93_270 |
| btreemap_rs | 413_478 | 1_649_709_879 | 13_762_560 | 66_814 | 112_263 | 81_263 |
| imrc_hashmap_rs | 413_588 | 2_385_702_121 | 122_454_016 | 32_846 | 162_715 | 98_494 |
| hashmap_rs | 406_096 | 392_593_368 | 36_536_320 | 16_498 | 20_863 | 19_973 |
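The `batch_put 50` spike for `hashmap` in the table above is the classic amortized-growth effect. It can be illustrated with `Vec` (a sketch, not the benchmark's code): most pushes are cheap, but a push that exhausts capacity copies the entire backing buffer, so a small batch that happens to cross that boundary pays a large one-off cost.

```rust
// Count how many times the backing buffer is reallocated (and thus
// fully copied) while pushing `pushes` elements.
fn count_reallocs(pushes: usize) -> usize {
    let mut v: Vec<usize> = Vec::with_capacity(4);
    let mut cap = v.capacity();
    let mut reallocs = 0;
    for i in 0..pushes {
        v.push(i);
        if v.capacity() != cap {
            reallocs += 1; // every element stored so far was copied here
            cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    // 1_000 pushes trigger only a handful of copies (roughly
    // 4→8→16→…→1024), but each copy alone touches the whole buffer,
    // which is why one unlucky batch of 50 puts can dwarf the others.
    let r = count_reallocs(1_000);
    assert!(r >= 1 && r <= 12);
    println!("reallocations during 1000 pushes: {}", r);
}
```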

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 |
|---|---|---|---|---|---|---|
| heap | 127_748 | 4_684_517_789 | 29_995_812 | 511_494 | 186_460 | 487_201 |
| heap_rs | 403_925 | 123_102_482 | 9_109_504 | 53_320 | 18_138 | 53_543 |
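For reference, the `pop_min` operation measured above can be sketched in plain Rust: `std::collections::BinaryHeap` is a max-heap, so a min-priority queue conventionally wraps its keys in `Reverse` (this is a generic illustration, not the repository's `heap_rs` implementation).

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Drain a min-priority queue built from `items`, returning the keys in
// ascending order. BinaryHeap is a max-heap, so Reverse flips the order.
fn pop_all_min(items: &[u32]) -> Vec<u32> {
    let mut heap: BinaryHeap<Reverse<u32>> =
        items.iter().copied().map(Reverse).collect();
    let mut out = Vec::with_capacity(items.len());
    while let Some(Reverse(x)) = heap.pop() {
        out.push(x); // keys come out smallest-first
    }
    out
}

fn main() {
    assert_eq!(pop_all_min(&[5, 1, 4, 2, 3]), vec![1, 2, 3, 4, 5]);
}
```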

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to deserialize data dynamically based on the types on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is challenging to maintain the ergonomics provided by serde.
  • Real-world applications tend to send small payloads to each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 225_805 | 37_469 | 16_274 | 12_658 | 14_106 |
| Rust | 704_886 | 471_865 | 86_470 | 104_617 | 115_765 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 766_710 | 125_034 | 324_482 | 77_116 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 118_909 | 7_392 |
| Rust | 23_699 | 474 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 434_848 | 43_540 | 7_683 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_012_258_524 | 59_396_752 | 50 | 50 | 50 |
| copying | 1_012_258_474 | 59_396_752 | 1_012_236_033 | 1_012_303_043 | 1_012_240_270 |
| compacting | 1_675_009_912 | 59_396_752 | 1_292_955_487 | 1_532_273_628 | 1_558_502_973 |
| generational | 2_517_025_054 | 59_397_016 | 977_578_942 | 1_052_786 | 967_410 |
| incremental | 32_320_741 | 4_624 | 290_257_785 | 292_951_006 | 292_977_552 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 254_076 | 638_613 | 4_449 | 4_909 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 139_886 | 126_827 | 14_641 | 8_451 | 10_530 | 3_662 |
| Rust | 472_135 | 519_916 | 51_591 | 34_661 | 74_169 | 41_615 |