dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

add readme for each table #23

Closed chenyan-dfinity closed 1 year ago

chenyan-dfinity commented 1 year ago

Also adds a more accurate memory measurement in Rust.

github-actions[bot] commented 1 year ago

Warning: The flamegraph link only works after you merge.

Collection libraries

Measure the performance of different collection libraries written in both Motoko and Rust. Library names with the `_rs` suffix are Rust implementations; the rest are written in Motoko.

Below we explain the measurements of each column in the table. All collections are populated with the same random number generator and a fixed seed, so they contain exactly the same elements and receive exactly the same queries.
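A minimal sketch of that fixed-seed setup (assuming the `rand` 0.8 crate; the helper name and seed value are illustrative, not the repo's actual generator):

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// Generate the key sequence once from a fixed seed, so every collection under
// test is populated with identical elements and queried with identical keys.
fn bench_keys(n: usize) -> Vec<u64> {
    let mut rng = StdRng::seed_from_u64(42); // fixed seed => reproducible runs
    (0..n).map(|_| rng.gen()).collect()
}
```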

:gem: Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations with large collections. Hopefully, once deterministic time slicing is ready, we can measure the performance on a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole underlying array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results (see the sketch after this list).
  • rbtree's remove method performs only a logical removal: the removed elements still reside in memory but are no longer reachable from the map. A complete implementation of remove would cost a bit more than reported here.
  • The MoVM table measures the performance of an experimental Motoko interpreter. External developers can ignore this table for now.
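
For the fxhash point above, a minimal sketch (assuming the `fxhash` crate) of what "the same as std's HashMap but with a deterministic hasher" means:

```rust
use std::collections::HashMap;

use fxhash::FxBuildHasher; // BuildHasherDefault<FxHasher>: no per-process random seed

// The same std HashMap type; only the hasher differs, so hash layouts and
// therefore instruction counts are reproducible across runs.
type DeterministicMap = HashMap<u64, u64, FxBuildHasher>;

fn deterministic_map() -> DeterministicMap {
    let mut m = DeterministicMap::default();
    m.insert(1, 1);
    m
}
```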

Map

| | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| hashmap | 2_456_007_854 | 9_102_036 | 1_319_482 | 710_190_935 | 1_248_881 |
| triemap | 2_422_807_203 | 9_716_008 | 920_103 | 2_236_198 | 1_271_468 |
| rbtree | 2_322_981_525 | 10_102_164 | 844_569 | 2_120_544 | 998_470 |
| splay | 2_528_971_511 | 9_302_108 | 1_450_431 | 2_398_808 | 1_449_735 |
| btreemap_rs | 117_438_673 | 1_638_400 | 59_553 | 140_296 | 61_987 |
| hashmap_rs | 53_134_681 | 1_835_008 | 21_202 | 63_626 | 22_619 |

Priority queue

| | heapify 50k | mem | pop_min 50 | put 50 | pop_min 50 |
|---|---:|---:|---:|---:|---:|
| heap | 814_736_870 | 1_400_024 | 483_010 | 862_326 | 485_101 |
| heap_rs | 5_042_128 | 819_200 | 53_302 | 22_310 | 53_569 |

MoVM

| | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| hashmap | 491_272_681 | 1_820_828 | 1_317_551 | 143_127_756 | 1_245_245 |
| hashmap_rs | 10_944_969 | 950_272 | 20_517 | 62_932 | 21_509 |
| imrc_hashmap_rs | 19_673_359 | 1_572_864 | 29_947 | 119_606 | 36_714 |
| movm_rs | 928_887_381 | 2_654_208 | 2_440_597 | 5_821_511 | 4_799_209 |
| movm_dynamic_rs | 485_461_977 | 2_129_920 | 1_872_262 | 2_654_951 | 1_831_888 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.
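A minimal sketch of what the subscribe call looks like on the Rust side (assuming ic-cdk ~0.12 with its attribute macros re-exported at the crate root; the method name and argument are illustrative, not the example's exact Candid interface):

```rust
use candid::Principal;
use ic_cdk::api::call::CallResult;

// The subscriber canister calls the publisher; the "caller" number in the table
// is the cost charged in this canister, "callee" the cost in the publisher.
#[ic_cdk::update]
async fn setup_subscribe(publisher_id: Principal, topic: String) {
    let result: CallResult<()> = ic_cdk::call(publisher_id, "subscribe", (topic,)).await;
    result.expect("subscribe call failed");
}
```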

| | subscribe | publish |
|---|---|---|
| Motoko | caller (20_075) / callee (6_291) | caller (16_044) / callee (3_947) |
| Rust | caller (72_344) / callee (56_608) | caller (120_527) / callee (69_257) |

Heartbeat

Measure the cost of an empty heartbeat method.
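On the Rust side, the measured entry point is nothing more than an empty handler; a minimal sketch (assuming ic-cdk's heartbeat attribute macro):

```rust
// An empty heartbeat: the table shows the fixed cost charged just to enter and
// leave this no-op method on every heartbeat round.
#[ic_cdk::heartbeat]
fn heartbeat() {}
```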

| | heartbeat |
|---|---:|
| Motoko | 6_161 |
| Rust | 587 |

Motoko Garbage Collection

Measure the Motoko garbage collection cost using the Triemap benchmark. The max mem column reports `rts_max_live_size` after the `generate` call.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 3_905_834_559 | 15_539_984 | 926_657 | 2_258_879 | 1_299_907 |
| copying | 3_905_834_509 | 15_539_984 | 266_740_517 | 268_236_256 | 267_277_400 |
| compacting | 4_052_409_826 | 15_539_984 | 308_612_525 | 350_383_547 | 353_311_950 |
| generational | 4_276_930_061 | 15_540_260 | 987_302 | 3_659_583 | 2_369_738 |

Basic DAO

Measure the performance of a typical dapp, Basic DAO, with the heartbeat disabled to make profiling easier. There is a separate benchmark above that measures heartbeat performance.

Note

  • The cost difference is mainly due to Candid serialization.
  • Motoko statically compiles and specializes the serialization code for each method, whereas in Rust we use serde to deserialize data dynamically based on the data on the wire (see the sketch after this list).
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send a small amount of data to each endpoint, which makes the Candid overhead in Rust tolerable.
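
A minimal sketch of the Rust pattern discussed above (assuming the `candid`, `serde`, and `ic-cdk` crates; the type and method names are illustrative, not Basic DAO's exact interface). The argument is decoded by serde-generated code at call time, which is where the per-call overhead comes from:

```rust
use candid::{CandidType, Principal};
use serde::Deserialize;

// Argument type for a hypothetical transfer endpoint; Candid decoding of this
// struct happens in generated glue before the method body runs.
#[derive(CandidType, Deserialize)]
struct TransferArgs {
    to: Principal,
    amount: u64,
}

#[ic_cdk::update]
fn transfer_token(args: TransferArgs) -> bool {
    // Illustrative body only; the real endpoint would update the DAO ledger.
    args.amount > 0
}
```
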
| | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|
| Motoko | 46_175 | 21_562 | 15_121 | 17_974 |
| Rust | 1_079_829 | 181_395 | 176_823 | 239_656 |