dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

Enable wasm optimizer from `dfx 0.14.0` #55

Closed: kentosugama closed this pull request 1 year ago

kentosugama commented 1 year ago

I think it would be good to merge this so that we can measure performance improvements beyond wasm-opt and not reimplement optimizations already included in the optimizer.

Note that these benchmarks use ic-wasm directly instead of the `optimize: "cycles"` feature in dfx, in order to preserve the wasm name sections for the flame graphs. For users reading this: in the general case, we recommend using the optimizer through dfx instead, since the binary size reductions are larger when the name sections are dropped.
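The name section is a wasm custom section called `name` that maps function indices (and locals) to human-readable names, which is what lets a flame graph show function names. As a rough sketch of what dropping it involves, the hypothetical helper `strip_name_section` below walks a module's sections and copies everything except a custom section named `name`. This is not ic-wasm's or dfx's code; it only assumes the standard wasm binary layout (magic bytes, version, then sections made of an id byte, a LEB128 size, and the contents).

```rust
/// Read an unsigned LEB128 integer at `pos`; returns (value, bytes consumed).
fn read_leb128(bytes: &[u8], pos: usize) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    let mut i = pos;
    loop {
        let byte = *bytes.get(i)?;
        if shift >= 32 {
            return None; // too long for a u32
        }
        result |= u32::from(byte & 0x7f) << shift;
        i += 1;
        if byte & 0x80 == 0 {
            return Some((result, i - pos));
        }
        shift += 7;
    }
}

/// Hypothetical helper: copy a wasm module, dropping any custom section
/// named "name" and keeping every other section byte-for-byte.
fn strip_name_section(wasm: &[u8]) -> Option<Vec<u8>> {
    // A module starts with the magic bytes "\0asm" and a 4-byte version.
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut out = wasm[..8].to_vec();
    let mut pos = 8;
    while pos < wasm.len() {
        // Each section: 1-byte id, LEB128 size, then `size` bytes of contents.
        let id = wasm[pos];
        let (size, n) = read_leb128(wasm, pos + 1)?;
        let start = pos + 1 + n;
        let end = start.checked_add(size as usize)?;
        let contents = wasm.get(start..end)?;
        // Custom sections (id 0) open with their own LEB128-prefixed name.
        let is_name_section = id == 0 && {
            let (len, m) = read_leb128(contents, 0)?;
            contents.get(m..m + len as usize) == Some(b"name".as_slice())
        };
        if !is_name_section {
            out.extend_from_slice(&wasm[pos..end]);
        }
        pos = end;
    }
    Some(out)
}

fn main() {
    let header = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
    // Custom section "name" with a 3-byte dummy payload.
    let name_sec = [0, 8, 4, b'n', b'a', b'm', b'e', 1, 2, 3];
    // Custom section "other" with a 1-byte payload.
    let other_sec = [0, 7, 5, b'o', b't', b'h', b'e', b'r', 9];
    let module: Vec<u8> = [&header[..], &name_sec[..], &other_sec[..]].concat();
    let stripped = strip_name_section(&module).expect("valid module");
    println!("{} -> {} bytes", module.len(), stripped.len()); // 27 -> 17 bytes
}
```

In a real canister the name section can be a sizable share of the binary, which is why keeping it (as these benchmarks do) costs binary size.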

For future reference: https://github.com/dfinity/sdk/pull/3090 See also #50 for previous discussions.

github-actions[bot] commented 1 year ago

Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.
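Each cell in the diff tables pairs the value measured on this branch with its relative change against main. A minimal sketch of that computation, assuming the percentage is `(current - main) / main * 100` and that the color only reflects the sign; `percent_change` and `diff_color` are hypothetical names, not the bot's code:

```rust
/// Relative change, in percent, of a benchmark value on this branch against
/// the published main-branch value. Assumed formula: (current - main) / main.
fn percent_change(main_value: u64, current: u64) -> f64 {
    (current as f64 - main_value as f64) / main_value as f64 * 100.0
}

/// Color used in the diff tables: green when the value went down,
/// red when it went up (as in the 0.05% btreemap_rs batch_get 50 cell).
fn diff_color(change: f64) -> &'static str {
    if change < 0.0 { "green" } else { "red" }
}

fn main() {
    // Hypothetical values, not taken from the benchmark results.
    let change = percent_change(200_000, 173_000);
    println!("{:.2}% ({})", change, diff_color(change)); // -13.50% (green)
}
```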

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 169_982 ($\textcolor{green}{-13.37\%}$) | 2_097_113_506 ($\textcolor{green}{-12.15\%}$) | 9_102_052 | 1_115_399 ($\textcolor{green}{-13.74\%}$) | 609_254_124 ($\textcolor{green}{-11.61\%}$) | 1_056_869 ($\textcolor{green}{-13.70\%}$) |
| triemap | 174_030 ($\textcolor{green}{-13.76\%}$) | 2_020_134_416 ($\textcolor{green}{-11.65\%}$) | 9_715_900 | 773_637 ($\textcolor{green}{-13.26\%}$) | 1_853_794 ($\textcolor{green}{-12.21\%}$) | 1_033_460 ($\textcolor{green}{-12.98\%}$) |
| rbtree | 171_127 ($\textcolor{green}{-13.99\%}$) | 1_797_995_532 ($\textcolor{green}{-11.20\%}$) | 8_902_160 | 670_401 ($\textcolor{green}{-14.90\%}$) | 1_623_975 ($\textcolor{green}{-11.70\%}$) | 859_340 ($\textcolor{green}{-13.34\%}$) |
| splay | 170_477 ($\textcolor{green}{-13.84\%}$) | 2_040_395_523 ($\textcolor{green}{-11.50\%}$) | 8_702_096 | 1_102_393 ($\textcolor{green}{-12.39\%}$) | 1_915_542 ($\textcolor{green}{-11.93\%}$) | 1_103_332 ($\textcolor{green}{-12.42\%}$) |
| btree | 198_636 ($\textcolor{green}{-15.60\%}$) | 1_875_401_612 ($\textcolor{green}{-11.63\%}$) | 7_556_172 | 813_525 ($\textcolor{green}{-13.14\%}$) | 1_718_273 ($\textcolor{green}{-12.11\%}$) | 862_047 ($\textcolor{green}{-13.07\%}$) |
| zhenya_hashmap | 165_325 ($\textcolor{green}{-13.20\%}$) | 1_642_423_605 ($\textcolor{green}{-11.77\%}$) | 9_301_800 | 647_832 ($\textcolor{green}{-13.50\%}$) | 1_447_024 ($\textcolor{green}{-12.52\%}$) | 652_030 ($\textcolor{green}{-13.63\%}$) |
| btreemap_rs | 438_979 ($\textcolor{green}{-14.72\%}$) | 112_676_543 ($\textcolor{green}{-2.86\%}$) | 1_638_400 | 59_465 ($\textcolor{red}{0.05\%}$) | 133_080 ($\textcolor{green}{-3.46\%}$) | 60_509 ($\textcolor{green}{-2.08\%}$) |
| hashmap_rs | 428_466 ($\textcolor{green}{-14.78\%}$) | 49_363_168 ($\textcolor{green}{-7.45\%}$) | 1_835_008 | 19_572 ($\textcolor{green}{-7.11\%}$) | 58_237 ($\textcolor{green}{-8.43\%}$) | 20_805 ($\textcolor{green}{-7.47\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---:|---:|---:|---:|---:|
| heap | 156_998 ($\textcolor{green}{-13.61\%}$) | 688_335_838 ($\textcolor{green}{-13.23\%}$) | 1_400_024 | 338_619 ($\textcolor{green}{-12.12\%}$) | 711_943 ($\textcolor{green}{-13.47\%}$) |
| heap_rs | 406_219 ($\textcolor{green}{-14.20\%}$) | 4_975_528 ($\textcolor{green}{-1.31\%}$) | 819_200 | 48_902 ($\textcolor{green}{-8.15\%}$) | 20_578 ($\textcolor{green}{-6.85\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 169_982 ($\textcolor{green}{-13.37\%}$) | 419_486_900 ($\textcolor{green}{-12.14\%}$) | 1_820_844 | 1_113_679 ($\textcolor{green}{-13.74\%}$) | 122_781_037 ($\textcolor{green}{-11.60\%}$) | 1_054_639 ($\textcolor{green}{-13.70\%}$) |
| hashmap_rs | 428_466 ($\textcolor{green}{-14.78\%}$) | 10_178_230 ($\textcolor{green}{-7.34\%}$) | 950_272 | 18_903 ($\textcolor{green}{-7.27\%}$) | 57_565 ($\textcolor{green}{-8.49\%}$) | 19_747 ($\textcolor{green}{-7.61\%}$) |
| imrc_hashmap_rs | 435_292 ($\textcolor{green}{-15.31\%}$) | 19_062_328 ($\textcolor{green}{-4.30\%}$) | 1_572_864 | 29_764 ($\textcolor{green}{-5.57\%}$) | 113_802 ($\textcolor{green}{-5.33\%}$) | 36_791 ($\textcolor{green}{-2.20\%}$) |
| movm_rs | 1_760_914 ($\textcolor{green}{-15.84\%}$) | 999_676_261 ($\textcolor{green}{-1.73\%}$) | 2_654_208 | 2_424_874 ($\textcolor{green}{-2.80\%}$) | 6_357_705 ($\textcolor{green}{-1.84\%}$) | 5_013_896 ($\textcolor{green}{-1.81\%}$) |
| movm_dynamic_rs | 1_943_858 ($\textcolor{green}{-15.31\%}$) | 485_763_587 ($\textcolor{green}{-2.12\%}$) | 2_129_920 | 1_909_424 ($\textcolor{green}{-2.18\%}$) | 2_642_175 ($\textcolor{green}{-2.49\%}$) | 1_907_002 ($\textcolor{green}{-2.21\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 242_539 ($\textcolor{green}{-16.79\%}$) | 41_042 ($\textcolor{green}{-7.78\%}$) | 18_026 ($\textcolor{green}{-9.51\%}$) | 12_678 ($\textcolor{green}{-10.71\%}$) | 14_924 ($\textcolor{green}{-11.16\%}$) |
| Rust | 751_374 ($\textcolor{green}{-20.11\%}$) | 500_487 ($\textcolor{green}{-7.56\%}$) | 93_345 ($\textcolor{green}{-8.90\%}$) | 114_984 ($\textcolor{green}{-8.37\%}$) | 124_724 ($\textcolor{green}{-8.98\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 200_814 ($\textcolor{green}{-17.91\%}$) | 12_164 ($\textcolor{green}{-9.08\%}$) | 22_455 ($\textcolor{green}{-9.01\%}$) | 4_747 ($\textcolor{green}{-11.40\%}$) |
| Rust | 801_533 ($\textcolor{green}{-20.30\%}$) | 134_675 ($\textcolor{green}{-6.58\%}$) | 348_766 ($\textcolor{green}{-7.22\%}$) | 86_803 ($\textcolor{green}{-8.39\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 135_630 ($\textcolor{green}{-13.51\%}$) | 8_461 ($\textcolor{green}{-5.76\%}$) |
| Rust | 28_624 ($\textcolor{green}{-19.61\%}$) | 830 ($\textcolor{green}{-26.35\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 142_158 ($\textcolor{green}{-13.50\%}$) | 17_762 ($\textcolor{green}{-8.80\%}$) | 1_706 ($\textcolor{green}{-10.54\%}$) |
| Rust | 447_452 ($\textcolor{green}{-14.67\%}$) | 49_589 ($\textcolor{green}{-10.09\%}$) | 9_514 ($\textcolor{green}{-8.67\%}$) |

Garbage Collection

Note: Same as the main branch; skipping.

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---:|---:|---:|---:|
| Map | 289_202 ($\textcolor{green}{-12.66\%}$) | 748_768 ($\textcolor{green}{-10.18\%}$) | 5_609 ($\textcolor{green}{-9.36\%}$) | 5_988 ($\textcolor{green}{-8.33\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 156_672 ($\textcolor{green}{-13.66\%}$) | 143_547 ($\textcolor{green}{-13.84\%}$) | 15_760 ($\textcolor{green}{-5.31\%}$) | 8_489 ($\textcolor{green}{-7.17\%}$) | 11_737 ($\textcolor{green}{-6.39\%}$) | 3_665 ($\textcolor{green}{-8.40\%}$) |
| Rust | 478_372 ($\textcolor{green}{-14.79\%}$) | 527_123 ($\textcolor{green}{-24.33\%}$) | 57_647 ($\textcolor{green}{-8.18\%}$) | 38_523 ($\textcolor{green}{-9.27\%}$) | 81_062 ($\textcolor{green}{-7.86\%}$) | 45_691 ($\textcolor{green}{-7.98\%}$) |

github-actions[bot] commented 1 year ago

Note: The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, which is why the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which provides the same std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.
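Two of the notes above can be illustrated with a small sketch. The hypothetical `GrowableArray` models only the resize cost of an array that doubles its capacity when full (the doubling factor is an assumption, not taken from the Motoko hashmap): filling it to its initial capacity copies nothing, but the next put copies every existing element, so a single batch of 50 puts absorbs the whole copy. `DeterministicMap` shows how a seedless hasher plugs into `std::collections::HashMap`; `DefaultHasher` is swapped in for fxhash's `FxHasher` so the sketch needs no external crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Toy model of a growable array that doubles its capacity when full and
/// counts how many elements it copies while resizing. Hypothetical: it
/// tracks only sizes, not the actual Motoko hashmap implementation.
struct GrowableArray {
    len: usize,
    capacity: usize,
    copies: usize,
}

impl GrowableArray {
    fn with_capacity(capacity: usize) -> Self {
        GrowableArray { len: 0, capacity, copies: 0 }
    }

    fn push(&mut self) {
        if self.len == self.capacity {
            // Capacity reached: allocate a larger array and copy every element.
            self.copies += self.len;
            self.capacity = (self.capacity * 2).max(1);
        }
        self.len += 1;
    }
}

/// HashMap whose hasher has no per-process random seed, so iteration order
/// is reproducible across runs of the same build. DefaultHasher stands in
/// for FxHasher; the std docs only guarantee it within one Rust release.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn iteration_order(keys: &[u64]) -> Vec<u64> {
    let mut map: DeterministicMap<u64, ()> = DeterministicMap::default();
    for &k in keys {
        map.insert(k, ());
    }
    map.keys().copied().collect()
}

fn main() {
    // Fill up to the initial capacity: no copying needed.
    let mut a = GrowableArray::with_capacity(50_000);
    for _ in 0..50_000 {
        a.push();
    }
    let before = a.copies;
    // The next 50 puts: the first one triggers a copy of all 50_000 elements.
    for _ in 0..50 {
        a.push();
    }
    println!("copies during 50 puts: {}", a.copies - before); // 50000

    let keys = [7, 3, 11, 42, 1];
    assert_eq!(iteration_order(&keys), iteration_order(&keys));
}
```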

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 169_982 | 2_097_113_506 | 9_102_052 | 1_115_399 | 609_254_124 | 1_056_869 |
| triemap | 174_030 | 2_020_134_416 | 9_715_900 | 773_637 | 1_853_794 | 1_033_460 |
| rbtree | 171_127 | 1_797_995_532 | 8_902_160 | 670_401 | 1_623_975 | 859_340 |
| splay | 170_477 | 2_040_395_523 | 8_702_096 | 1_102_393 | 1_915_542 | 1_103_332 |
| btree | 198_636 | 1_875_401_612 | 7_556_172 | 813_525 | 1_718_273 | 862_047 |
| zhenya_hashmap | 165_325 | 1_642_423_605 | 9_301_800 | 647_832 | 1_447_024 | 652_030 |
| btreemap_rs | 438_979 | 112_676_543 | 1_638_400 | 59_465 | 133_080 | 60_509 |
| hashmap_rs | 428_466 | 49_363_168 | 1_835_008 | 19_572 | 58_237 | 20_805 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
|---|---:|---:|---:|---:|---:|---:|
| heap | 156_998 | 688_335_838 | 1_400_024 | 338_619 | 711_943 | 340_032 |
| heap_rs | 406_219 | 4_975_528 | 819_200 | 48_902 | 20_578 | 49_090 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 169_982 | 419_486_900 | 1_820_844 | 1_113_679 | 122_781_037 | 1_054_639 |
| hashmap_rs | 428_466 | 10_178_230 | 950_272 | 18_903 | 57_565 | 19_747 |
| imrc_hashmap_rs | 435_292 | 19_062_328 | 1_572_864 | 29_764 | 113_802 | 36_791 |
| movm_rs | 1_760_914 | 999_676_261 | 2_654_208 | 2_424_874 | 6_357_705 | 5_013_896 |
| movm_dynamic_rs | 1_943_858 | 485_763_587 | 2_129_920 | 1_909_424 | 2_642_175 | 1_907_002 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to deserialize data dynamically based on what is on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is challenging to maintain the ergonomics that serde provides.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
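The difference between specialized and dynamic decoding can be shown on a toy format. The sketch below is hypothetical and far simpler than Candid: each value is a one-byte type tag followed by its payload. `decode_dynamic` reads the tags on the wire and builds a generic `Value` list that the caller must then match against the method's types, roughly the shape of a dynamic deserializer; `decode_u32_bool` is specialized to one method signature and decodes straight into `(u32, bool)` in a single pass. Neither function reflects how Motoko or serde actually implement Candid.

```rust
/// Toy wire format (hypothetical, not Candid): a one-byte tag, then the
/// payload. Tag 0 = u32 (4 bytes, little-endian), tag 1 = bool (1 byte).
#[derive(Debug, PartialEq)]
enum Value {
    Nat32(u32),
    Bool(bool),
}

/// Dynamic decoding: dispatch on each tag found on the wire and build a
/// generic value list; the caller converts it into its own types afterwards.
fn decode_dynamic(bytes: &[u8]) -> Option<Vec<Value>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            0 => {
                let payload: [u8; 4] = bytes.get(i + 1..i + 5)?.try_into().ok()?;
                out.push(Value::Nat32(u32::from_le_bytes(payload)));
                i += 5;
            }
            1 => {
                out.push(Value::Bool(*bytes.get(i + 1)? != 0));
                i += 2;
            }
            _ => return None, // unknown tag
        }
    }
    Some(out)
}

/// Specialized decoding for one method whose argument is known to be
/// (u32, bool): no intermediate values, just check the expected layout.
fn decode_u32_bool(bytes: &[u8]) -> Option<(u32, bool)> {
    if bytes.len() != 7 || bytes[0] != 0 || bytes[5] != 1 {
        return None;
    }
    let n = u32::from_le_bytes(bytes[1..5].try_into().ok()?);
    Some((n, bytes[6] != 0))
}

fn main() {
    let wire = [0, 42, 0, 0, 0, 1, 1];
    // Dynamic path: generic values first, then convert to the method's types.
    let dynamic = match decode_dynamic(&wire).as_deref() {
        Some([Value::Nat32(n), Value::Bool(b)]) => Some((*n, *b)),
        _ => None,
    };
    // Specialized path: one pass straight into the method's types.
    let specialized = decode_u32_bool(&wire);
    assert_eq!(dynamic, specialized);
    println!("{:?}", specialized); // Some((42, true))
}
```

The specialized decoder skips the intermediate allocation and the per-value dispatch, at the cost of needing one generated decoder per method signature.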

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 242_539 | 41_042 | 18_026 | 12_678 | 14_924 |
| Rust | 751_374 | 500_487 | 93_345 | 114_984 | 124_724 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 200_814 | 12_164 | 22_455 | 4_747 |
| Rust | 801_533 | 134_675 | 348_766 | 86_803 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 135_630 | 8_461 |
| Rust | 28_624 | 830 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 142_158 | 17_762 | 1_706 |
| Rust | 447_452 | 49_589 | 9_514 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 247_113_104 | 15_539_816 | 50 | 50 | 50 |
| copying | 247_113_054 | 15_539_816 | 247_107_545 | 247_259_605 | 247_259_929 |
| compacting | 409_743_010 | 15_539_816 | 308_335_419 | 367_295_137 | 351_658_670 |
| generational | 625_110_580 | 15_540_080 | 56_690 | 1_100_091 | 622_657 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---:|---:|---:|---:|
| Map | 289_202 | 748_768 | 5_609 | 5_988 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 156_672 | 143_547 | 15_760 | 8_489 | 11_737 | 3_665 |
| Rust | 478_372 | 527_123 | 57_647 | 38_523 | 81_062 | 45_691 |

kentosugama commented 1 year ago

Just updated the README.md