dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

aggressive wasm-opt #82

Closed chenyan-dfinity closed 1 year ago

chenyan-dfinity commented 1 year ago

base: new metering, no wasm-opt

github-actions[bot] commented 1 year ago

Note: This report diffs the performance results against the published results from the main branch. Unchanged benchmarks are omitted.
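To make the percentages in the tables easier to read, here is a minimal sketch of how a relative change like these is usually computed and printed. It assumes the common definition (new - base) / base; the bot's actual script is not shown in this thread, and the function names and sample numbers are hypothetical.

```rust
/// Relative change of `new` versus `base`, in percent.
/// Positive means the value grew (red in the tables), negative means it shrank (green).
fn percent_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// Format an integer with `_` as the thousands separator, e.g. 1234567 -> "1_234_567".
fn with_underscores(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        // Insert a separator whenever a multiple of three digits remains.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(ch);
    }
    out
}

fn main() {
    // Hypothetical numbers, not taken from the benchmark.
    let (base, new) = (200_000u64, 150_000u64);
    // Prints "150_000 (-25.00%)".
    println!("{} ({:.2}%)", with_underscores(new), percent_change(base, new));
}
```

Under this definition, a binary_size entry of 86.93% means the new binary is a little under twice the size of the one on main, while an instruction count of -17.42% means roughly a sixth fewer instructions.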

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 297_254 ($\textcolor{red}{86.93\%}$) | 7_842_361_244 ($\textcolor{green}{-17.42\%}$) | 61_987_732 | 321_307 ($\textcolor{green}{-18.21\%}$) | 6_155_781_309 ($\textcolor{green}{-15.69\%}$) | 345_252 ($\textcolor{green}{-18.43\%}$) |
| triemap | 333_856 ($\textcolor{red}{106.48\%}$) | 11_407_728_709 ($\textcolor{green}{-34.06\%}$) | 74_216_052 | 195_746 ($\textcolor{green}{-43.43\%}$) | 533_776 ($\textcolor{green}{-36.54\%}$) | 524_332 ($\textcolor{green}{-36.45\%}$) |
| rbtree | 296_811 ($\textcolor{red}{83.20\%}$) | 6_513_814_073 ($\textcolor{green}{-23.03\%}$) | 57_995_940 | 98_058 ($\textcolor{green}{-38.18\%}$) | 289_073 ($\textcolor{green}{-24.93\%}$) | 312_663 ($\textcolor{green}{-26.84\%}$) |
| splay | 286_117 ($\textcolor{red}{81.79\%}$) | 11_942_316_508 ($\textcolor{green}{-31.46\%}$) | 53_995_876 | 563_736 ($\textcolor{green}{-32.98\%}$) | 593_264 ($\textcolor{green}{-32.94\%}$) | 821_312 ($\textcolor{green}{-33.49\%}$) |
| btree | 612_964 ($\textcolor{red}{186.62\%}$) | 8_499_914_744 ($\textcolor{green}{-35.95\%}$) | 31_103_892 | 287_553 ($\textcolor{green}{-37.65\%}$) | 394_692 ($\textcolor{green}{-37.28\%}$) | 437_636 ($\textcolor{green}{-38.07\%}$) |
| zhenya_hashmap | 327_958 ($\textcolor{red}{94.71\%}$) | 2_966_591_069 ($\textcolor{green}{-23.47\%}$) | 65_987_480 | 77_126 ($\textcolor{green}{-27.67\%}$) | 89_303 ($\textcolor{green}{-31.98\%}$) | 88_334 ($\textcolor{green}{-43.41\%}$) |
| btreemap_rs | 614_219 ($\textcolor{red}{37.63\%}$) | 1_797_482_744 ($\textcolor{green}{-0.01\%}$) | 13_762_560 | 75_522 ($\textcolor{red}{1.31\%}$) | 125_349 ($\textcolor{green}{-0.62\%}$) | 91_276 ($\textcolor{green}{-1.68\%}$) |
| imrc_hashmap_rs | 613_876 ($\textcolor{red}{37.59\%}$) | 2_573_781_688 ($\textcolor{red}{0.07\%}$) | 122_454_016 | 38_896 ($\textcolor{green}{-0.15\%}$) | 178_602 ($\textcolor{green}{-0.28\%}$) | 115_825 ($\textcolor{red}{0.23\%}$) |
| hashmap_rs | 604_242 ($\textcolor{red}{37.53\%}$) | 430_766_072 ($\textcolor{green}{-3.77\%}$) | 36_536_320 | 21_623 ($\textcolor{green}{-2.72\%}$) | 26_760 ($\textcolor{green}{-3.27\%}$) | 24_943 ($\textcolor{green}{-1.37\%}$) |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 |
|---|--:|--:|--:|--:|--:|
| heap | 272_461 ($\textcolor{red}{78.86\%}$) | 5_176_386_504 ($\textcolor{green}{-29.05\%}$) | 29_995_836 | 557_538 ($\textcolor{green}{-31.44\%}$) | 207_483 ($\textcolor{green}{-30.27\%}$) |
| heap_rs | 602_644 ($\textcolor{red}{37.82\%}$) | 140_170_047 ($\textcolor{green}{-1.92\%}$) | 9_109_504 | 59_253 ($\textcolor{green}{-1.00\%}$) | 23_240 ($\textcolor{green}{-2.05\%}$) |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|--:|--:|--:|--:|--:|--:|
| buffer | 364_175 ($\textcolor{red}{125.59\%}$) | 2_313_561 ($\textcolor{green}{-29.10\%}$) | 65_508 | 92_884 ($\textcolor{green}{-25.89\%}$) | 714_218 ($\textcolor{green}{-31.50\%}$) | 143_884 ($\textcolor{green}{-31.59\%}$) |
| vector | 479_368 ($\textcolor{red}{198.16\%}$) | 1_808_257 ($\textcolor{green}{-34.61\%}$) | 24_764 | 143_018 ($\textcolor{green}{-27.60\%}$) | 174_110 ($\textcolor{green}{-34.12\%}$) | 167_657 ($\textcolor{green}{-37.26\%}$) |
| vec_rs | 600_293 ($\textcolor{red}{37.73\%}$) | 289_309 ($\textcolor{green}{-0.29\%}$) | 655_360 | 17_099 ($\textcolor{green}{-2.87\%}$) | 30_520 ($\textcolor{green}{-1.59\%}$) | 25_655 ($\textcolor{red}{1.00\%}$) |


SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|--:|--:|--:|--:|--:|
| Motoko | 384_505 ($\textcolor{red}{96.00\%}$) | 271_650_289 ($\textcolor{green}{-23.01\%}$) | 261_095_203 ($\textcolor{green}{-23.00\%}$) | 33_645 ($\textcolor{green}{-24.93\%}$) | 24_286 ($\textcolor{green}{-23.91\%}$) |
| Rust | 713_795 ($\textcolor{red}{35.13\%}$) | 83_780_628 ($\textcolor{red}{1.20\%}$) | 57_415_417 ($\textcolor{red}{1.09\%}$) | 49_652 ($\textcolor{green}{-1.97\%}$) | 52_343 ($\textcolor{green}{-2.22\%}$) |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|--:|--:|--:|--:|--:|
| Motoko | 930_337 ($\textcolor{red}{353.66\%}$) | 4_641_230_534 ($\textcolor{green}{-25.77\%}$) | 3_429_924 | 550_465 ($\textcolor{green}{-25.86\%}$) | 365_185 ($\textcolor{green}{-27.89\%}$) |
| Rust | 744_723 ($\textcolor{red}{58.47\%}$) | 6_355_506_025 ($\textcolor{green}{-0.06\%}$) | 1_081_344 | 1_011_262 ($\textcolor{green}{-0.09\%}$) | 303_453 ($\textcolor{green}{-0.55\%}$) |


Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 1_030_946 ($\textcolor{red}{271.66\%}$) | 44_715 ($\textcolor{green}{-12.82\%}$) | 20_929 ($\textcolor{green}{-17.00\%}$) | 17_672 ($\textcolor{green}{-15.69\%}$) | 18_567 ($\textcolor{green}{-17.63\%}$) |
| Rust | 1_034_916 ($\textcolor{red}{35.63\%}$) | 533_132 ($\textcolor{green}{-3.43\%}$) | 101_918 ($\textcolor{green}{-3.12\%}$) | 125_469 ($\textcolor{green}{-2.55\%}$) | 135_796 ($\textcolor{green}{-2.68\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 430_376 ($\textcolor{red}{87.10\%}$) | 16_971 ($\textcolor{green}{-11.92\%}$) | 28_261 ($\textcolor{green}{-12.50\%}$) | 8_614 ($\textcolor{green}{-11.44\%}$) |
| Rust | 1_157_204 ($\textcolor{red}{39.72\%}$) | 142_282 ($\textcolor{green}{-2.72\%}$) | 369_309 ($\textcolor{green}{-2.88\%}$) | 91_478 ($\textcolor{green}{-2.44\%}$) |


Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 235_377 ($\textcolor{red}{65.63\%}$) | 19_034 ($\textcolor{green}{-20.92\%}$) |
| Rust | 27_599 ($\textcolor{red}{7.60\%}$) | 1_162 ($\textcolor{red}{111.66\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 289_586 ($\textcolor{red}{94.03\%}$) | 50_678 ($\textcolor{green}{-7.02\%}$) | 4_491 ($\textcolor{green}{-9.78\%}$) |
| Rust | 650_758 ($\textcolor{red}{38.26\%}$) | 67_826 ($\textcolor{green}{-2.73\%}$) | 11_037 ($\textcolor{green}{-3.23\%}$) |


Garbage Collection

Note: Results are the same as on the main branch, so this section is skipped.

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|--:|--:|--:|--:|
| Map | 499_549 ($\textcolor{red}{67.79\%}$) | 775_680 ($\textcolor{green}{-1.01\%}$) | 15_832 ($\textcolor{green}{-7.12\%}$) | 16_330 ($\textcolor{green}{-6.85\%}$) |


Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 305_839 ($\textcolor{red}{83.38\%}$) | 265_681 ($\textcolor{red}{74.91\%}$) | 28_092 ($\textcolor{green}{-6.22\%}$) | 11_466 ($\textcolor{green}{-8.68\%}$) | 22_385 ($\textcolor{green}{-6.96\%}$) | 6_218 ($\textcolor{green}{-9.52\%}$) |
| Rust | 692_066 ($\textcolor{red}{35.20\%}$) | 767_763 ($\textcolor{red}{35.79\%}$) | 69_878 ($\textcolor{green}{-2.58\%}$) | 42_967 ($\textcolor{green}{-3.05\%}$) | 93_307 ($\textcolor{green}{-2.57\%}$) | 52_511 ($\textcolor{green}{-2.65\%}$) |


github-actions[bot] commented 1 year ago

Note: The flamegraph links only work after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed so that all collections contain the same elements and receive exactly the same queries. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Because of the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which provides the same map as std::collections::HashMap but with a deterministic hasher. This makes the results reproducible.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap for Rust.
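The note about hashmap and batch_put 50 can be illustrated with a toy model. The sketch below is not the Motoko hashmap itself; it is a hypothetical growable buffer (ToyBuffer, with made-up sizes) that doubles its capacity when full and counts how many elements each resize copies.

```rust
/// A toy growable buffer that doubles its capacity when full and counts
/// how many element copies the resizes cause. Illustration only.
struct ToyBuffer {
    data: Vec<u64>,
    capacity: usize,
    copies: usize,
}

impl ToyBuffer {
    fn with_capacity(capacity: usize) -> Self {
        ToyBuffer { data: Vec::new(), capacity, copies: 0 }
    }

    fn put(&mut self, value: u64) {
        if self.data.len() == self.capacity {
            // Full: allocate storage twice as large and copy every element over.
            let new_capacity = (self.capacity * 2).max(1);
            let mut bigger = Vec::with_capacity(new_capacity);
            bigger.extend_from_slice(&self.data);
            self.copies += self.data.len();
            self.data = bigger;
            self.capacity = new_capacity;
        }
        self.data.push(value);
    }
}

fn main() {
    // Hypothetical sizes: the buffer is exactly full before the batch starts.
    let mut buf = ToyBuffer::with_capacity(1024);
    for i in 0..1024 {
        buf.put(i);
    }
    assert_eq!(buf.copies, 0);

    // The first put of the next batch triggers one resize that copies everything.
    for i in 0..50 {
        buf.put(i);
    }
    println!("copies during the batch of 50: {}", buf.copies);
}
```

In this model a batch of 50 puts pays for 1024 copies because it happens to cross the capacity boundary. Averaged over many inserts the cost per put stays constant, but a short benchmark window that contains the resize sees the whole spike.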

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 297_254 | 7_842_361_244 | 61_987_732 | 321_307 | 6_155_781_309 | 345_252 |
| triemap | 333_856 | 11_407_728_709 | 74_216_052 | 195_746 | 533_776 | 524_332 |
| rbtree | 296_811 | 6_513_814_073 | 57_995_940 | 98_058 | 289_073 | 312_663 |
| splay | 286_117 | 11_942_316_508 | 53_995_876 | 563_736 | 593_264 | 821_312 |
| btree | 612_964 | 8_499_914_744 | 31_103_892 | 287_553 | 394_692 | 437_636 |
| zhenya_hashmap | 327_958 | 2_966_591_069 | 65_987_480 | 77_126 | 89_303 | 88_334 |
| btreemap_rs | 614_219 | 1_797_482_744 | 13_762_560 | 75_522 | 125_349 | 91_276 |
| imrc_hashmap_rs | 613_876 | 2_573_781_688 | 122_454_016 | 38_896 | 178_602 | 115_825 |
| hashmap_rs | 604_242 | 430_766_072 | 36_536_320 | 21_623 | 26_760 | 24_943 |
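Rows in a table like this are only comparable if every collection receives identical inputs, which is what the fixed seed guarantees. The following is a minimal sketch of that idea using a small xorshift64 generator; the generator, the seed value, and the helper name keys are assumptions for illustration, since the benchmark's actual generator is not shown in this report.

```rust
use std::collections::{BTreeMap, HashMap};

/// A tiny xorshift64 generator (illustrative stand-in for the benchmark's generator).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a nonzero constant.
        XorShift64 { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produce the same `n` keys every time it is called with the same seed.
fn keys(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    let seed = 42; // hypothetical seed

    // Two different collections are filled from two independent runs of the generator.
    let mut hash_map = HashMap::new();
    for k in keys(seed, 1000) {
        hash_map.insert(k, k);
    }
    let mut btree_map = BTreeMap::new();
    for k in keys(seed, 1000) {
        btree_map.insert(k, k);
    }

    // Same seed, same elements: any cost difference comes from the data structure.
    assert_eq!(hash_map.len(), btree_map.len());
    assert!(hash_map.keys().all(|k| btree_map.contains_key(k)));
    println!("both collections hold the same {} keys", hash_map.len());
}
```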

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | |
|---|--:|--:|--:|--:|--:|--:|
| heap | 272_461 | 5_176_386_504 | 29_995_836 | 557_538 | 207_483 | 531_201 |
| heap_rs | 602_644 | 140_170_047 | 9_109_504 | 59_253 | 23_240 | 59_476 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|--:|--:|--:|--:|--:|--:|
| buffer | 364_175 | 2_313_561 | 65_508 | 92_884 | 714_218 | 143_884 |
| vector | 479_368 | 1_808_257 | 24_764 | 143_018 | 174_110 | 167_657 |
| vec_rs | 600_293 | 289_309 | 655_360 | 17_099 | 30_520 | 25_655 |

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|--:|--:|--:|--:|--:|
| Motoko | 384_505 | 271_650_289 | 261_095_203 | 33_645 | 24_286 |
| Rust | 713_795 | 83_780_628 | 57_415_417 | 49_652 | 52_343 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|--:|--:|--:|--:|--:|
| Motoko | 930_337 | 4_641_230_534 | 3_429_924 | 550_465 | 365_185 |
| Rust | 744_723 | 6_355_506_025 | 1_081_344 | 1_011_262 | 303_453 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles and specializes the serialization code for each method, whereas Rust uses serde to deserialize dynamically, based on the data on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is hard to keep the ergonomics that serde provides.
  • For real-world applications, we tend to send small data to each endpoint, which makes the Candid overhead in Rust tolerable.
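The difference between specialized and dynamic decoding can be sketched with a toy example. The wire format below is invented for illustration and is neither Candid nor what serde does internally: each value is a tag byte (0 for a little-endian u32, 1 for a bool) followed by its payload. The names Value, decode_dynamic and decode_static are hypothetical.

```rust
/// Values a dynamic decoder can produce from the toy wire format.
#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

/// Dynamic style: inspect each tag at run time and branch on it.
fn decode_dynamic(bytes: &[u8]) -> Option<Vec<Value>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            0 => {
                let s = bytes.get(i + 1..i + 5)?;
                out.push(Value::U32(u32::from_le_bytes([s[0], s[1], s[2], s[3]])));
                i += 5;
            }
            1 => {
                out.push(Value::Bool(*bytes.get(i + 1)? != 0));
                i += 2;
            }
            _ => return None, // unknown tag
        }
    }
    Some(out)
}

/// Static style: the argument types (u32, bool) are known when this function is
/// written, so it checks the expected shape once and reads fixed offsets.
fn decode_static(bytes: &[u8]) -> Option<(u32, bool)> {
    match bytes {
        [0, a, b, c, d, 1, flag] => Some((u32::from_le_bytes([*a, *b, *c, *d]), *flag != 0)),
        _ => None,
    }
}

fn main() {
    // A message carrying (42, true) in the toy format.
    let msg = [0u8, 42, 0, 0, 0, 1, 1];
    assert_eq!(decode_dynamic(&msg), Some(vec![Value::U32(42), Value::Bool(true)]));
    assert_eq!(decode_static(&msg), Some((42, true)));
    println!("both decoders agree on {:?}", decode_static(&msg));
}
```

The dynamic decoder pays for a loop, a branch per value, and an intermediate vector, while the specialized one does a single shape check. That is the kind of overhead the notes above attribute to the Rust side, although the real cost profile depends on Candid and serde details this sketch does not model.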

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 1_030_946 | 44_715 | 20_929 | 17_672 | 18_567 |
| Rust | 1_034_916 | 533_132 | 101_918 | 125_469 | 135_796 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 430_376 | 16_971 | 28_261 | 8_614 |
| Rust | 1_157_204 | 142_282 | 369_309 | 91_478 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 235_377 | 19_034 |
| Rust | 27_599 | 1_162 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 289_586 | 50_678 | 4_491 |
| Rust | 650_758 | 67_826 | 11_037 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|
| default | 1_338_231_405 | 59_396_776 | 118 | 118 | 118 |
| copying | 1_338_231_287 | 59_396_776 | 1_337_913_569 | 1_338_002_371 | 1_337_919_144 |
| compacting | 1_911_420_608 | 59_396_776 | 1_473_824_186 | 1_756_485_066 | 1_787_369_954 |
| generational | 2_891_818_643 | 59_405_240 | 1_141_865_993 | 1_217_376 | 1_117_840 |
| incremental | 33_436_719 | 1_136_155_048 | 333_734_166 | 336_829_512 | 336_860_690 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|--:|--:|--:|--:|
| Map | 499_549 | 775_680 | 15_832 | 16_330 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 305_839 | 265_681 | 28_092 | 11_466 | 22_385 | 6_218 |
| Rust | 692_066 | 767_763 | 69_878 | 42_967 | 93_307 | 52_511 |