dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

rust: opt 1 #67

Closed: chenyan-dfinity closed this pull request 1 year ago

github-actions[bot] commented 1 year ago

Note: This diffs the performance results against the published results from the main branch. Unchanged benchmarks are omitted.
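
The following is a minimal sketch for reading the colored percentages. It assumes each one is the relative change of this PR's value against the main-branch value, (new - main) / main × 100. The bot's exact formula is not stated here, and the function names are hypothetical. Because the percentages are rounded to two decimals, a main-branch value recovered this way is only approximate.

```rust
/// Relative change of `new` against `baseline`, in percent.
/// Assumption: the bot reports (new - baseline) / baseline * 100.
fn percent_change(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

/// Inverse of `percent_change`: estimate the main-branch value from a
/// reported value and its (rounded) percentage.
fn implied_baseline(new: f64, percent: f64) -> f64 {
    new / (1.0 + percent / 100.0)
}

fn main() {
    // heap_rs, "heapify 50k" column of the Priority queue table: 7_208_921 at +44.89%.
    let new = 7_208_921.0;
    let base = implied_baseline(new, 44.89);
    println!("implied main-branch value: {:.0}", base);
    // Round trip: recomputing the change gives back the reported percentage.
    println!("recomputed change: {:.2}%", percent_change(base, new));
}
```

Under that assumption, a green negative entry such as movm_rs binary_size (-14.14%) means the value shrank relative to main, and a red positive entry means it grew.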

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 152_580 | 1_195_632_150 | 9_102_052 | 545_645 | 365_569_669 | 520_876 |
| triemap | 156_424 | 1_338_995_779 | 9_715_900 | 459_710 | 1_193_026 | 686_569 |
| rbtree | 153_258 | 1_115_533_975 | 8_902_160 | 354_721 | 964_237 | 495_133 |
| splay | 152_693 | 1_323_550_652 | 8_702_096 | 719_103 | 1_214_198 | 717_146 |
| btree | 180_227 | 1_222_588_229 | 7_556_172 | 502_876 | 1_090_262 | 540_393 |
| zhenya_hashmap | 148_470 | 989_558_312 | 9_301_800 | 334_927 | 818_203 | 335_264 |
| btreemap_rs | 526_004 ($\textcolor{red}{6.40\%}$) | 129_727_131 ($\textcolor{red}{15.06\%}$) | 1_638_400 | 76_248 ($\textcolor{red}{27.95\%}$) | 154_643 ($\textcolor{red}{15.19\%}$) | 75_646 ($\textcolor{red}{24.72\%}$) |
| hashmap_rs | 516_249 ($\textcolor{red}{6.39\%}$) | 57_891_990 ($\textcolor{red}{16.70\%}$) | 1_835_008 | 27_213 ($\textcolor{red}{38.14\%}$) | 69_735 ($\textcolor{red}{16.98\%}$) | 28_539 ($\textcolor{red}{36.33\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|--:|--:|--:|--:|--:|
| heap | 139_951 | 369_466_193 | 1_400_024 | 334_365 | 397_474 |
| heap_rs | 498_010 ($\textcolor{red}{8.56\%}$) | 7_208_921 ($\textcolor{red}{44.89\%}$) | 819_200 | 52_887 ($\textcolor{red}{8.34\%}$) | 27_273 ($\textcolor{red}{31.83\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 152_580 | 238_966_334 | 1_820_844 | 543_937 | 73_525_914 | 518_626 |
| hashmap_rs | 516_249 ($\textcolor{red}{6.39\%}$) | 11_916_012 ($\textcolor{red}{16.55\%}$) | 950_272 | 26_546 ($\textcolor{red}{39.49\%}$) | 68_942 ($\textcolor{red}{16.97\%}$) | 27_406 ($\textcolor{red}{37.89\%}$) |
| imrc_hashmap_rs | 524_212 ($\textcolor{red}{7.08\%}$) | 27_848_989 ($\textcolor{red}{6.50\%}$) | 1_572_864 | 39_010 ($\textcolor{red}{30.64\%}$) | 165_715 ($\textcolor{red}{7.64\%}$) | 49_063 ($\textcolor{red}{32.19\%}$) |
| movm_rs | 1_584_149 ($\textcolor{green}{-14.14\%}$) | 1_255_894_005 ($\textcolor{red}{8.57\%}$) | 2_654_208 | 3_010_534 ($\textcolor{red}{12.34\%}$) | 8_080_808 ($\textcolor{red}{9.73\%}$) | 6_407_407 ($\textcolor{red}{9.87\%}$) |
| movm_dynamic_rs | 1_597_032 ($\textcolor{green}{-18.96\%}$) | 573_543_536 ($\textcolor{red}{5.06\%}$) | 2_129_920 | 2_325_226 ($\textcolor{red}{7.27\%}$) | 3_157_234 ($\textcolor{red}{7.19\%}$) | 2_308_078 ($\textcolor{red}{7.55\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 225_805 | 37_493 ($\textcolor{red}{0.06\%}$) | 16_270 ($\textcolor{red}{0.26\%}$) | 12_654 ($\textcolor{green}{-0.38\%}$) | 14_126 ($\textcolor{green}{-0.21\%}$) |
| Rust | 757_835 ($\textcolor{green}{-2.72\%}$) | 657_991 ($\textcolor{red}{32.03\%}$) | 121_955 ($\textcolor{red}{30.83\%}$) | 144_498 ($\textcolor{red}{26.44\%}$) | 160_898 ($\textcolor{red}{29.04\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 832_938 ($\textcolor{green}{-2.66\%}$) | 170_821 ($\textcolor{red}{27.37\%}$) | 442_716 ($\textcolor{red}{28.30\%}$) | 118_903 ($\textcolor{red}{40.72\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 118_909 | 7_392 |
| Rust | 30_514 ($\textcolor{red}{2.04\%}$) | 637 ($\textcolor{green}{-30.46\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 525_902 ($\textcolor{red}{5.56\%}$) | 70_091 ($\textcolor{red}{37.50\%}$) | 14_399 ($\textcolor{red}{47.44\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 139_886 | 126_827 | 14_632 | 8_451 | 10_530 | 3_662 |
| Rust | 519_924 ($\textcolor{green}{-2.76\%}$) | 579_400 ($\textcolor{green}{-1.79\%}$) | 76_157 ($\textcolor{red}{30.94\%}$) | 51_221 ($\textcolor{red}{33.09\%}$) | 100_740 ($\textcolor{red}{25.24\%}$) | 60_192 ($\textcolor{red}{32.10\%}$) |

github-actions[bot] commented 1 year ago

Note: The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:


Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations with large collections. Hopefully, once deterministic time slicing is ready, we will be able to measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure: when the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, whose map is std::collections::HashMap with a deterministic hasher instead of the default randomly seeded one. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.
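
The batch_put 50 spike for hashmap can be illustrated with a toy table that doubles its bucket array whenever it is full. This is a hypothetical Rust sketch (ToyHashMap and its fields are invented for illustration), not the Motoko hashmap's actual code. A put that arrives exactly at capacity rehashes every existing entry, so a batch of 50 puts that starts at the capacity boundary pays for copying the whole table once.

```rust
/// Toy chained hash table that doubles its bucket array when it becomes full.
/// Hypothetical illustration of amortized growth, not the Motoko `hashmap` code.
struct ToyHashMap {
    buckets: Vec<Vec<(u64, u64)>>,
    len: usize,
    /// Total number of entries moved by resizes so far.
    rehash_moves: usize,
}

impl ToyHashMap {
    fn with_capacity(cap: usize) -> Self {
        ToyHashMap { buckets: vec![Vec::new(); cap.max(1)], len: 0, rehash_moves: 0 }
    }

    fn bucket_of(key: u64, n: usize) -> usize {
        // Multiplicative hashing; good enough for a toy.
        (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize % n
    }

    fn put(&mut self, key: u64, value: u64) {
        let i = Self::bucket_of(key, self.buckets.len());
        if let Some(entry) = self.buckets[i].iter_mut().find(|(k, _)| *k == key) {
            entry.1 = value; // existing key: update in place, no growth
            return;
        }
        if self.len == self.buckets.len() {
            self.grow(); // full: this single put pays for copying every entry
        }
        let i = Self::bucket_of(key, self.buckets.len());
        self.buckets[i].push((key, value));
        self.len += 1;
    }

    fn grow(&mut self) {
        let n = self.buckets.len() * 2;
        let mut new_buckets = vec![Vec::new(); n];
        for (k, v) in self.buckets.drain(..).flatten() {
            new_buckets[Self::bucket_of(k, n)].push((k, v));
            self.rehash_moves += 1;
        }
        self.buckets = new_buckets;
    }
}

fn main() {
    // Fill a 1_024-bucket table exactly to capacity: no resize yet.
    let mut m = ToyHashMap::with_capacity(1_024);
    for k in 0..1_024 {
        m.put(k, k);
    }
    let before = m.rehash_moves;
    // The first of the next 50 puts triggers a resize that moves all 1_024 entries.
    for k in 1_024..1_074 {
        m.put(k, k);
    }
    // prints: moves before: 0, moves during 50 puts: 1024
    println!("moves before: {}, moves during 50 puts: {}", before, m.rehash_moves - before);
}
```

Averaged over many puts, doubling keeps the copying cost constant per insertion (amortized O(1)). However, a short batch like batch_put 50 that happens to cross a capacity boundary pays the whole O(n) copy at once.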

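The reproducibility setup can be sketched in plain Rust: a fixed-seed random generator feeds every collection, and hashmap_rs uses a deterministic hasher. To avoid external crates, this sketch substitutes a hand-written FNV-1a hasher for fxhash's FxHasher and a xorshift64 generator for the benchmark's random number generator. Both substitutions, and every name in the block, are assumptions for illustration.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a (64-bit): a hasher with no random seed, standing in for FxHasher.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// std HashMap with a deterministic hasher, analogous in spirit to fxhash's FxHashMap.
type DetHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// xorshift64: a tiny fixed-seed generator (hypothetical stand-in for the benchmark's RNG).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// The same seed always yields the same key sequence.
fn keys(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn build_hash(seed: u64, n: usize) -> DetHashMap<u64, u64> {
    keys(seed, n).into_iter().map(|k| (k, k)).collect()
}

fn build_btree(seed: u64, n: usize) -> BTreeMap<u64, u64> {
    keys(seed, n).into_iter().map(|k| (k, k)).collect()
}

fn main() {
    let a = build_hash(42, 1_000);
    let b = build_hash(42, 1_000);
    // Same seed and a seedless hasher: identical contents and iteration order within a build.
    let order_a: Vec<u64> = a.keys().copied().collect();
    let order_b: Vec<u64> = b.keys().copied().collect();
    assert_eq!(order_a, order_b);

    // A different collection fed by the same seed holds exactly the same elements.
    let mut sorted = order_a.clone();
    sorted.sort_unstable();
    let btree_keys: Vec<u64> = build_btree(42, 1_000).keys().copied().collect();
    assert_eq!(sorted, btree_keys);
    println!("{} keys, same order across runs", a.len());
}
```

With std's default RandomState, two maps built from the same keys would generally differ in iteration order between runs. A seedless hasher removes that source of variation from the measurements.
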
Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 152_580 | 1_195_632_150 | 9_102_052 | 545_645 | 365_569_669 | 520_876 |
| triemap | 156_424 | 1_338_995_779 | 9_715_900 | 459_710 | 1_193_026 | 686_569 |
| rbtree | 153_258 | 1_115_533_975 | 8_902_160 | 354_721 | 964_237 | 495_133 |
| splay | 152_693 | 1_323_550_652 | 8_702_096 | 719_103 | 1_214_198 | 717_146 |
| btree | 180_227 | 1_222_588_229 | 7_556_172 | 502_876 | 1_090_262 | 540_393 |
| zhenya_hashmap | 148_470 | 989_558_312 | 9_301_800 | 334_927 | 818_203 | 335_264 |
| btreemap_rs | 526_004 | 129_727_131 | 1_638_400 | 76_248 | 154_643 | 75_646 |
| hashmap_rs | 516_249 | 57_891_990 | 1_835_008 | 27_213 | 69_735 | 28_539 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
|---|--:|--:|--:|--:|--:|--:|
| heap | 139_951 | 369_466_193 | 1_400_024 | 334_365 | 397_474 | 335_750 |
| heap_rs | 498_010 | 7_208_921 | 819_200 | 52_887 | 27_273 | 53_085 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 152_580 | 238_966_334 | 1_820_844 | 543_937 | 73_525_914 | 518_626 |
| hashmap_rs | 516_249 | 11_916_012 | 950_272 | 26_546 | 68_942 | 27_406 |
| imrc_hashmap_rs | 524_212 | 27_848_989 | 1_572_864 | 39_010 | 165_715 | 49_063 |
| movm_rs | 1_584_149 | 1_255_894_005 | 2_654_208 | 3_010_534 | 8_080_808 | 6_407_407 |
| movm_dynamic_rs | 1_597_032 | 573_543_536 | 2_129_920 | 2_325_226 | 3_157_234 | 2_308_078 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to deserialize data dynamically, based on what arrives on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is challenging to maintain the ergonomics provided by serde.
  • Real-world applications tend to send small payloads to each endpoint, which keeps the Candid overhead in Rust tolerable.
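
The static-versus-dynamic distinction can be illustrated with a toy self-describing wire format in Rust. Everything here is hypothetical: the tag bytes, Value, Transfer, Reader, and both decoders. The format is not Candid, and neither decoder is Motoko's generated code or serde's. The dynamic decoder discovers the shape from tags and builds a generic Value tree. The specialized decoder knows the expected record layout up front and fills the struct directly.

```rust
// Toy self-describing wire format (NOT Candid): every value starts with a tag byte.
const TAG_NAT32: u8 = 1; // then 4 bytes, little-endian
const TAG_TEXT: u8 = 2; // then a 4-byte little-endian length and UTF-8 bytes
const TAG_RECORD: u8 = 3; // then a 1-byte field count and that many values

#[derive(Debug, PartialEq)]
enum Value {
    Nat32(u32),
    Text(String),
    Record(Vec<Value>),
}

#[derive(Debug, PartialEq)]
struct Transfer {
    amount: u32,
    memo: String,
}

struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl Reader<'_> {
    fn byte(&mut self) -> Option<u8> {
        let b = *self.buf.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }
    fn nat32(&mut self) -> Option<u32> {
        let b = self.buf.get(self.pos..self.pos + 4)?;
        self.pos += 4;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
    fn text(&mut self) -> Option<String> {
        let n = self.nat32()? as usize;
        let b = self.buf.get(self.pos..self.pos + n)?;
        self.pos += n;
        String::from_utf8(b.to_vec()).ok()
    }
}

/// Dynamic decoding: the shape is read from the tags on the wire, and the
/// result is a generic tree that the caller still has to match on.
fn decode_dynamic(r: &mut Reader<'_>) -> Option<Value> {
    match r.byte()? {
        TAG_NAT32 => Some(Value::Nat32(r.nat32()?)),
        TAG_TEXT => Some(Value::Text(r.text()?)),
        TAG_RECORD => {
            let n = r.byte()?;
            let mut fields = Vec::with_capacity(usize::from(n));
            for _ in 0..n {
                fields.push(decode_dynamic(r)?);
            }
            Some(Value::Record(fields))
        }
        _ => None,
    }
}

/// Specialized decoding: the expected type is fixed ahead of time, so the
/// decoder only checks tags against one layout and fills the struct directly.
fn decode_transfer(r: &mut Reader<'_>) -> Option<Transfer> {
    if r.byte()? != TAG_RECORD || r.byte()? != 2 || r.byte()? != TAG_NAT32 {
        return None;
    }
    let amount = r.nat32()?;
    if r.byte()? != TAG_TEXT {
        return None;
    }
    let memo = r.text()?;
    Some(Transfer { amount, memo })
}

fn encode_transfer(t: &Transfer) -> Vec<u8> {
    let mut out = vec![TAG_RECORD, 2, TAG_NAT32];
    out.extend_from_slice(&t.amount.to_le_bytes());
    out.push(TAG_TEXT);
    out.extend_from_slice(&(t.memo.len() as u32).to_le_bytes());
    out.extend_from_slice(t.memo.as_bytes());
    out
}

fn main() {
    let wire = encode_transfer(&Transfer { amount: 100, memo: "hi".to_string() });
    let dynamic = decode_dynamic(&mut Reader { buf: &wire[..], pos: 0 });
    let specialized = decode_transfer(&mut Reader { buf: &wire[..], pos: 0 });
    println!("{:?}\n{:?}", dynamic, specialized);
}
```

In this toy, the specialized path skips building the intermediate Value tree and the generic per-value dispatch. That is the kind of work static specialization can remove, though the actual costs in Motoko and Rust canisters depend on their real implementations.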

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 225_805 | 37_493 | 16_270 | 12_654 | 14_126 |
| Rust | 757_835 | 657_991 | 121_955 | 144_498 | 160_898 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 832_938 | 170_821 | 442_716 | 118_903 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 118_909 | 7_392 |
| Rust | 30_514 | 637 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 525_902 | 70_091 | 14_399 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 139_886 | 126_827 | 14_632 | 8_451 | 10_530 | 3_662 |
| Rust | 519_924 | 579_400 | 76_157 | 51_221 | 100_740 | 60_192 |