Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	526_004 ($\textcolor{red}{6.40\%}$)	129_727_131 ($\textcolor{red}{15.06\%}$)	1_638_400	76_248 ($\textcolor{red}{27.95\%}$)	154_643 ($\textcolor{red}{15.19\%}$)	75_646 ($\textcolor{red}{24.72\%}$)
hashmap_rs	516_249 ($\textcolor{red}{6.39\%}$)	57_891_990 ($\textcolor{red}{16.70\%}$)	1_835_008	27_213 ($\textcolor{red}{38.14\%}$)	69_735 ($\textcolor{red}{16.98\%}$)	28_539 ($\textcolor{red}{36.33\%}$)

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474
heap_rs	498_010 ($\textcolor{red}{8.56\%}$)	7_208_921 ($\textcolor{red}{44.89\%}$)	819_200	52_887 ($\textcolor{red}{8.34\%}$)	27_273 ($\textcolor{red}{31.83\%}$)

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	516_249 ($\textcolor{red}{6.39\%}$)	11_916_012 ($\textcolor{red}{16.55\%}$)	950_272	26_546 ($\textcolor{red}{39.49\%}$)	68_942 ($\textcolor{red}{16.97\%}$)	27_406 ($\textcolor{red}{37.89\%}$)
imrc_hashmap_rs	524_212 ($\textcolor{red}{7.08\%}$)	27_848_989 ($\textcolor{red}{6.50\%}$)	1_572_864	39_010 ($\textcolor{red}{30.64\%}$)	165_715 ($\textcolor{red}{7.64\%}$)	49_063 ($\textcolor{red}{32.19\%}$)
movm_rs	1_584_149 ($\textcolor{green}{-14.14\%}$)	1_255_894_005 ($\textcolor{red}{8.57\%}$)	2_654_208	3_010_534 ($\textcolor{red}{12.34\%}$)	8_080_808 ($\textcolor{red}{9.73\%}$)	6_407_407 ($\textcolor{red}{9.87\%}$)
movm_dynamic_rs	1_597_032 ($\textcolor{green}{-18.96\%}$)	573_543_536 ($\textcolor{red}{5.06\%}$)	2_129_920	2_325_226 ($\textcolor{red}{7.27\%}$)	3_157_234 ($\textcolor{red}{7.19\%}$)	2_308_078 ($\textcolor{red}{7.55\%}$)

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493 ($\textcolor{red}{0.06\%}$)	16_270 ($\textcolor{red}{0.26\%}$)	12_654 ($\textcolor{green}{-0.38\%}$)	14_126 ($\textcolor{green}{-0.21\%}$)
Rust	757_835 ($\textcolor{green}{-2.72\%}$)	657_991 ($\textcolor{red}{32.03\%}$)	121_955 ($\textcolor{red}{30.83\%}$)	144_498 ($\textcolor{red}{26.44\%}$)	160_898 ($\textcolor{red}{29.04\%}$)

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	832_938 ($\textcolor{green}{-2.66\%}$)	170_821 ($\textcolor{red}{27.37\%}$)	442_716 ($\textcolor{red}{28.30\%}$)	118_903 ($\textcolor{red}{40.72\%}$)

Heartbeat

	binary_size	heartbeat
Motoko	118_909	7_392
Rust	30_514 ($\textcolor{red}{2.04\%}$)	637 ($\textcolor{green}{-30.46\%}$)

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	525_902 ($\textcolor{red}{5.56\%}$)	70_091 ($\textcolor{red}{37.50\%}$)	14_399 ($\textcolor{red}{47.44\%}$)

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	519_924 ($\textcolor{green}{-2.76\%}$)	579_400 ($\textcolor{green}{-1.79\%}$)	76_157 ($\textcolor{red}{30.94\%}$)	51_221 ($\textcolor{red}{33.09\%}$)	100_740 ($\textcolor{red}{25.24\%}$)	60_192 ($\textcolor{red}{32.10\%}$)

Note: The flamegraph links only work after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in Motoko and Rust. Library names with an _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Each column in the table measures the following:

generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the number of Wasm memory pages * 32Kb.
batch_get 50. Find 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
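
The fixed-seed setup described above can be sketched as follows. This is only an illustration: the xorshift32 generator, the seed value 42, and the names `XorShift32` and `generate_keys` are assumptions made for this example, not the generator or seed the benchmark actually uses.

```rust
/// Minimal xorshift32 generator; a stand-in for whatever RNG the benchmark uses.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    /// The seed must be non-zero, otherwise xorshift stays at zero forever.
    fn new(seed: u32) -> Self {
        assert!(seed != 0, "xorshift32 requires a non-zero seed");
        XorShift32 { state: seed }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Produce the `n` keys that every collection under test would receive.
fn generate_keys(seed: u32, n: usize) -> Vec<u32> {
    let mut rng = XorShift32::new(seed);
    (0..n).map(|_| rng.next_u32()).collect()
}

fn main() {
    // Hypothetical seed; the benchmark's real seed is not stated here.
    let keys_for_btree = generate_keys(42, 50_000);
    let keys_for_hashmap = generate_keys(42, 50_000);
    // Same seed, same sequence: both collections see identical inserts.
    assert_eq!(keys_for_btree, keys_for_hashmap);
    println!("generated {} keys", keys_for_btree.len());
}
```

Because the sequence depends only on the seed, the same keys can be replayed for the batch_get, batch_put and batch_remove queries of every collection.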

💎 Takeaways

The platform charges only for instruction count, so data structures that exploit caching and locality gain no cost advantage.
There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
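
To make the point about fixed N concrete, here is a rough cost model. The two cost functions and the budget of 5e9 are assumptions chosen for illustration; they are not the platform's real limit or the cost of any benchmarked library.

```rust
/// Hypothetical cost of an algorithm with a large constant: 10000 * n * log2(n).
fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Hypothetical cost of a simple quadratic algorithm: n^2.
fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// Returns true if the cost fits within the per-round budget.
fn fits(cost: f64, limit: f64) -> bool {
    cost <= limit
}

fn main() {
    // Assumed budget, for illustration only.
    let limit = 5e9;
    for &n in [1_000u64, 10_000, 50_000, 100_000].iter() {
        println!(
            "n = {:>7}: 10000*n*log2(n) fits: {:5}, n^2 fits: {:5}",
            n,
            fits(cost_nlogn(n), limit),
            fits(cost_quadratic(n), limit)
        );
    }
}
```

Under these assumed numbers, at n = 50_000 the quadratic algorithm stays within the budget while the n log n algorithm with the large constant does not. The asymptotically better algorithm only wins at sizes where both already exceed the limit.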

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and the cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance with a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
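
The spike caused by copying the whole array can be reproduced with a toy model. The `CountingArray` type, the doubling growth policy, and the capacity of 1024 are assumptions for this sketch; they do not describe the actual Motoko hashmap implementation.

```rust
/// Toy growable array that counts element copies as a rough proxy for cost.
struct CountingArray {
    items: Vec<u32>,
    capacity: usize,
    copies: u64,
}

impl CountingArray {
    fn with_capacity(capacity: usize) -> Self {
        CountingArray {
            items: Vec::with_capacity(capacity),
            capacity,
            copies: 0,
        }
    }

    fn put(&mut self, value: u32) {
        if self.items.len() == self.capacity {
            // Full: copy every element into an array twice as large.
            let new_capacity = (self.capacity * 2).max(1);
            let mut grown = Vec::with_capacity(new_capacity);
            grown.extend_from_slice(&self.items);
            self.copies += self.items.len() as u64;
            self.items = grown;
            self.capacity = new_capacity;
        }
        self.items.push(value);
    }
}

/// Insert `count` values and return how many element copies the batch caused.
fn batch_put(arr: &mut CountingArray, count: u32) -> u64 {
    let before = arr.copies;
    for v in 0..count {
        arr.put(v);
    }
    arr.copies - before
}

fn main() {
    let mut arr = CountingArray::with_capacity(1024);
    let fill = batch_put(&mut arr, 1000); // stays below capacity
    let crossing = batch_put(&mut arr, 50); // crosses the capacity boundary
    let after = batch_put(&mut arr, 50); // plenty of room again
    println!("copies: fill={}, crossing={}, after={}", fill, crossing, after);
}
```

Only the batch that crosses the capacity boundary pays for the copy, which is why a single batch_put 50 measurement can be far more expensive than the average insert.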

hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
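
As a sketch of what a deterministic hasher means here, the example below plugs a fixed hasher into std::collections::HashMap using only the standard library. The `Fnv1a` hasher is a stand-in chosen so the example has no external dependencies; it is not the algorithm fxhash uses, and `DeterministicMap`, `build_map` and `key_order` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Toy deterministic hasher (64-bit FNV-1a); a stand-in for fxhash.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // Standard FNV-1a 64-bit offset basis: no per-process random state.
        Fnv1a(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// std HashMap with the random default hasher swapped for a fixed one.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn build_map(n: u32) -> DeterministicMap<u32, u32> {
    let mut map: DeterministicMap<u32, u32> = HashMap::default();
    for k in 0..n {
        map.insert(k, k * 2);
    }
    map
}

/// Iteration order depends on the hashes, so it reveals hasher randomness.
fn key_order(map: &DeterministicMap<u32, u32>) -> Vec<u32> {
    map.keys().copied().collect()
}

fn main() {
    let first = key_order(&build_map(1000));
    let second = key_order(&build_map(1000));
    // With a fixed hasher, two identical builds lay out keys identically.
    assert_eq!(first, second);
    println!("iteration order is reproducible across builds");
}
```

With the default std hasher, each map gets randomly seeded keys, so bucket layout and the resulting instruction counts can vary from run to run. A fixed hasher removes that source of variation.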

btree comes from Byron Becker's stable BTreeMap library.

zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.

The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	1_195_632_150	9_102_052	545_645	365_569_669	520_876
triemap	156_424	1_338_995_779	9_715_900	459_710	1_193_026	686_569
rbtree	153_258	1_115_533_975	8_902_160	354_721	964_237	495_133
splay	152_693	1_323_550_652	8_702_096	719_103	1_214_198	717_146
btree	180_227	1_222_588_229	7_556_172	502_876	1_090_262	540_393
zhenya_hashmap	148_470	989_558_312	9_301_800	334_927	818_203	335_264
btreemap_rs	526_004	129_727_131	1_638_400	76_248	154_643	75_646
hashmap_rs	516_249	57_891_990	1_835_008	27_213	69_735	28_539

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	139_951	369_466_193	1_400_024	334_365	397_474	335_750
heap_rs	498_010	7_208_921	819_200	52_887	27_273	53_085

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	152_580	238_966_334	1_820_844	543_937	73_525_914	518_626
hashmap_rs	516_249	11_916_012	950_272	26_546	68_942	27_406
imrc_hashmap_rs	524_212	27_848_989	1_572_864	39_010	165_715	49_063
movm_rs	1_584_149	1_255_894_005	2_654_208	3_010_534	8_080_808	6_407_407
movm_dynamic_rs	1_597_032	573_543_536	2_129_920	2_325_226	3_157_234	2_308_078

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493	16_270	12_654	14_126
Rust	757_835	657_991	121_955	144_498	160_898

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	832_938	170_821	442_716	118_903

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of an empty job.
It is not easy to reliably capture both events in one flamegraph, because implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	118_909	7_392
Rust	30_514	637

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	525_902	70_091	14_399

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	139_886	126_827	14_632	8_451	10_530	3_662
Rust	519_924	579_400	76_157	51_221	100_740	60_192

dfinity / canister-profiling

rust: opt 1 #67
