dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

Add NFT benchmark #27

Closed: chenyan-dfinity closed this 1 year ago

github-actions[bot] commented 1 year ago

Warning

The flamegraph link only works after you merge.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and receive exactly the same queries. Each column in the tables below measures a different operation on the collection.
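As a sketch of that setup, a fixed-seed generator makes every run produce an identical key sequence, so each collection is benchmarked on exactly the same workload. (The `Lcg` type and its constants below are illustrative, not the repo's actual RNG.)

```rust
/// Tiny deterministic linear congruential generator; a stand-in for the
/// benchmark's fixed-seed RNG, using Knuth's MMIX constants.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    // Two runs with the same seed yield the same keys, so differences in the
    // measured cycle counts come from the data structures, not the workload.
    let keys_a: Vec<u64> = {
        let mut r = Lcg(42);
        (0..5).map(|_| r.next()).collect()
    };
    let keys_b: Vec<u64> = {
        let mut r = Lcg(42);
        (0..5).map(|_| r.next()).collect()
    };
    assert_eq!(keys_a, keys_b);
    println!("reproducible: {}", keys_a == keys_b);
}
```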

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal; therefore, the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance on larger memory footprints.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, i.e., std::collections::HashMap with a deterministic hasher, which ensures reproducible results.
  • rbtree's remove method only performs a logical removal: the removed elements still reside in memory but are no longer reachable from the map. A complete implementation of remove would cost a bit more than reported here.
  • The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
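The amortized-growth effect behind the `batch_put 50` spike can be sketched as follows. This is a toy model, not the Motoko `hashmap` implementation: total copying stays linear, but the single insert that triggers a resize pays to copy the entire backing array, and a 50-element batch that lands on that insert looks dramatically more expensive.

```rust
/// Toy model of an amortized, array-backed table: capacity doubles when full,
/// copying every stored element. Returns (total elements copied, elements
/// copied by the single most expensive insert).
fn growth_cost(inserts: usize, init_cap: usize) -> (usize, usize) {
    let mut cap = init_cap;
    let mut len = 0;
    let mut total_copies = 0;
    let mut worst = 0;
    for _ in 0..inserts {
        if len == cap {
            total_copies += len; // grow: copy the whole array
            worst = worst.max(len);
            cap *= 2;
        }
        len += 1;
    }
    (total_copies, worst)
}

fn main() {
    let (total, worst) = growth_cost(50_000, 4);
    // Copying stays linear overall (amortized O(1) per insert)...
    assert!(total < 2 * 50_000);
    // ...but the worst insert copies roughly the whole table, which is what a
    // 50-element batch can hit, inflating `batch_put 50` for `hashmap`.
    println!("total copies: {total}, worst single insert: {worst}");
}
```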

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 203_891 | 2_456_108_447 | 9_102_052 | 1_319_582 | 710_191_522 | 1_248_981 |
| triemap | 208_030 | 2_422_807_277 | 9_716_008 | 920_103 | 2_236_198 | 1_271_468 |
| rbtree | 200_177 | 2_322_981_599 | 10_102_164 | 844_569 | 2_120_544 | 998_470 |
| splay | 205_437 | 2_528_971_585 | 9_302_108 | 1_450_431 | 2_398_808 | 1_449_735 |
| btreemap_rs | 526_971 | 123_797_849 | 1_638_400 | 59_755 | 140_301 | 62_121 |
| hashmap_rs | 515_644 | 53_134_200 | 1_835_008 | 21_395 | 63_730 | 22_812 |

Priority queue

| | binary_size | heapify 50k | max mem | pop_min 50 | put 50 | pop_min 50 |
|---|---|---|---|---|---|---|
| heap | 188_543 | 814_736_944 | 1_400_024 | 482_960 | 862_276 | 485_051 |
| heap_rs | 485_570 | 5_041_733 | 819_200 | 53_595 | 22_315 | 53_772 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 203_891 | 491_293_274 | 1_820_844 | 1_317_651 | 143_128_343 | 1_245_345 |
| hashmap_rs | 515_644 | 10_944_500 | 950_272 | 20_710 | 63_036 | 21_702 |
| imrc_hashmap_rs | 526_570 | 19_861_874 | 1_572_864 | 31_854 | 120_242 | 37_953 |
| movm_rs | 2_089_325 | 1_131_930_057 | 2_654_208 | 2_831_694 | 7_116_074 | 5_565_647 |
| movm_dynamic_rs | 2_323_019 | 576_798_134 | 2_129_920 | 2_249_781 | 3_116_104 | 2_208_382 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 156_504 | 12_264 |
| Rust | 35_604 | 1_127 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 172_061 | 33_849 | 1_949 |
| Rust | 534_970 | 55_858 | 10_463 |

Motoko Garbage Collection

Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 3_905_834_633 | 15_539_984 | 926_657 | 2_258_879 | 1_299_907 |
| copying | 3_905_834_583 | 15_539_984 | 266_740_591 | 268_236_330 | 267_277_474 |
| compacting | 4_052_409_880 | 15_539_984 | 308_612_579 | 350_383_601 | 353_312_004 |
| generational | 4_276_930_230 | 15_540_260 | 987_663 | 3_659_944 | 2_370_099 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe | publish |
|---|---|---|---|---|
| Motoko | 175_721 | 165_324 | caller (20_084) / callee (6_291) | caller (16_044) / callee (3_947) |
| Rust | 575_366 | 706_521 | caller (63_608) / callee (43_332) | caller (89_685) / callee (50_643) |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to dynamically deserialize data based on the data on the wire.
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data to each endpoint, which makes the Candid overhead in Rust tolerable.
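The static-vs-dynamic distinction above can be illustrated in plain Rust, with no Candid or serde involved (`Field`, `Count`, `decode_static`, and `decode_dynamic` are made-up names for this sketch): the generic version is monomorphized per concrete type, roughly what Motoko's per-method specialization achieves, while the trait-object version dispatches through a vtable at run time, closer in spirit to a generic deserializer.

```rust
// A minimal "decodable value" interface standing in for a deserialization target.
trait Field {
    fn read(&mut self, byte: u8);
}

// Toy target type: just counts the bytes it consumes.
struct Count(u32);
impl Field for Count {
    fn read(&mut self, _b: u8) {
        self.0 += 1;
    }
}

// Static: the compiler emits a specialized copy of this function for each
// concrete T, so every `read` call can be inlined.
fn decode_static<T: Field>(target: &mut T, wire: &[u8]) {
    for &b in wire {
        target.read(b);
    }
}

// Dynamic: one shared function; every `read` call goes through a vtable.
fn decode_dynamic(target: &mut dyn Field, wire: &[u8]) {
    for &b in wire {
        target.read(b);
    }
}

fn main() {
    let wire = [1u8, 2, 3, 4];
    let (mut a, mut b) = (Count(0), Count(0));
    decode_static(&mut a, &wire);
    decode_dynamic(&mut b, &wire);
    // Same result either way; only the dispatch strategy (and thus the
    // optimization opportunity) differs.
    assert_eq!(a.0, b.0);
    println!("decoded {} bytes either way", a.0);
}
```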

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 290_538 | 46_273 | 21_192 | 15_307 | 18_160 |
| Rust | 954_334 | 542_761 | 102_643 | 126_129 | 139_098 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 237_757 | 13_559 | 24_393 | 5_407 |
| Rust | 1_017_218 | 147_558 | 380_454 | 92_674 |