dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

add readme for each table #23

Closed chenyan-dfinity closed 1 year ago

chenyan-dfinity commented 1 year ago

Also adds a more accurate memory measurement in Rust.

github-actions[bot] commented 1 year ago

Warning: The flamegraph link only works after you merge.

Collection libraries

Measure the performance of different collection libraries written in both Motoko and Rust. Library names with the `_rs` suffix are Rust implementations; the rest are written in Motoko.

Below we explain the measurements of each column in the table. All collections are populated with the same random number generator and a fixed seed, so they contain exactly the same elements and receive exactly the same queries.
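A minimal sketch of that fixed-seed setup (assuming the `rand` 0.8 crate; the helper name and seed value are illustrative, not the repo's actual generator):

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// Generate the key sequence once from a fixed seed, so every collection under
// test is populated with identical elements and queried with identical keys.
fn bench_keys(n: usize) -> Vec<u64> {
    let mut rng = StdRng::seed_from_u64(42); // fixed seed => reproducible runs
    (0..n).map(|_| rng.gen()).collect()
}
```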

:gem: Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations with large collections. Hopefully, once deterministic time slicing is ready, we can measure the performance on a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole underlying array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results (see the sketch after this list).
  • rbtree's remove method performs only a logical removal: the removed elements still reside in memory but are no longer reachable from the map. A complete implementation of remove would cost a bit more than reported here.
  • The MoVM table measures the performance of an experimental Motoko interpreter. External developers can ignore this table for now.
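
For the fxhash point above, a minimal sketch (assuming the `fxhash` crate) of what "the same as std's HashMap but with a deterministic hasher" means:

```rust
use std::collections::HashMap;

use fxhash::FxBuildHasher; // BuildHasherDefault<FxHasher>: no per-process random seed

// The same std HashMap type; only the hasher differs, so hash layouts and
// therefore instruction counts are reproducible across runs.
type DeterministicMap = HashMap<u64, u64, FxBuildHasher>;

fn deterministic_map() -> DeterministicMap {
    let mut m = DeterministicMap::default();
    m.insert(1, 1);
    m
}
```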

Map

| | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| hashmap | 2_456_007_854 | 9_102_036 | 1_319_482 | 710_190_935 | 1_248_881 |
| triemap | 2_422_807_203 | 9_716_008 | 920_103 | 2_236_198 | 1_271_468 |
| rbtree | 2_322_981_525 | 10_102_164 | 844_569 | 2_120_544 | 998_470 |
| splay | 2_528_971_511 | 9_302_108 | 1_450_431 | 2_398_808 | 1_449_735 |
| btreemap_rs | 117_438_673 | 1_638_400 | 59_553 | 140_296 | 61_987 |
| hashmap_rs | 53_134_681 | 1_835_008 | 21_202 | 63_626 | 22_619 |

Priority queue

| | heapify 50k | mem | pop_min 50 | put 50 | pop_min 50 |
|---|---:|---:|---:|---:|---:|
| heap | 814_736_870 | 1_400_024 | 483_010 | 862_326 | 485_101 |
| heap_rs | 5_042_128 | 819_200 | 53_302 | 22_310 | 53_569 |

MoVM

| | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| hashmap | 491_272_681 | 1_820_828 | 1_317_551 | 143_127_756 | 1_245_245 |
| hashmap_rs | 10_944_969 | 950_272 | 20_517 | 62_932 | 21_509 |
| imrc_hashmap_rs | 19_673_359 | 1_572_864 | 29_947 | 119_606 | 36_714 |
| movm_rs | 928_887_381 | 2_654_208 | 2_440_597 | 5_821_511 | 4_799_209 |
| movm_dynamic_rs | 485_461_977 | 2_129_920 | 1_872_262 | 2_654_951 | 1_831_888 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.
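A minimal sketch of what the subscribe call looks like on the Rust side (assuming ic-cdk ~0.12 with its attribute macros re-exported at the crate root; the method name and argument are illustrative, not the example's exact Candid interface):

```rust
use candid::Principal;
use ic_cdk::api::call::CallResult;

// The subscriber canister calls the publisher; the "caller" number in the table
// is the cost charged in this canister, "callee" the cost in the publisher.
#[ic_cdk::update]
async fn setup_subscribe(publisher_id: Principal, topic: String) {
    let result: CallResult<()> = ic_cdk::call(publisher_id, "subscribe", (topic,)).await;
    result.expect("subscribe call failed");
}
```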

| | subscribe | publish |
|---|---|---|
| Motoko | caller (20_075) / callee (6_291) | caller (16_044) / callee (3_947) |
| Rust | caller (72_344) / callee (56_608) | caller (120_527) / callee (69_257) |

Heartbeat

Measure the cost of an empty heartbeat method.
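On the Rust side, the measured entry point is nothing more than an empty handler; a minimal sketch (assuming ic-cdk's heartbeat attribute macro):

```rust
// An empty heartbeat: the table shows the fixed cost charged just to enter and
// leave this no-op method on every heartbeat round.
#[ic_cdk::heartbeat]
fn heartbeat() {}
```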

| | heartbeat |
|---|---:|
| Motoko | 6_161 |
| Rust | 587 |

Motoko Garbage Collection

Measure the Motoko garbage collection cost using the Triemap benchmark. The max mem column reports `rts_max_live_size` after the `generate` call.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 3_905_834_559 | 15_539_984 | 926_657 | 2_258_879 | 1_299_907 |
| copying | 3_905_834_509 | 15_539_984 | 266_740_517 | 268_236_256 | 267_277_400 |
| compacting | 4_052_409_826 | 15_539_984 | 308_612_525 | 350_383_547 | 353_311_950 |
| generational | 4_276_930_061 | 15_540_260 | 987_302 | 3_659_583 | 2_369_738 |

Basic DAO

Measure the performance of a typical dapp, Basic DAO, with the heartbeat disabled to make profiling easier. There is a separate benchmark above that measures heartbeat performance.

Note

  • The cost difference is mainly due to Candid serialization.
  • Motoko statically compiles and specializes the serialization code for each method, whereas in Rust we use serde to deserialize data dynamically based on the data on the wire (see the sketch after this list).
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send a small amount of data to each endpoint, which makes the Candid overhead in Rust tolerable.
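
A minimal sketch of the Rust pattern discussed above (assuming the `candid`, `serde`, and `ic-cdk` crates; the type and method names are illustrative, not Basic DAO's exact interface). The argument is decoded by serde-generated code at call time, which is where the per-call overhead comes from:

```rust
use candid::{CandidType, Principal};
use serde::Deserialize;

// Argument type for a hypothetical transfer endpoint; Candid decoding of this
// struct happens in generated glue before the method body runs.
#[derive(CandidType, Deserialize)]
struct TransferArgs {
    to: Principal,
    amount: u64,
}

#[ic_cdk::update]
fn transfer_token(args: TransferArgs) -> bool {
    // Illustrative body only; the real endpoint would update the DAO ledger.
    args.amount > 0
}
```
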
| | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|
| Motoko | 46_175 | 21_562 | 15_121 | 17_974 |
| Rust | 1_079_829 | 181_395 | 176_823 | 239_656 |