	binary_size	heapify 1m	max mem	pop_min 50	put 50	pop_min 50	upgrade
heap	171_060 ($\textcolor{red}{0.21\%}$)	5_557_564_409	24_000_360	621_758	227_293	592_698	3_240_817_647 ($\textcolor{red}{0.00\%}$)
heap_rs	596_953	143_262_451	18_284_544	58_563	21_622	58_466	647_923_463

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500	upgrade
buffer	178_117 ($\textcolor{red}{0.20\%}$)	2_601_290	65_652	95_575	803_545	173_575	3_146_728 ($\textcolor{red}{0.02\%}$)
vector	175_919 ($\textcolor{red}{0.21\%}$)	1_952_750	24_588	126_199	186_554	176_192	4_780_320 ($\textcolor{red}{0.01\%}$)
vec_rs	588_969	287_516	1_376_256	16_494	30_089	22_346	3_806_788

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	199_257 ($\textcolor{red}{0.18\%}$)	282_867_517	262_958_028	34_369	25_335
Rust	596_836	82_782_948	56_788_520	42_522	41_228

	binary_size	generate 10k	max mem	inc	witness	upgrade
Motoko	248_058 ($\textcolor{red}{0.15\%}$)	365_606_356	342_396	397_640	267_761	22_396_932 ($\textcolor{red}{0.00\%}$)
Rust	640_537	489_666_578	1_310_720	660_965	220_622	450_827_450

	binary_size	init	transfer_token	submit_proposal	vote_proposal	upgrade
Motoko	278_879 ($\textcolor{red}{0.13\%}$)	513_370 ($\textcolor{red}{0.04\%}$)	23_402 ($\textcolor{red}{0.27\%}$)	19_181 ($\textcolor{green}{-0.31\%}$)	20_393 ($\textcolor{green}{-0.47\%}$)	162_161 ($\textcolor{red}{0.37\%}$)
Rust	902_362	516_184	92_673	118_753	113_669	1_499_571

	binary_size	init	mint_token	transfer_token	upgrade
Motoko	225_006 ($\textcolor{red}{0.16\%}$)	482_239 ($\textcolor{red}{0.04\%}$)	31_104	8_880	92_429 ($\textcolor{red}{0.65\%}$)
Rust	931_779	205_310	309_520	73_609	1_635_207

	binary_size	heartbeat
Motoko	142_246 ($\textcolor{red}{0.26\%}$)	27_494
Rust	26_684	1_201

	binary_size	setTimer	cancelTimer
Motoko	150_072 ($\textcolor{red}{0.24\%}$)	56_158	4_695
Rust	554_248	64_790	12_216

	binary size	put new bucket	put existing bucket	get
Map	421_426 ($\textcolor{red}{0.18\%}$)	758_787 ($\textcolor{red}{0.15\%}$)	16_349	16_917

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	165_797 ($\textcolor{red}{0.22\%}$)	150_117 ($\textcolor{red}{0.24\%}$)	32_863	12_200	27_064	6_622
Rust	593_655	629_046	59_348	39_106	74_039	43_504

Note: the flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the _rs suffix are written in Rust; the rest are written in Motoko. The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
batch_get 50. Find 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing and deserializing states into stable memory. For stable benchmarks, the upgrade takes no cycles, as the state is already in the stable memory.
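The fixed-seed setup described above can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the `Lcg` type, the `workload` helper, and the seed values are hypothetical, and the generator actually used by the benchmarks is not shown in this report. The point is only that the same seed yields the same keys for every collection under test.

```rust
/// Minimal linear congruential generator (multiplier and increment from Knuth's MMIX).
/// Hypothetical stand-in: the generator used by the real benchmarks may differ.
struct Lcg {
    state: u64,
}

impl Lcg {
    fn new(seed: u64) -> Self {
        Lcg { state: seed }
    }

    /// Advance the state and return the next 64-bit value.
    fn next_u64(&mut self) -> u64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }
}

/// Produce the `n` keys that a run with the given seed would insert.
/// (Hypothetical helper name, for illustration only.)
fn workload(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = Lcg::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Calling `workload(42, 50)` twice returns the same 50 keys, so two libraries fed from the same seed see identical inserts and queries, while a different seed produces a different sequence.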

💎 Takeaways

The platform charges only for instruction count, so data structures that exploit caching and locality gain no cost advantage.
We have a limit on the maximum cycles per round. This means asymptotic behavior doesn't matter much; we care more about the performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
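The point about fixed limits versus asymptotic behavior can be made concrete with a small cost model. The sketch below is illustrative only: the budget of 5 billion instructions is an assumed figure rather than the platform's actual limit, and `cost_nlogn`, `cost_quadratic`, and `max_n_within` are hypothetical names. It finds the largest input size each cost function can handle within the budget.

```rust
/// Modeled cost of an O(n log n) algorithm with a constant factor of 10_000.
fn cost_nlogn(n: u64) -> f64 {
    if n < 2 {
        return 0.0; // avoid log2(0); log2(1) is 0 anyway
    }
    10_000.0 * (n as f64) * (n as f64).log2()
}

/// Modeled cost of an O(n^2) algorithm with a constant factor of 1.
fn cost_quadratic(n: u64) -> f64 {
    (n as f64) * (n as f64)
}

/// Largest n whose modeled cost stays within `limit`.
/// Uses binary search, which is valid because both cost models are non-decreasing in n.
fn max_n_within(limit: f64, cost: fn(u64) -> f64) -> u64 {
    let (mut lo, mut hi) = (0u64, 1u64 << 32);
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2;
        if cost(mid) <= limit {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    lo
}
```

Under the assumed budget, `max_n_within(5e9, cost_quadratic)` is 70_710, while `max_n_within(5e9, cost_nlogn)` lands between 33_000 and 34_000. In this model the quadratic algorithm handles roughly twice the input size before hitting the limit, even though it is asymptotically worse.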

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.

The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.

Different libraries persist data across upgrades in different ways. There are three main categories:

Use stable variable directly in Motoko: zhenya_hashmap, btree, vector

Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vector_rs

Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs

The stable benchmarks are much more expensive than their non-stable counterparts, because the stable memory API is much more expensive. The benefit is a fast upgrade, although the upgrade still needs to parse the metadata when initializing the upgraded Wasm module.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for other data structures.
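The effect can be illustrated with a toy model that counts how many existing elements each insert has to copy. This is a sketch under stated assumptions, not the Motoko hashmap itself: the `GrowArray` type, the doubling growth policy, and the initial capacity of one million are all hypothetical choices made for illustration.

```rust
/// Toy growable array that tracks only sizes, not contents.
/// Doubling on overflow is an assumed growth policy for illustration.
struct GrowArray {
    len: usize,
    cap: usize,
}

impl GrowArray {
    fn with_capacity(cap: usize) -> Self {
        GrowArray { len: 0, cap: cap.max(1) }
    }

    /// Append one element; returns how many existing elements had to be copied.
    fn push(&mut self) -> usize {
        let mut copied = 0;
        if self.len == self.cap {
            // Capacity reached: every existing element moves to the new array.
            copied = self.len;
            self.cap *= 2;
        }
        self.len += 1;
        copied
    }
}

/// Total number of element copies caused by a batch of `n` pushes.
fn batch_push_cost(arr: &mut GrowArray, n: usize) -> usize {
    (0..n).map(|_| arr.push()).sum()
}
```

Starting from `GrowArray::with_capacity(1_000_000)`, the first million pushes copy nothing, the next batch of 50 copies all 1_000_000 existing elements in a single push, and the batch of 50 after that copies nothing again. The cost is cheap on average but concentrated in whichever batch happens to cross the capacity boundary.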

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses the fxhash crate, whose map is std::collections::HashMap with a deterministic hasher. This ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50	upgrade
hashmap	194_129	8_193_819_060	56_000_256	342_788	6_469_781_020	368_431	10_766_389_949
triemap	199_671	13_670_187_584	68_228_576	252_704	657_887	648_184	15_537_338_607
orderedmap	198_472	5_918_271_186	36_000_524	120_106	287_536	326_469	4_583_353_795
rbtree	190_190	6_993_676_959	52_000_464	116_417	317_299	330_296	6_988_883_792
splay	194_749	13_052_525_320	48_000_400	625_852	657_023	920_272	4_321_904_232
btree	234_430	10_220_059_976	25_108_416	357_581	485_463	539_509	2_906_462_612
zhenya_hashmap	192_961	2_361_649_032	16_777_504	58_299	66_594	79_776	3_084_236_540
btreemap_rs	611_851	1_809_789_841	27_590_656	74_098	124_626	85_214	3_208_130_200
imrc_hashmap_rs	613_202	2_634_915_707	244_908_032	35_894	198_252	96_520	6_383_840_797
hashmap_rs	601_477	438_103_157	73_138_176	20_788	25_678	23_645	1_545_701_419

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50	pop_min 50	upgrade
heap	171_060	5_557_564_409	24_000_360	621_758	227_293	592_698	3_240_817_647
heap_rs	596_953	143_262_451	18_284_544	58_563	21_622	58_466	647_923_463

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500	upgrade
buffer	178_117	2_601_290	65_652	95_575	803_545	173_575	3_146_728
vector	175_919	1_952_750	24_588	126_199	186_554	176_192	4_780_320
vec_rs	588_969	287_516	1_376_256	16_494	30_089	22_346	3_806_788

Stable structures

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50	upgrade
btreemap_rs	611_851	77_021_026	2_555_904	63_656	96_504	84_265	139_792_280
btreemap_stable_rs	616_876	4_773_834_814	2_031_616	2_893_685	5_266_123	8_870_300	729_405
heap_rs	596_953	7_230_201	2_293_760	50_652	21_870	50_383	33_581_842
heap_stable_rs	576_040	283_742_492	458_752	2_526_262	246_537	2_506_863	729_375
vec_rs	588_969	3_077_883	2_293_760	16_494	17_489	16_734	31_302_411
vec_stable_rs	572_835	63_993_021	458_752	66_549	80_266	85_639	729_377
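To put the trade-off between stable and non-stable structures in numbers, the sketch below divides figures copied from the btreemap_rs and btreemap_stable_rs rows of the table above. The `ratio` helper and the constant names are hypothetical; only the numeric values come from the table.

```rust
// Values copied from the Stable structures table (btreemap_rs vs btreemap_stable_rs).
const HEAP_BATCH_GET: u64 = 63_656;
const STABLE_BATCH_GET: u64 = 2_893_685;
const HEAP_UPGRADE: u64 = 139_792_280;
const STABLE_UPGRADE: u64 = 729_405;

/// Ratio of two measurements as a floating-point factor.
fn ratio(numerator: u64, denominator: u64) -> f64 {
    numerator as f64 / denominator as f64
}

/// How many times more expensive batch_get is on the stable variant.
fn stable_get_slowdown() -> f64 {
    ratio(STABLE_BATCH_GET, HEAP_BATCH_GET)
}

/// How many times cheaper the upgrade is on the stable variant.
fn stable_upgrade_speedup() -> f64 {
    ratio(HEAP_UPGRADE, STABLE_UPGRADE)
}
```

For these rows, batch_get is roughly 45 times more expensive on the stable variant, while the upgrade is roughly 190 times cheaper.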

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2 benchmarks
- SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
- account_id. Compute the ledger account id from principal, based on SHA-224.
- neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
Certified map. A Merkle tree for storing key-value pairs and generating witnesses according to the IC Interface Specification.
- generate 10k. Insert 10k 7-character words as both key and value into the certified map.
- max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
- inc. Increment a counter and insert the counter value into the map.
- witness. Generate the root hash and a witness for the counter.
- upgrade. Upgrade the canister with the same Wasm. In Motoko, we use stable variable. In Rust, we convert the tree to a vector before serialization.
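A WebAssembly memory page is 64 KiB (65,536 bytes), so the Rust max mem figures are whole multiples of that size. The sketch below shows the conversion; `pages_to_bytes` and `bytes_to_pages` are hypothetical helper names. Inside a canister the page count would come from the Wasm memory size, which this native sketch takes as a plain argument instead.

```rust
/// Size of one WebAssembly linear-memory page in bytes (64 KiB).
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Convert a page count into a byte count, as reported in the max mem column for Rust.
fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}

/// Recover the page count from a byte figure, or None if it is not page-aligned.
fn bytes_to_pages(bytes: u64) -> Option<u64> {
    if bytes % WASM_PAGE_BYTES == 0 {
        Some(bytes / WASM_PAGE_BYTES)
    } else {
        None
    }
}
```

For example, the Rust certified map reports 1_310_720 bytes, which is exactly 20 pages. The Motoko figure of 342_396 is a heap size rather than a page count, so it is not page-aligned and `bytes_to_pages` returns `None` for it.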

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	199_257	282_867_517	262_958_028	34_369	25_335
Rust	596_836	82_782_948	56_788_520	42_522	41_228

Certified map

	binary_size	generate 10k	max mem	inc	witness	upgrade
Motoko	248_058	365_606_356	342_396	397_640	267_761	22_396_932
Rust	640_537	489_666_578	1_310_720	660_965	220_622	450_827_450

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal	upgrade
Motoko	278_879	513_370	23_402	19_181	20_393	162_161
Rust	902_362	516_184	92_673	118_753	113_669	1_499_571

DIP721 NFT

	binary_size	init	mint_token	transfer_token	upgrade
Motoko	225_006	482_239	31_104	8_880	92_429
Rust	931_779	205_310	309_520	73_609	1_635_207

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of an empty job.
It is not easy to reliably capture both events in one flamegraph, as the implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	142_246	27_494
Rust	26_684	1_201

Timer

	binary_size	setTimer	cancelTimer
Motoko	150_072	56_158	4_695
Rust	554_248	64_790	12_216

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases due to how the optimizer affects function names, making profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
- incremental. Compile with --force-gc --incremental-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 700k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	1_074_136_336	47_793_792	119	119	119
copying	1_074_136_218	47_793_792	1_073_873_789	1_073_954_095	1_073_875_311
compacting	1_554_238_605	47_793_792	1_200_791_965	1_424_078_246	1_447_969_756
generational	2_326_734_591	47_802_256	899_105_682	1_214_812	1_107_099
incremental	29_505_471	976_097_724	469_026_873	496_491_319	1_282_778_770

Actor class

	binary size	put new bucket	put existing bucket	get
Map	421_426	758_787	16_349	16_917

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	165_797	150_117	32_863	12_200	27_064	6_622
Rust	593_655	629_046	59_348	39_106	74_039	43_504

Environment

dfx 0.24.0

Motoko compiler 0.13.3 (source ff4il9yc-sfakbpl1-8z4dm2d6-ybdjncj7)

rustc 1.81.0 (eeb90cda1 2024-09-04)

ic-repl 0.7.6

ic-wasm 0.9.0

dfinity / canister-profiling

add orderedmap benchmark #118