Note Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.

Warning Skip table 0 ## Map from _out/collections/README.md, due to table shape mismatches from main branch.

Warning Skip table 1 ## Priority queue from _out/collections/README.md, due to table shape mismatches from main branch.

Warning Skip table 2 ## Growable array from _out/collections/README.md, due to table shape mismatches from main branch.

Statistics

binary_size: no change
max_mem: no change
cycles: no change

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	173_034 ($\textcolor{red}{0.08\%}$)	247_480_401	228_033_044	30_017	20_760
Rust	498_225 ($\textcolor{red}{0.10\%}$)	82_511_960 ($\textcolor{red}{0.00\%}$)	56_526_000 ($\textcolor{red}{0.00\%}$)	42_479 ($\textcolor{red}{0.17\%}$)	44_437 ($\textcolor{red}{0.22\%}$)

Warning Skip table 1 ## Certified map from _out/crypto/README.md, due to table shape mismatches from main branch.

Statistics

binary_size: 0.09% [0.05%, 0.14%]
max_mem: no change
cycles: 0.10% [-0.04%, 0.23%]

Warning Skip table 0 ## Basic DAO from _out/dapps/README.md, due to table shape mismatches from main branch.

Warning Skip table 1 ## DIP721 NFT from _out/dapps/README.md, due to table shape mismatches from main branch.

Statistics

binary_size: no change
max_mem: no change
cycles: no change

Heartbeat

	binary_size	heartbeat
Motoko	123_509 ($\textcolor{red}{0.12\%}$)	7_399 ($\textcolor{red}{96.89\%}$)
Rust	23_826 ($\textcolor{red}{0.85\%}$)	785

Timer

	binary_size	setTimer	cancelTimer
Motoko	129_780 ($\textcolor{red}{0.11\%}$)	15_227	1_684
Rust	441_467 ($\textcolor{green}{-0.17\%}$)	43_465 ($\textcolor{red}{0.39\%}$)	7_594 ($\textcolor{red}{0.97\%}$)

Statistics

binary_size: -0.03% [-0.93%, 0.87%]
max_mem: no change
cycles: 0.68% [-1.14%, 2.51%]

Garbage Collection

	generate 700k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	886_041_847 ($\textcolor{red}{0.00\%}$)	51_991_332 ($\textcolor{red}{0.00\%}$)	50	50	50
copying	886_041_797 ($\textcolor{red}{0.00\%}$)	51_991_332 ($\textcolor{red}{0.00\%}$)	886_022_172 ($\textcolor{red}{0.00\%}$)	886_091_260 ($\textcolor{red}{0.00\%}$)	886_024_340 ($\textcolor{red}{0.00\%}$)
compacting	1_465_245_786 ($\textcolor{green}{-0.00\%}$)	51_991_332 ($\textcolor{red}{0.00\%}$)	1_131_731_112 ($\textcolor{red}{0.00\%}$)	1_337_770_678 ($\textcolor{red}{0.00\%}$)	1_364_176_157 ($\textcolor{red}{0.00\%}$)
generational	2_184_686_782 ($\textcolor{red}{0.00\%}$)	51_999_796 ($\textcolor{red}{0.00\%}$)	855_707_553 ($\textcolor{red}{0.00\%}$)	1_057_794 ($\textcolor{green}{-0.10\%}$)	947_862 ($\textcolor{green}{-0.11\%}$)
incremental	28_518_613 ($\textcolor{red}{0.00\%}$)	985_885_592 ($\textcolor{red}{0.00\%}$)	290_276_212 ($\textcolor{red}{0.00\%}$)	292_998_697 ($\textcolor{red}{0.00\%}$)	292_988_797

Actor class

	binary size	put new bucket	put existing bucket	get
Map	261_479 ($\textcolor{red}{0.06\%}$)	654_501	4_459	4_919

Statistics

binary_size: no change
max_mem: 0.00% [0.00%, 0.00%]
cycles: -0.01% [-0.03%, 0.01%]

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_583 ($\textcolor{red}{0.10\%}$)	131_443 ($\textcolor{red}{0.11\%}$)	14_651	8_456	10_539	3_669
Rust	477_393 ($\textcolor{red}{0.24\%}$)	527_108 ($\textcolor{red}{0.24\%}$)	51_497 ($\textcolor{red}{0.28\%}$)	34_484 ($\textcolor{red}{0.15\%}$)	74_218 ($\textcolor{red}{0.09\%}$)	44_132 ($\textcolor{red}{0.15\%}$)

Statistics

binary_size: 0.17% [0.08%, 0.27%]
max_mem: no change
cycles: 0.16% [0.07%, 0.26%]

Overall Statistics
binary_size: 0.10% [0.02%, 0.19%]
max_mem: 0.00% [0.00%, 0.00%]
cycles: 0.08% [0.01%, 0.15%]

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Libraries whose names carry the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed so that all collections contain the same elements and receive exactly the same queries. Below we explain the measurement in each column of the table:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.
upgrade. Upgrade the canister with the same Wasm module. The map state is persisted by serializing and deserializing states into stable memory.
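The fixed-seed setup described above can be illustrated with a small deterministic generator. This is only a sketch: the xorshift64 algorithm, the `workload` helper, and the seed value are assumptions chosen for illustration, not the generator the benchmark actually uses.

```rust
/// Minimal xorshift64 generator; a stand-in for whatever RNG the benchmark uses.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The seed must be non-zero, otherwise the generator stays at zero forever.
    fn new(seed: u64) -> Self {
        assert!(seed != 0, "xorshift seed must be non-zero");
        XorShift64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produce the `n` keys that every collection under test would receive.
fn workload(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    // Hypothetical seed; two independent runs see identical keys.
    let a = workload(42, 5);
    let b = workload(42, 5);
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Because the sequence depends only on the seed, every library is fed the same inserts and the same lookups, so differences in the table come from the data structure rather than from the input.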

💎 Takeaways

The platform charges only for instruction count, so data structures that exploit caching and locality gain no cost advantage.
There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
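To make the point about the per-round limit concrete, the sketch below compares the two cost models from the takeaway for a few input sizes. The budget of 5e9 instructions is a hypothetical value picked for illustration, not the platform's actual limit, and the cost functions are idealized.

```rust
/// Instruction estimate for an algorithm costing 10000 * n * log2(n).
fn cost_n_log_n(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Instruction estimate for an algorithm costing n^2.
fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

fn main() {
    // Hypothetical per-round budget, chosen only for illustration.
    let limit = 5.0e9;
    for n in [1_000u64, 10_000, 50_000, 100_000] {
        let a = cost_n_log_n(n);
        let b = cost_quadratic(n);
        println!(
            "n = {:>7}: 10000*n*log2(n) = {:.2e} (fits: {}), n^2 = {:.2e} (fits: {})",
            n, a, a <= limit, b, b <= limit
        );
    }
}
```

Under these assumptions, at n = 50,000 the quadratic algorithm still fits in the budget while the algorithm with the large constant factor does not, even though the latter wins asymptotically.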

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.

The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.

Different libraries persist data during upgrades in different ways. There are three main categories:

Use stable variables directly in Motoko: zhenya_hashmap, btree, vector

Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vector_rs

Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
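The last category (converting the data into an array in the upgrade hooks) can be sketched as follows. This is an illustration under stated assumptions: a plain byte vector stands in for stable memory, a hand-rolled little-endian encoding stands in for Candid, and `pre_upgrade`/`post_upgrade` are ordinary functions here rather than real canister hooks.

```rust
use std::collections::HashMap;

/// Flatten the map into an array of (key, value) pairs and encode it as bytes.
/// The byte vector stands in for stable memory; a real canister would use
/// Candid or another serialization format.
fn pre_upgrade(state: &HashMap<u64, u64>) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(8 + state.len() * 16);
    bytes.extend_from_slice(&(state.len() as u64).to_le_bytes());
    for (k, v) in state {
        bytes.extend_from_slice(&k.to_le_bytes());
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

/// Read one little-endian u64 starting at `offset`.
fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Rebuild the map from the encoded array after the upgrade.
fn post_upgrade(bytes: &[u8]) -> HashMap<u64, u64> {
    let len = read_u64(bytes, 0) as usize;
    let mut state = HashMap::with_capacity(len);
    for i in 0..len {
        let base = 8 + i * 16;
        state.insert(read_u64(bytes, base), read_u64(bytes, base + 8));
    }
    state
}

fn main() {
    let mut state = HashMap::new();
    for i in 0..1_000u64 {
        state.insert(i, i * i);
    }
    let stable = pre_upgrade(&state);
    let restored = post_upgrade(&stable);
    assert_eq!(state, restored);
    println!("restored {} entries from {} bytes", restored.len(), stable.len());
}
```

Both directions walk every entry, which is why the upgrade column grows with the size of the collection for libraries in this category.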

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
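The effect of an amortized resize on a single call can be seen in a toy model. The sketch below is not the Motoko hashmap implementation; `CountingArray` is a hypothetical doubling array that only counts how many existing elements each insertion has to copy.

```rust
/// Growable array that doubles its capacity when full and counts how many
/// element copies each `push` performs.
struct CountingArray {
    data: Vec<u64>,
    capacity: usize,
}

impl CountingArray {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        CountingArray { data: Vec::with_capacity(capacity), capacity }
    }

    /// Returns the number of existing elements copied by this push.
    fn push(&mut self, value: u64) -> usize {
        let mut copied = 0;
        if self.data.len() == self.capacity {
            // Simulate the resize: allocate a larger array and copy everything.
            self.capacity *= 2;
            let mut bigger = Vec::with_capacity(self.capacity);
            bigger.extend_from_slice(&self.data);
            copied = self.data.len();
            self.data = bigger;
        }
        self.data.push(value);
        copied
    }
}

fn main() {
    let mut arr = CountingArray::new(4);
    let costs: Vec<usize> = (0..10).map(|i| arr.push(i)).collect();
    // Most pushes copy nothing; the pushes that hit capacity copy the whole array.
    println!("{:?}", costs);
}
```

Averaged over many insertions the copying cost is small, but the one insertion that triggers the resize pays for the entire array, and a batch that happens to contain it is charged accordingly.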

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses std::collections::HashMap with the deterministic hasher from the fxhash crate, which ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hash map for Rust.
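The idea of pairing the standard map with a deterministic hasher can be shown with the standard library alone. To stay self-contained, this sketch swaps in std's `DefaultHasher` (built with fixed keys via `BuildHasherDefault`) instead of the fxhash crate; the `DeterministicMap` alias and `hash_key` helper are names introduced here for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// A `HashMap` whose hasher is built with fixed keys instead of the
/// per-process random keys used by the default `RandomState`.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hash a key with a freshly built deterministic hasher.
fn hash_key(key: u64) -> u64 {
    let build = BuildHasherDefault::<DefaultHasher>::default();
    let mut hasher = build.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let mut map: DeterministicMap<u64, u64> = HashMap::default();
    for i in 0..5 {
        map.insert(i, i * 10);
    }
    // The same key hashes to the same value for every hasher built this way.
    assert_eq!(hash_key(42), hash_key(42));
    println!("{} entries, hash(42) = {:#x}", map.len(), hash_key(42));
}
```

With a randomly seeded hasher, bucket layout and therefore instruction counts could vary between runs; fixing the hasher removes that source of noise from the measurements.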

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50	upgrade
hashmap	160_033	6_984_044_834	61_987_792	288_670	5_536_856_465	310_195	9_128_777_557
triemap	163_286	11_463_656_817	74_216_112	222_926	549_435	540_205	13_075_150_332
rbtree	157_961	5_979_230_865	57_996_000	88_905	268_573	278_339	5_771_873_746
splay	159_768	11_568_250_977	53_995_936	552_014	581_765	810_321	3_722_468_031
btree	187_709	8_224_242_624	31_103_952	277_542	384_171	429_041	2_517_935_226
zhenya_hashmap	160_321	2_201_622_488	22_773_040	48_627	61_839	70_872	2_695_441_915
btreemap_rs	493_769	1_654_113_949	27_590_656	66_889	112_603	81_249	2_401_229_430
imrc_hashmap_rs	500_005	2_407_082_660	244_973_568	32_962	163_913	98_591	5_209_975_418
hashmap_rs	487_794	403_296_624	73_138_176	17_350	21_647	20_615	957_579_445

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50	pop_min 50	upgrade
heap	147_450	4_684_518_110	29_995_896	511_505	186_471	487_212	2_655_603_064
heap_rs	479_573	123_102_208	18_284_544	53_480	18_264	53_621	349_011_816

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500	upgrade
buffer	150_816	2_082_623	65_584	73_092	671_517	127_592	2_468_118
vector	152_363	1_588_260	24_520	105_191	149_932	148_094	3_837_918
vec_rs	480_628	265_643	1_376_256	12_986	25_331	21_215	2_854_587

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2 benchmarks
- SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
- account_id. Compute the ledger account id from principal, based on SHA-224.
- neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
Certified map. A Merkle tree that stores key-value pairs and generates witnesses according to the IC Interface Specification.
- generate 10k. Insert 10k 7-character words as both key and value into the certified map.
- max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
- inc. Increment a counter and insert the counter value into the map.
- witness. Generate the root hash and a witness for the counter.
- upgrade. Upgrade the canister with the same Wasm. In Motoko, we use a stable variable. In Rust, we convert the tree to a vector before serialization.
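The witness column above refers to producing a root hash plus a proof for one entry. The sketch below shows that general idea on a plain binary Merkle tree. It is a simplification under loud assumptions: the certified map in the benchmark follows the hash tree format of the IC Interface Specification, which this code does not implement, and `toy_hash` is a non-cryptographic stand-in for SHA-256 used only to keep the example self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy 64-bit hash standing in for SHA-256; NOT cryptographically secure.
fn toy_hash<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

fn combine(left: u64, right: u64) -> u64 {
    toy_hash(&(left, right))
}

/// Build every level of the tree, from the leaf hashes up to the root.
fn build_levels(leaves: &[(&str, &str)]) -> Vec<Vec<u64>> {
    assert!(!leaves.is_empty(), "need at least one leaf");
    let mut levels = vec![leaves.iter().map(|kv| toy_hash(kv)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<u64> = prev
            .chunks(2)
            // An unpaired node is combined with itself.
            .map(|pair| combine(pair[0], *pair.last().unwrap()))
            .collect();
        levels.push(next);
    }
    levels
}

/// Sibling hashes along the path from leaf `index` to the root.
/// The bool is true when the sibling sits to the left of the path node.
fn witness(levels: &[Vec<u64>], mut index: usize) -> Vec<(u64, bool)> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = index ^ 1;
        // An unpaired node was combined with itself when building the tree.
        let hash = *level.get(sibling).unwrap_or(&level[index]);
        path.push((hash, sibling < index));
        index /= 2;
    }
    path
}

/// Recompute the root from a leaf and its witness.
fn verify(leaf: (&str, &str), path: &[(u64, bool)], root: u64) -> bool {
    let mut acc = toy_hash(&leaf);
    for &(sibling, sibling_is_left) in path {
        acc = if sibling_is_left { combine(sibling, acc) } else { combine(acc, sibling) };
    }
    acc == root
}

fn main() {
    // Hypothetical 7-character words used as both key and value.
    let leaves = [("alpha01", "alpha01"), ("bravo02", "bravo02"), ("delta03", "delta03")];
    let levels = build_levels(&leaves);
    let root = levels.last().unwrap()[0];
    let proof = witness(&levels, 2);
    assert!(verify(leaves[2], &proof, root));
    assert!(!verify(("forged0", "forged0"), &proof, root));
    println!("root = {:#x}, witness length = {}", root, proof.len());
}
```

The witness contains one hash per tree level, so generating and checking it costs a number of hash computations logarithmic in the number of entries, while inserting an entry requires rehashing the path to the root.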

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	173_034	247_480_401	228_033_044	30_017	20_760
Rust	498_225	82_511_960	56_526_000	42_479	44_437

Certified map

	binary_size	generate 10k	max mem	inc	witness	upgrade
Motoko	206_295	4_390_018_572	3_429_984	519_711	327_767	225_144_790
Rust	521_776	6_202_432_827	2_228_224	983_997	288_528	5_811_201_292

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal	upgrade
Motoko	236_673	491_790	16_244	12_716	14_186	122_439
Rust	806_348	541_248	86_052	107_287	117_056	1_686_510

DIP721 NFT

	binary_size	init	mint_token	transfer_token	upgrade
Motoko	194_938	466_439	22_357	4_729	65_612
Rust	820_683	210_062	324_368	81_020	1_860_416

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, because implementation details of the replica can affect how we measure them. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	123_509	7_399
Rust	23_826	785

Timer

	binary_size	setTimer	cancelTimer
Motoko	129_780	15_227	1_684
Rust	441_467	43_465	7_594

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after the generate call. The cycle costs reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because the optimizer alters function names, which makes profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
- incremental. Compile with --force-gc --incremental-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 700k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	886_041_847	51_991_332	50	50	50
copying	886_041_797	51_991_332	886_022_172	886_091_260	886_024_340
compacting	1_465_245_786	51_991_332	1_131_731_112	1_337_770_678	1_364_176_157
generational	2_184_686_782	51_999_796	855_707_553	1_057_794	947_862
incremental	28_518_613	985_885_592	290_276_212	292_998_697	292_988_797

Actor class

	binary size	put new bucket	put existing bucket	get
Map	261_479	654_501	4_459	4_919

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_583	131_443	14_651	8_456	10_539	3_669
Rust	477_393	527_108	51_497	34_484	74_218	44_132

dfinity / canister-profiling

profiling upgrades #91
