chenyan-dfinity commented 1 year ago

Use max_heap for Motoko memory, since max_live only reports the live size as of the last GC run.
Growable array
SHA2
Certified map (TODO bump sha2 dependency)

github-actions[bot] commented 1 year ago

Note Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.

Warning Skip _out/collections/README.md because the number of tables does not match the main branch.

Warning Skip main/_out/crypto/README.md because the file was not found.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493 ($\textcolor{red}{0.06\%}$)	16_270 ($\textcolor{red}{0.26\%}$)	12_656 ($\textcolor{green}{-0.36\%}$)	14_127 ($\textcolor{green}{-0.20\%}$)
Rust	704_886	471_865	86_470	104_617	115_765

DIP721 NFT

Note Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: no change
cycles: -0.06% [-0.39%, 0.26%]
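
The Statistics lines can be reproduced from the per-cell percentage changes. The following is a minimal sketch, assuming the headline number is the arithmetic mean of the changes (the Basic DAO changes 0.06, 0.26, -0.36 and -0.20 do average to -0.06) and that the bracketed range is a two-sided 90% Student-t interval. The interval method is inferred from the reported numbers, not stated in this thread, and `summarize` and `T90` are names chosen for illustration.

```python
import math
import statistics

# Two-sided 90% Student-t critical values, keyed by degrees of freedom.
# Assumption: the report uses this kind of interval; the thread does not say so.
T90 = {3: 2.3534, 4: 2.1318}

def summarize(changes):
    """Return (mean, lower, upper) for a list of percentage changes."""
    mean = statistics.mean(changes)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(changes) / math.sqrt(len(changes))
    half_width = T90[len(changes) - 1] * sem
    return mean, mean - half_width, mean + half_width

if __name__ == "__main__":
    # Percentage changes from the Motoko row of the Basic DAO table.
    mean, lower, upper = summarize([0.06, 0.26, -0.36, -0.20])
    print(f"cycles: {mean:.2f}% [{lower:.2f}%, {upper:.2f}%]")
```

Because the inputs here are already rounded to two decimals, the bounds can differ from the reported ones in the last digit.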

Heartbeat

	binary_size	heartbeat
Motoko	118_909	3_751 ($\textcolor{green}{-49.26\%}$)
Rust	23_699	474 ($\textcolor{green}{-40.08\%}$)

Timer

Note Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: no change
cycles: no change

Garbage Collection

	generate 800k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	1_012_258_524	59_396_776 ($\textcolor{red}{0.00\%}$)	50	50	50
copying	1_012_258_474	59_396_776 ($\textcolor{red}{0.00\%}$)	1_012_236_033	1_012_303_043	1_012_240_270
compacting	1_675_009_912	59_396_776 ($\textcolor{red}{0.00\%}$)	1_292_955_487	1_532_273_628	1_558_502_973
generational	2_517_025_054	59_405_240 ($\textcolor{red}{0.01\%}$)	977_578_942	1_052_786	967_410
incremental	32_320_741	1_136_153_832 ($\textcolor{red}{24570700.87\%}$)	290_257_785	292_951_006	292_977_552

Actor class

Note Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: 4914140.18% [-5562053.73%, 15390334.09%]
cycles: no change

Overall Statistics
binary_size: no change
max_mem: 4914140.18% [-5562053.73%, 15390334.09%]
cycles: -0.06% [-0.39%, 0.26%]

github-actions[bot] commented 1 year ago

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, reports rts_max_heap_size after the generate call; for Rust, reports the Wasm memory page count * 32 KiB.
batch_get 50. Find 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
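
The fixed-seed setup described above can be sketched as follows. This is an illustration only, not the benchmark's actual harness: `make_workload` and `run_benchmark` are hypothetical names, a Python `dict` stands in for the collection under test, and the seed and sizes are arbitrary.

```python
import random

def make_workload(seed: int, n: int, batch: int):
    """Return (initial elements, query elements) drawn from one seeded RNG."""
    rng = random.Random(seed)  # private generator, independent of global state
    initial = [rng.getrandbits(64) for _ in range(n)]  # stand-in for Nat64 values
    queries = [rng.getrandbits(64) for _ in range(batch)]
    return initial, queries

def run_benchmark(make_collection, seed: int = 42, n: int = 10_000, batch: int = 50):
    """Drive one collection through generate, batch_get, batch_put, batch_remove."""
    initial, queries = make_workload(seed, n, batch)
    coll = make_collection()
    for x in initial:  # generate
        coll[x] = x
    hits = sum(1 for q in queries if q in coll)  # batch_get
    for q in queries:  # batch_put
        coll[q] = q
    size_after_put = len(coll)
    for q in queries:  # batch_remove
        coll.pop(q, None)
    return hits, size_after_put, len(coll)

if __name__ == "__main__":
    # Every collection that receives the same seed sees identical inputs.
    print(run_benchmark(dict))
```

Since the workload depends only on the seed, two runs with the same seed perform exactly the same operations, which is what makes the per-column numbers comparable across libraries.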

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage.
There is a limit on the maximum cycles per round, so asymptotic behavior doesn't matter much; what matters is performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
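
To make the point about fixed N concrete, here is a small worked example. The budget `LIMIT` is a made-up number for illustration, not the platform's real per-round limit, and the cost functions are idealized operation counts.

```python
import math

LIMIT = 5_000_000_000  # hypothetical per-round budget, chosen for illustration

def cost_nlogn(n: int, constant: int = 10_000) -> float:
    """Idealized cost of an O(n log n) algorithm with a large constant factor."""
    return constant * n * math.log2(n)

def cost_quadratic(n: int) -> int:
    """Idealized cost of an O(n^2) algorithm with constant factor 1."""
    return n * n

if __name__ == "__main__":
    for n in (10_000, 50_000, 1_000_000):
        a, b = cost_nlogn(n), cost_quadratic(n)
        print(f"n={n}: n log n fits={a <= LIMIT}, n^2 fits={b <= LIMIT}")
```

With these assumed numbers, at n = 50_000 the quadratic algorithm stays under the budget while the n log n one exceeds it. The asymptotically better algorithm only becomes cheaper at much larger n, where neither fits in a single round anyway.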

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, once deterministic time slicing is ready, we can measure performance with a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
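
The hashmap note above says that a single batch can pay for copying the whole array. The sketch below shows the same effect on a simple doubling array that counts element copies. It is a simplified stand-in, not the Motoko hashmap: a real hash table would also rehash on resize, and `CountingBuffer` is a hypothetical name.

```python
class CountingBuffer:
    """Growable array that doubles its capacity and counts element copies."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []
        self.copies = 0  # total elements moved by resizes so far

    def put(self, x) -> None:
        if len(self.items) == self.capacity:
            # A resize moves every existing element once.
            self.copies += len(self.items)
            self.capacity *= 2
        self.items.append(x)

if __name__ == "__main__":
    buf = CountingBuffer(capacity=1024)
    for i in range(1024):  # "generate" fills the buffer exactly to capacity
        buf.put(i)
    before = buf.copies
    for i in range(50):  # the next batch of 50 triggers the resize
        buf.put(i)
    print("copies caused by this batch:", buf.copies - before)
```

The total work is still amortized O(1) per insert, but the entire resize cost lands on whichever batch happens to cross the capacity boundary.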

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	133_828	6_960_077_358	61_987_732	287_469	5_515_887_135	308_972
triemap	135_316	11_431_084_368	74_216_052	222_768	547_650	538_998
rbtree	136_114	5_979_229_531	57_995_940	88_900	268_568	278_334
splay	131_868	11_568_250_397	53_995_876	551_921	581_659	810_215
btree	176_459	8_224_241_532	31_103_892	277_537	384_166	429_036
zhenya_hashmap	141_704	2_633_117_435	65_987_480	65_339	80_153	94_758
btreemap_rs	413_478	1_649_709_879	13_762_560	66_814	112_263	81_263
imrc_hashmap_rs	413_588	2_385_702_121	122_454_016	32_846	162_715	98_494
hashmap_rs	406_096	392_593_368	36_536_320	16_498	20_863	19_973
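
The hashmap_rs note above mentions a deterministic hasher. Rust's default `HashMap` hasher is randomly seeded per process, so iteration order and collision patterns can vary between runs; a fixed hash function removes that source of noise. The sketch below uses 64-bit FNV-1a purely as a well-known deterministic hash for illustration. It is not the Fx algorithm that the fxhash crate implements.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: the same input always yields the same hash value."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64  # keep the result within 64 bits
    return h

def bucket_of(key: bytes, buckets: int) -> int:
    """Map a key to a bucket index; stable across runs and machines."""
    return fnv1a_64(key) % buckets

if __name__ == "__main__":
    print(bucket_of(b"example-key", 1024))
```

With a hash like this, the same keys land in the same buckets on every run, so instruction counts are reproducible.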

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50
heap	127_748	4_684_517_789	29_995_836	511_494	186_460	487_201
heap_rs	403_925	123_102_482	9_109_504	53_320	18_138	53_543

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500
buffer	135_462	2_082_618	65_508	73_087	671_512	127_587
vector	133_901	1_728_566	24_764	121_214	163_942	161_604
vec_rs	402_670	265_904	655_360	12_824	25_253	21_016
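
As a sanity check on the max mem column description, the Rust values in the tables above are all whole multiples of 32_768 bytes. A short sketch, with the values copied from the tables and `units` as a hypothetical helper name:

```python
UNIT = 32 * 1024  # 32_768 bytes, the unit named in the max mem description

# Rust max mem values copied from the Map, Priority queue and Growable array tables.
RUST_MAX_MEM = {
    "btreemap_rs": 13_762_560,
    "imrc_hashmap_rs": 122_454_016,
    "hashmap_rs": 36_536_320,
    "heap_rs": 9_109_504,
    "vec_rs": 655_360,
}

def units(total_bytes: int) -> int:
    """Return how many 32_768-byte units make up total_bytes."""
    count, remainder = divmod(total_bytes, UNIT)
    if remainder:
        raise ValueError(f"{total_bytes} is not a multiple of {UNIT}")
    return count

if __name__ == "__main__":
    for name, size in RUST_MAX_MEM.items():
        print(name, units(size))
```

A standard Wasm page is 64 KiB, and not every value here is a multiple of that (hashmap_rs is not), so the 32_768-byte unit is taken from the column description as written.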

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2 benchmarks
- SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
- account_id. Compute the ledger account id from principal, based on SHA-224.
- neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
Certified map. Merkle tree for storing key-value pairs and generating witnesses according to the IC Interface Specification.
- generate 10k. Insert 10k 7-character words as both key and value into the certified map.
- max mem. For Motoko, reports rts_max_heap_size after the generate call; for Rust, reports the Wasm memory page count * 32 KiB.
- inc. Increment a counter and insert the counter value into the map.
- witness. Generate the root hash and a witness for the counter.
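
For orientation, the SHA-2 variants named above can be exercised with Python's standard `hashlib`. The `account_id_sketch` function follows my understanding of the ledger's account identifier layout (a CRC32 checksum followed by a SHA-224 hash over a domain separator, the principal bytes, and a 32-byte subaccount); treat that layout as an assumption to check against the ledger specification, not as the benchmark's code. The neuron_id derivation is not sketched because its exact inputs are not given here.

```python
import hashlib
import zlib

def sha2_digests(data: bytes) -> dict:
    """Hash the same input with the SHA-2 variants used in the benchmark."""
    return {
        "SHA-224": hashlib.sha224(data).digest(),
        "SHA-256": hashlib.sha256(data).digest(),
        "SHA-512": hashlib.sha512(data).digest(),
    }

def account_id_sketch(principal: bytes, subaccount: bytes = bytes(32)) -> bytes:
    """Assumed layout: CRC32(h) || h, where h is a SHA-224 digest."""
    # Assumption: domain separator "\x0Aaccount-id", then principal, then subaccount.
    h = hashlib.sha224(b"\x0aaccount-id" + principal + subaccount).digest()
    return zlib.crc32(h).to_bytes(4, "big") + h

if __name__ == "__main__":
    payload = bytes(1_000_000)  # stand-in for a 1M Wasm binary
    for name, digest in sha2_digests(payload).items():
        print(name, len(digest), "bytes")
```

The account_id benchmark hashes only a few dozen bytes, while SHA-256/SHA-512 hash a 1M input, which is consistent with the very different cycle counts in the table.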

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	170_112	264_156_344	235_099_564	35_144	23_250
Rust	490_873	82_512_107	56_526_045	42_397	41_597

Certified map

	binary_size	generate 10k	max mem	inc	witness
Motoko	162_416	18_579_897_273	3_429_924	2_209_304	327_765
Rust	433_845	6_206_795_630	1_081_344	984_814	288_834
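
The idea behind the witness column can be illustrated with a plain binary Merkle tree. This is a simplified sketch under stated assumptions: it is not the labeled hash tree format from the IC Interface Specification, and all function names are hypothetical. A witness here is the list of sibling hashes needed to recompute the root from one leaf.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(key: bytes, value: bytes) -> bytes:
    # Prefix 0x00 separates leaves from inner nodes; the key is length-prefixed.
    return _h(b"\x00" + len(key).to_bytes(4, "big") + key + value)

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def build_levels(leaves):
    """Return all tree levels, from the leaves up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(node_hash(cur[i], cur[i + 1]))
            else:
                nxt.append(cur[i])  # an unpaired node is promoted unchanged
        levels.append(nxt)
    return levels

def witness(levels, index: int):
    """Collect (sibling_is_left, sibling_hash) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path

def verify(root: bytes, key: bytes, value: bytes, path) -> bool:
    """Recompute the root from one key-value pair and its witness."""
    acc = leaf_hash(key, value)
    for sibling_is_left, sibling in path:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root
```

A verifier that trusts only the root hash can check a single entry with a logarithmic number of hashes, which is why witness generation is much cheaper than rebuilding the map.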

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	225_805	37_493	16_270	12_656	14_127
Rust	704_886	471_865	86_470	104_617	115_765

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	183_882	12_181	22_319	4_710
Rust	766_710	125_034	324_482	77_116

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	118_909	3_751
Rust	23_699	474

Timer

	binary_size	setTimer	cancelTimer
Motoko	125_168	15_208	1_679
Rust	434_848	43_540	7_683

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after the generate call. The cycle costs reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because it changes function names, which makes profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
- incremental. Compile with --force-gc --incremental-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 800k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	1_012_258_524	59_396_776	50	50	50
copying	1_012_258_474	59_396_776	1_012_236_033	1_012_303_043	1_012_240_270
compacting	1_675_009_912	59_396_776	1_292_955_487	1_532_273_628	1_558_502_973
generational	2_517_025_054	59_405_240	977_578_942	1_052_786	967_410
incremental	32_320_741	1_136_153_832	290_257_785	292_951_006	292_977_552

Actor class

	binary_size	put new bucket	put existing bucket	get
Map	254_076	638_613	4_449	4_909

dfinity / canister-profiling

more benchmarks #72
