chenyan-dfinity commented 11 months ago

Bump dfx, ic-wasm, ic-repl and cargo
Fix #88

github-actions[bot] commented 11 months ago

Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	138_275	6_974_058_129	61_987_732	288_202	5_527_868_856	309_728
triemap	139_765	11_432_083_637	74_216_052	222_825	547_701	539_052
rbtree	140_562	5_979_229_508	57_995_940	88_905	268_573	278_352
splay	136_342	11_568_250_621	53_995_876	551_926	581_651	810_220
btree	181_449	8_224_241_444	31_103_892	277_542	384_171	429_041
zhenya_hashmap	153_793 ($\textcolor{red}{4.99\%}$)	2_201_621_425 ($\textcolor{green}{-16.42\%}$)	22_772_980 ($\textcolor{green}{-65.49\%}$)	48_627 ($\textcolor{green}{-25.64\%}$)	61_839 ($\textcolor{green}{-22.90\%}$)	70_872 ($\textcolor{green}{-25.26\%}$)
btreemap_rs	418_496 ($\textcolor{green}{-0.37\%}$)	1_654_114_123 ($\textcolor{green}{-0.00\%}$)	13_762_560	66_828 ($\textcolor{green}{-0.09\%}$)	112_500 ($\textcolor{green}{-0.06\%}$)	81_246 ($\textcolor{green}{-0.08\%}$)
imrc_hashmap_rs	418_054 ($\textcolor{green}{-0.40\%}$)	2_386_381_040 ($\textcolor{green}{-0.00\%}$)	122_454_016	32_841 ($\textcolor{green}{-0.19\%}$)	162_760 ($\textcolor{green}{-0.04\%}$)	98_464 ($\textcolor{green}{-0.06\%}$)
hashmap_rs	411_843 ($\textcolor{green}{-0.41\%}$)	402_296_785 ($\textcolor{green}{-0.00\%}$)	36_536_320	16_635 ($\textcolor{green}{-0.37\%}$)	21_539 ($\textcolor{green}{-0.29\%}$)	19_990 ($\textcolor{green}{-0.31\%}$)

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50
heap	132_227	4_684_517_324	29_995_836	511_499	186_465
heap_rs	409_392 ($\textcolor{green}{-0.43\%}$)	123_102_351 ($\textcolor{green}{-0.00\%}$)	9_109_504	53_320 ($\textcolor{green}{-0.12\%}$)	18_140 ($\textcolor{green}{-0.34\%}$)

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500
buffer	139_908	2_082_623	65_508	73_092	671_517	127_592
vector	138_344	1_728_571	24_764	121_219	163_947	161_609
vec_rs	408_280 ($\textcolor{green}{-0.41\%}$)	265_791 ($\textcolor{green}{-0.02\%}$)	655_360	12_840 ($\textcolor{green}{-0.48\%}$)	25_269 ($\textcolor{green}{-0.24\%}$)	21_153 ($\textcolor{green}{-0.29\%}$)

Statistics

binary_size: 0.49% [-1.32%, 2.31%]
max_mem: -65.49%
cycles: -4.05% [-7.21%, -0.89%]

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	172_890	247_480_401	228_033_044	30_017	20_760
Rust	497_739 ($\textcolor{green}{-0.11\%}$)	82_511_907 ($\textcolor{green}{-0.00\%}$)	56_525_950 ($\textcolor{green}{-0.00\%}$)	42_406 ($\textcolor{green}{-0.34\%}$)	44_341 ($\textcolor{green}{-0.52\%}$)

Certified map

	binary_size	generate 10k	max mem	inc	witness
Motoko	176_829	4_390_018_085	3_429_924	519_711	327_767
Rust	440_119 ($\textcolor{green}{-0.38\%}$)	6_202_162_996 ($\textcolor{green}{-0.00\%}$)	1_081_344	983_928 ($\textcolor{red}{0.00\%}$)	288_414 ($\textcolor{green}{-0.02\%}$)

Statistics

binary_size: -0.24% [-1.10%, 0.62%]
max_mem: no change
cycles: -0.13% [-0.28%, 0.03%]

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	230_182	37_638 ($\textcolor{red}{0.19\%}$)	16_286 ($\textcolor{red}{0.21\%}$)	12_712 ($\textcolor{red}{0.28\%}$)	14_184 ($\textcolor{red}{0.48\%}$)
Rust	713_090 ($\textcolor{green}{-0.74\%}$)	469_329 ($\textcolor{green}{-0.65\%}$)	86_401 ($\textcolor{green}{-0.44\%}$)	104_729 ($\textcolor{green}{-0.51\%}$)	115_792 ($\textcolor{green}{-0.38\%}$)

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	188_321	12_267	22_357	4_729
Rust	776_901 ($\textcolor{green}{-0.18\%}$)	124_454 ($\textcolor{green}{-0.67\%}$)	325_566 ($\textcolor{red}{0.17\%}$)	80_361 ($\textcolor{red}{3.69\%}$)

Statistics

binary_size: -0.46% [-2.22%, 1.31%]
max_mem: no change
cycles: 0.22% [-0.45%, 0.89%]

Heartbeat

	binary_size	heartbeat
Motoko	123_357	7_399
Rust	23_625	469 ($\textcolor{green}{-40.25\%}$)

Timer

	binary_size	setTimer	cancelTimer
Motoko	129_636	15_227	1_684
Rust	442_239 ($\textcolor{green}{-0.25\%}$)	43_295 ($\textcolor{green}{-0.28\%}$)	7_521 ($\textcolor{red}{0.32\%}$)

Statistics

binary_size: -0.25%
max_mem: no change
cycles: 0.02% [-1.88%, 1.92%]

Warning: Skipping table 0 (## Garbage Collection) from _out/motoko/README.md due to a table shape mismatch with the main branch.

Actor class

	binary size	put new bucket	put existing bucket	get
Map	261_335 ($\textcolor{green}{-0.10\%}$)	654_501 ($\textcolor{green}{-0.24\%}$)	4_459	4_919

Statistics

binary_size: no change
max_mem: no change
cycles: -0.17% [-0.59%, 0.25%]

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_439	131_299	14_651	8_456	10_539	3_669
Rust	476_238 ($\textcolor{green}{-0.70\%}$)	525_832 ($\textcolor{green}{-0.72\%}$)	51_355 ($\textcolor{green}{-0.72\%}$)	34_433 ($\textcolor{green}{-0.47\%}$)	74_154 ($\textcolor{green}{-0.92\%}$)	44_068 ($\textcolor{green}{-0.66\%}$)

Statistics

binary_size: -0.71% [-0.78%, -0.64%]
max_mem: no change
cycles: -0.69% [-0.91%, -0.47%]

Overall Statistics
binary_size: -0.01% [-0.76%, 0.74%]
max_mem: -65.49%
cycles: -1.93% [-3.45%, -0.42%]

github-actions[bot] commented 11 months ago

Note: The flamegraph links only work after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Libraries whose names end with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are not likely to trigger GC.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Look up 50 elements in the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
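
The fixed-seed setup described above can be sketched as follows. This is only an illustration: the xorshift generator, the `XorShift64` type and the `generate_keys` helper are assumptions made for the example, not the generator or the code the benchmark actually uses.

```rust
/// Minimal xorshift64 generator (hypothetical stand-in for the benchmark's RNG).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The seed must be non-zero, otherwise the generator only ever yields zero.
    pub fn new(seed: u64) -> Self {
        assert!(seed != 0, "xorshift seed must be non-zero");
        Self { state: seed }
    }

    /// Advance the state and return the next pseudo-random value.
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produce `n` keys from a fixed seed. Calling this twice with the same seed
/// yields the same keys, so every collection under test sees the same input.
pub fn generate_keys(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Feeding `generate_keys(seed, n)` to each collection keeps the inserted elements and the later queries identical across implementations, which is what makes the per-column numbers comparable.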

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality therefore gain no cost advantage.
There is a limit on the maximum cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
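
To make the cycle-limit takeaway concrete, here is a small sketch that compares the two cost models from the example under a per-round budget. The cost functions take the constants from the example literally, and the budget in the usage note is a hypothetical number, not the platform's actual limit.

```rust
/// Estimated cost of an algorithm with a large constant factor: 10_000 * n * log2(n).
pub fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Estimated cost of a quadratic algorithm with constant factor 1: n^2.
pub fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// Largest n in 1..=max_n whose estimated cost fits within `budget`
/// (returns 0 if even n = 1 does not fit). `cost` must be non-decreasing
/// for n >= 1; the search never evaluates `cost(0)`.
pub fn max_n_within(budget: f64, cost: fn(u64) -> f64, max_n: u64) -> u64 {
    let (mut lo, mut hi) = (0u64, max_n);
    while lo < hi {
        // Round the midpoint up so the loop always makes progress.
        let mid = lo + (hi - lo + 1) / 2;
        if cost(mid) <= budget {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    lo
}
```

With an assumed budget of `5e9`, `max_n_within(5e9, cost_quadratic, 1_000_000)` is larger than `max_n_within(5e9, cost_nlogn, 1_000_000)`: for every input size that fits in one round, the quadratic model is the cheaper one, even though it is asymptotically worse.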

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
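
The effect can be illustrated with a toy growable array that doubles its capacity when full and counts how many elements each insert has to copy. `CountingVec` is a hypothetical type written for this note; it is not the Motoko hashmap implementation, and the doubling policy is an assumption.

```rust
/// Toy growable array that doubles its capacity when full and records
/// how many existing elements each push had to copy.
pub struct CountingVec {
    data: Vec<u64>,
    capacity: usize,
    copies: u64,
}

impl CountingVec {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { data: Vec::with_capacity(capacity), capacity, copies: 0 }
    }

    /// Push one element and return the number of elements copied by this call.
    pub fn push(&mut self, value: u64) -> u64 {
        let mut copied = 0;
        if self.data.len() == self.capacity {
            // Capacity reached: allocate a store twice as large and copy everything.
            let mut bigger = Vec::with_capacity(self.capacity * 2);
            bigger.extend_from_slice(&self.data);
            copied = self.data.len() as u64;
            self.data = bigger;
            self.capacity *= 2;
        }
        self.data.push(value);
        self.copies += copied;
        copied
    }

    /// Total number of element copies caused by resizing so far.
    pub fn total_copies(&self) -> u64 {
        self.copies
    }
}
```

Most pushes copy nothing, but the push that crosses the capacity boundary pays for the whole array at once. Averaged over many inserts the cost is small, yet a single batch that happens to trigger the resize absorbs all of it, which matches the shape of the batch_put 50 outlier.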

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses std::collections::HashMap with the deterministic hasher from the fxhash crate. This ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
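
As a sketch of why a deterministic hasher matters for hashmap_rs: the default `HashMap` hasher is randomly seeded per map, so bucket layout can differ between runs. The example below swaps in the standard library's `DefaultHasher` through `BuildHasherDefault` instead of fxhash, purely to stay dependency-free; `DeterministicMap`, `build_map` and `iteration_order` are hypothetical names. Note that `DefaultHasher`'s algorithm is unspecified across Rust releases, so this is only reproducible within one toolchain.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A HashMap whose hasher is built with `Default`, i.e. without a random seed.
pub type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Insert every key (mapped to itself) into a fresh deterministic map.
pub fn build_map(keys: &[u64]) -> DeterministicMap<u64, u64> {
    let mut map = DeterministicMap::default();
    for &k in keys {
        map.insert(k, k);
    }
    map
}

/// Keys in the order the map iterates over them.
pub fn iteration_order(map: &DeterministicMap<u64, u64>) -> Vec<u64> {
    map.keys().copied().collect()
}
```

Two maps built from the same key sequence hash every key identically, so they do the same amount of work on every run. That is the property a benchmark needs for stable instruction counts.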

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	138_275	6_974_058_129	61_987_732	288_202	5_527_868_856	309_728
triemap	139_765	11_432_083_637	74_216_052	222_825	547_701	539_052
rbtree	140_562	5_979_229_508	57_995_940	88_905	268_573	278_352
splay	136_342	11_568_250_621	53_995_876	551_926	581_651	810_220
btree	181_449	8_224_241_444	31_103_892	277_542	384_171	429_041
zhenya_hashmap	153_793	2_201_621_425	22_772_980	48_627	61_839	70_872
btreemap_rs	418_496	1_654_114_123	13_762_560	66_828	112_500	81_246
imrc_hashmap_rs	418_054	2_386_381_040	122_454_016	32_841	162_760	98_464
hashmap_rs	411_843	402_296_785	36_536_320	16_635	21_539	19_990

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50
heap	132_227	4_684_517_324	29_995_836	511_499	186_465	487_206
heap_rs	409_392	123_102_351	9_109_504	53_320	18_140	53_545

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500
buffer	139_908	2_082_623	65_508	73_092	671_517	127_592
vector	138_344	1_728_571	24_764	121_219	163_947	161_609
vec_rs	408_280	265_791	655_360	12_840	25_269	21_153

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2 benchmarks
- SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
- account_id. Compute the ledger account id from principal, based on SHA-224.
- neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
Certified map. A Merkle tree for storing key-value pairs and generating witnesses according to the IC Interface Specification.
- generate 10k. Insert 10k 7-character words as both key and value into the certified map.
- max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
- inc. Increment a counter and insert the counter value into the map.
- witness. Generate the root hash and a witness for the counter.

SHA-2

	binary_size	SHA-256	SHA-512	account_id	neuron_id
Motoko	172_890	247_480_401	228_033_044	30_017	20_760
Rust	497_739	82_511_907	56_525_950	42_406	44_341

Certified map

	binary_size	generate 10k	max mem	inc	witness
Motoko	176_829	4_390_018_085	3_429_924	519_711	327_767
Rust	440_119	6_202_162_996	1_081_344	983_928	288_414

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	230_182	37_638	16_286	12_712	14_184
Rust	713_090	469_329	86_401	104_729	115_792

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	188_321	12_267	22_357	4_729
Rust	776_901	124_454	325_566	80_361

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) method and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	123_357	7_399
Rust	23_625	469

Timer

	binary_size	setTimer	cancelTimer
Motoko	129_636	15_227	1_684
Rust	442_239	43_295	7_521

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases due to how the optimizer affects function names, making profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
- incremental. Compile with --force-gc --incremental-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 700k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	886_040_881	51_991_272	50	50	50
copying	886_040_831	51_991_272	886_021_215	886_090_303	886_023_374
compacting	1_465_250_036	51_991_272	1_131_730_142	1_337_769_727	1_364_175_167
generational	2_184_686_556	51_999_736	855_706_700	1_058_853	948_937
incremental	28_518_084	985_883_928	290_276_117	292_998_383	292_988_797

Actor class

	binary size	put new bucket	put existing bucket	get
Map	261_335	654_501	4_459	4_919

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_439	131_299	14_651	8_456	10_539	3_669
Rust	476_238	525_832	51_355	34_433	74_154	44_068

dfinity / canister-profiling

Bump dependencies #89
