chenyan-dfinity commented 1 year ago

to reveal the metadata init cost in upgrade

github-actions[bot] commented 1 year ago

Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.

Map

Note: Same as main branch, skipping.

Priority queue

Note: Same as main branch, skipping.

Growable array

Note: Same as main branch, skipping.

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---:|---:|---:|---:|---:|---:|---:|
| btreemap_rs | 494_261 | 70_231_886 | 2_555_904 | 57_208 | 86_708 | 79_740 | 100_477_350 |
| btreemap_stable_rs | 498_479 ($\textcolor{red}{0.13\%}$) | 3_676_196_177 | 2_621_440 | 2_190_807 | 4_013_463 | 6_777_299 | 714_487 |
| heap_rs | 481_753 | 6_214_821 | 2_293_760 | 45_761 | 18_496 | 45_732 | 18_367_724 |
| heap_stable_rs | 469_772 ($\textcolor{red}{0.14\%}$) | 240_377_401 | 458_752 | 2_038_566 | 209_047 | 2_023_426 | 714_446 |
| vec_rs | 480_829 | 2_866_842 | 2_293_760 | 12_986 | 14_081 | 13_678 | 16_575_110 |
| vec_stable_rs | 465_410 ($\textcolor{red}{0.14\%}$) | 55_585_887 | 458_752 | 52_650 | 67_745 | 69_641 | 714_440 |

Statistics

binary_size: 0.14% [0.12%, 0.15%]
max_mem: no change
cycles: no change

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 236_673 | 491_790 ($\textcolor{red}{0.00\%}$) | 16_290 ($\textcolor{green}{-0.31\%}$) | 12_672 | 14_136 ($\textcolor{red}{0.16\%}$) | 122_439 |
| Rust | 806_537 | 541_266 | 86_052 | 107_287 | 117_056 | 1_686_510 |

DIP721 NFT

Note: Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: no change
cycles: -0.05% [-0.45%, 0.35%]

Overall Statistics
binary_size: 0.14% [0.12%, 0.15%]
max_mem: no change
cycles: -0.05% [-0.45%, 0.35%]

github-actions[bot] commented 1 year ago

Note: The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with the _rs suffix are written in Rust; the rest are written in Motoko. The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger it.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm memory page count * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.
upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing it into stable memory before the upgrade and deserializing it afterwards. For stable benchmarks, the upgrade takes very few cycles, as the state is already in stable memory.
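
The fixed-seed setup described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the `XorShift64` generator, the `generate` and `batch_get` helpers, and the use of `BTreeMap` are all assumptions made for the example. The point is only that the same seed yields the same elements and the same queries on every run.

```rust
use std::collections::BTreeMap;

/// Tiny xorshift64 generator; a hypothetical stand-in for the benchmark's RNG.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Insert `n` pseudo-random keys, mirroring the "generate" column.
fn generate(seed: u64, n: usize) -> BTreeMap<u64, u64> {
    let mut rng = XorShift64(seed);
    let mut map = BTreeMap::new();
    for _ in 0..n {
        let k = rng.next();
        map.insert(k, k);
    }
    map
}

/// Look up `n` keys drawn from a generator with the given seed and
/// return the number of hits, mirroring the "batch_get" column.
fn batch_get(map: &BTreeMap<u64, u64>, seed: u64, n: usize) -> usize {
    let mut rng = XorShift64(seed);
    (0..n).filter(|_| map.contains_key(&rng.next())).count()
}

fn main() {
    // Two runs with the same seed build identical collections.
    let a = generate(42, 50_000);
    let b = generate(42, 50_000);
    assert_eq!(a, b);
    println!("len = {}, hits = {}", a.len(), batch_get(&a, 42, 50));
}
```

Because the query stream is replayed from the same seed, every library under test sees exactly the same keys, so differences in the table come from the data structure rather than from the workload.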

💎 Takeaways

The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much; we care more about the performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
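
To make the fixed-N point concrete, here is a small sketch that compares two made-up cost models under a per-round budget. The budget value passed in `main` and the constant factor 10000 are illustrative assumptions, not the platform's real limit or a measured cost.

```rust
/// Hypothetical cost model: large constant factor, good asymptotics.
fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Hypothetical cost model: small constant factor, worse asymptotics.
fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// Largest n whose cost stays within `limit`
/// (exponential search followed by binary search).
fn max_n_within(limit: f64, cost: fn(u64) -> f64) -> u64 {
    let (mut lo, mut hi) = (1u64, 2u64);
    while cost(hi) <= limit {
        lo = hi;
        hi *= 2;
    }
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if cost(mid) <= limit {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    lo
}

fn main() {
    // Assumed budget, chosen only for illustration.
    let limit = 5e9;
    println!("n log n model fits n up to {}", max_n_within(limit, cost_nlogn));
    println!("n^2 model fits n up to {}", max_n_within(limit, cost_quadratic));
}
```

Under this assumed budget the quadratic model handles a larger input than the $n\log n$ model, even though the $n\log n$ model is cheaper for sufficiently large n that no longer fits in a round.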

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.

The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.

Different libraries persist data across upgrades in different ways. There are three main categories:

Use stable variable directly in Motoko: zhenya_hashmap, btree, vector

Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs

Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
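
The third category can be pictured with the following sketch. It is a plain-Rust model only: `StableSlot` is a hypothetical stand-in for stable memory, and `pre_upgrade` and `post_upgrade` here are ordinary functions, not the canister hooks of any SDK. No serialization format is involved.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for stable memory that survives an upgrade.
#[derive(Default)]
struct StableSlot(Vec<(u64, u64)>);

/// Before the upgrade: flatten the map into an array of entries.
fn pre_upgrade(state: HashMap<u64, u64>, slot: &mut StableSlot) {
    let mut entries: Vec<(u64, u64)> = state.into_iter().collect();
    // Sorting is optional; it just makes the stored form deterministic.
    entries.sort_unstable();
    slot.0 = entries;
}

/// After the upgrade: rebuild the map by re-inserting every entry.
fn post_upgrade(slot: &mut StableSlot) -> HashMap<u64, u64> {
    std::mem::take(&mut slot.0).into_iter().collect()
}

fn main() {
    let state: HashMap<u64, u64> = (0..1_000).map(|k| (k, k * 2)).collect();
    let mut slot = StableSlot::default();
    pre_upgrade(state.clone(), &mut slot);
    let restored = post_upgrade(&mut slot);
    assert_eq!(restored, state);
    println!("restored {} entries", restored.len());
}
```

Both directions touch every element, which is consistent with the upgrade column growing with the size of the collection for the non-stable benchmarks.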

The stable benchmarks are much more expensive than their non-stable counterparts, because the stable memory API is much more expensive. The benefit is that they get fast upgrades. The upgrade still needs to parse the metadata when initializing the upgraded Wasm module.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for other data structures.
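
The spike can be reproduced with a toy model that counts element copies instead of cycles. `CountingArray` is a hypothetical type written for this sketch, and the capacity of `1 << 20` is an assumption chosen so that the array is exactly full before the batch; it is not the actual initial capacity of the Motoko hashmap.

```rust
/// Toy growable array that doubles its capacity when full and
/// counts how many elements are copied during resizes.
struct CountingArray {
    len: usize,
    capacity: usize,
    copies: u64,
}

impl CountingArray {
    fn with_capacity(capacity: usize) -> Self {
        Self { len: 0, capacity: capacity.max(1), copies: 0 }
    }

    fn put(&mut self) {
        if self.len == self.capacity {
            // Resize: every existing element moves to the new backing array.
            self.copies += self.len as u64;
            self.capacity *= 2;
        }
        self.len += 1;
    }

    /// Perform `n` puts and return the copies incurred by this batch alone.
    fn batch_put(&mut self, n: usize) -> u64 {
        let before = self.copies;
        for _ in 0..n {
            self.put();
        }
        self.copies - before
    }
}

fn main() {
    let mut a = CountingArray::with_capacity(1 << 20);
    println!("fill:         {} copies", a.batch_put(1 << 20));
    println!("batch_put 50: {} copies", a.batch_put(50));
    println!("batch_put 50: {} copies", a.batch_put(50));
}
```

The first batch after the array fills pays for copying every element, while the next batch of the same size pays nothing. The cost is small on average but concentrated in one call, which is what matters under a per-round cycle limit.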

btree comes from mops.one/stableheapbtreemap.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
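
As a sketch of what "the same HashMap with a deterministic hasher" means, the example below plugs a fixed hasher into std::collections::HashMap through BuildHasherDefault. The hasher is FNV-1a, used here only because it is short enough to write inline; it is an assumption for illustration and not the algorithm the fxhash crate implements. `DetMap`, `build` and `iteration_order` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a: a simple hasher with no random seed (stand-in, not fxhash).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The std HashMap, parameterized with the deterministic hasher.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn build(n: u64) -> DetMap<u64, u64> {
    let mut m = DetMap::default();
    for k in 0..n {
        m.insert(k, k * k);
    }
    m
}

fn iteration_order(m: &DetMap<u64, u64>) -> Vec<u64> {
    m.keys().copied().collect()
}

fn main() {
    // Same insertions and same hasher give the same internal layout.
    assert_eq!(iteration_order(&build(1_000)), iteration_order(&build(1_000)));
    println!("iteration order is reproducible");
}
```

With the default RandomState hasher the layout, and therefore the number of probes per operation, can differ between runs; fixing the hasher removes that source of noise from the measurements.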

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap for Rust.

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---:|---:|---:|---:|---:|---:|---:|
| hashmap | 160_033 | 6_984_044_834 | 61_987_792 | 288_670 | 5_536_856_465 | 310_195 | 9_128_777_557 |
| triemap | 163_286 | 11_463_656_817 | 74_216_112 | 222_926 | 549_435 | 540_205 | 13_075_150_332 |
| rbtree | 157_961 | 5_979_230_865 | 57_996_000 | 88_905 | 268_573 | 278_339 | 5_771_873_746 |
| splay | 159_768 | 11_568_250_977 | 53_995_936 | 552_014 | 581_765 | 810_321 | 3_722_468_031 |
| btree | 187_709 | 8_224_242_624 | 31_103_952 | 277_542 | 384_171 | 429_041 | 2_517_935_226 |
| zhenya_hashmap | 160_321 | 2_201_622_488 | 22_773_040 | 48_627 | 61_839 | 70_872 | 2_695_441_915 |
| btreemap_rs | 494_261 | 1_654_113_949 | 27_590_656 | 66_889 | 112_603 | 81_249 | 2_401_229_430 |
| imrc_hashmap_rs | 500_199 | 2_407_082_660 | 244_973_568 | 32_962 | 163_913 | 98_591 | 5_209_975_418 |
| hashmap_rs | 487_986 | 403_296_624 | 73_138_176 | 17_350 | 21_647 | 20_615 | 957_579_445 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---:|---:|---:|---:|---:|---:|---:|
| heap | 147_450 | 4_684_518_110 | 29_995_896 | 511_505 | 186_471 | 487_212 | 2_655_603_064 |
| heap_rs | 481_753 | 123_102_208 | 18_284_544 | 53_480 | 18_264 | 53_621 | 349_011_816 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---:|---:|---:|---:|---:|---:|---:|
| buffer | 150_816 | 2_082_623 | 65_584 | 73_092 | 671_517 | 127_592 | 2_468_118 |
| vector | 152_363 | 1_588_260 | 24_520 | 105_191 | 149_932 | 148_094 | 3_837_918 |
| vec_rs | 480_829 | 265_643 | 1_376_256 | 12_986 | 25_331 | 21_215 | 2_854_587 |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---:|---:|---:|---:|---:|---:|---:|
| btreemap_rs | 494_261 | 70_231_886 | 2_555_904 | 57_208 | 86_708 | 79_740 | 100_477_350 |
| btreemap_stable_rs | 498_479 | 3_676_196_177 | 2_621_440 | 2_190_807 | 4_013_463 | 6_777_299 | 714_487 |
| heap_rs | 481_753 | 6_214_821 | 2_293_760 | 45_761 | 18_496 | 45_732 | 18_367_724 |
| heap_stable_rs | 469_772 | 240_377_401 | 458_752 | 2_038_566 | 209_047 | 2_023_426 | 714_446 |
| vec_rs | 480_829 | 2_866_842 | 2_293_760 | 12_986 | 14_081 | 13_678 | 16_575_110 |
| vec_stable_rs | 465_410 | 55_585_887 | 458_752 | 52_650 | 67_745 | 69_641 | 714_440 |

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 236_673 | 491_790 | 16_290 | 12_672 | 14_136 | 122_439 |
| Rust | 806_537 | 541_266 | 86_052 | 107_287 | 117_056 | 1_686_510 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---:|---:|---:|---:|---:|
| Motoko | 194_938 | 466_439 | 22_357 | 4_729 | 65_612 |
| Rust | 820_893 | 210_081 | 324_368 | 81_020 | 1_860_416 |

dfinity / canister-profiling

touch stable structure #93
