dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

more benchmarks #72

Closed chenyan-dfinity closed 1 year ago

chenyan-dfinity commented 1 year ago
github-actions[bot] commented 1 year ago

Note: The performance results are diffed against the published results from the main branch. Unchanged benchmarks are omitted.

Warning: Skipped _out/collections/README.md because its number of tables does not match the main branch.

Warning: Skipped main/_out/crypto/README.md. File not found.
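The percentages in the tables below appear to express each value's change relative to the main branch. A minimal sketch of that arithmetic, assuming a plain relative difference rounded to two decimals; the function names are hypothetical and this is not the script the bot actually runs:

```rust
/// Relative change of `new` against `old`, in percent.
/// Returns None when `old` is zero, since the ratio is undefined.
fn percent_change(old: u64, new: u64) -> Option<f64> {
    if old == 0 {
        return None;
    }
    Some((new as f64 - old as f64) / old as f64 * 100.0)
}

/// Format an integer with `_` as the thousands separator, e.g. 37_493.
fn with_underscores(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, c) in digits.chars().enumerate() {
        // Insert a separator whenever a multiple of three digits remains.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(c);
    }
    out
}

/// Render one table cell: the new value, plus the change if the value moved.
fn render_cell(old: u64, new: u64) -> String {
    match percent_change(old, new) {
        Some(p) if new != old => format!("{} ({:.2}%)", with_underscores(new), p),
        _ => with_underscores(new),
    }
}
```

With made-up inputs, `render_cell(1_000, 1_250)` yields `1_250 (25.00%)` and `render_cell(500, 500)` yields `500`. A very small baseline makes the ratio very large, which is one way a cell can show an enormous percentage.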

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 225_805 | 37_493 ($\textcolor{red}{0.06\%}$) | 16_270 ($\textcolor{red}{0.26\%}$) | 12_656 ($\textcolor{green}{-0.36\%}$) | 14_127 ($\textcolor{green}{-0.20\%}$) |
| Rust | 704_886 | 471_865 | 86_470 | 104_617 | 115_765 |

DIP721 NFT

Note: Same as main branch, skipping.


Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 118_909 | 3_751 ($\textcolor{green}{-49.26\%}$) |
| Rust | 23_699 | 474 ($\textcolor{green}{-40.08\%}$) |

Timer

Note: Same as main branch, skipping.


Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 1_012_258_524 | 59_396_776 ($\textcolor{red}{0.00\%}$) | 50 | 50 | 50 |
| copying | 1_012_258_474 | 59_396_776 ($\textcolor{red}{0.00\%}$) | 1_012_236_033 | 1_012_303_043 | 1_012_240_270 |
| compacting | 1_675_009_912 | 59_396_776 ($\textcolor{red}{0.00\%}$) | 1_292_955_487 | 1_532_273_628 | 1_558_502_973 |
| generational | 2_517_025_054 | 59_405_240 ($\textcolor{red}{0.01\%}$) | 977_578_942 | 1_052_786 | 967_410 |
| incremental | 32_320_741 | 1_136_153_832 ($\textcolor{red}{24570700.87\%}$) | 290_257_785 | 292_951_006 | 292_977_552 |

Actor class

Note: Same as main branch, skipping.


github-actions[bot] commented 1 year ago

Note: The flamegraph links only work after the pull request is merged. Unchanged benchmarks are omitted.

Collection libraries

Measure the performance of collection libraries written in Motoko and Rust. Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed, so all collections contain the same elements and receive exactly the same queries. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Hopefully, once deterministic time slicing is ready, we can measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which behaves like std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
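The hashmap note above says a put that hits the capacity limit has to copy the whole array. A toy model of that effect, assuming a simple doubling policy; `DoublingArray` and `copy_costs` are hypothetical names and this is not the Motoko hashmap implementation:

```rust
/// Toy growable array that doubles its capacity when full and reports
/// how many elements each insertion had to copy.
struct DoublingArray {
    items: Vec<u64>,
    capacity: usize,
}

impl DoublingArray {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { items: Vec::with_capacity(capacity), capacity }
    }

    /// Appends `x` and returns the number of elements copied by this call.
    fn put(&mut self, x: u64) -> usize {
        let mut copied = 0;
        if self.items.len() == self.capacity {
            // Full: allocate twice the space and move every element over.
            self.capacity *= 2;
            let mut bigger = Vec::with_capacity(self.capacity);
            bigger.extend(self.items.iter().copied());
            copied = self.items.len();
            self.items = bigger;
        }
        self.items.push(x);
        copied
    }
}

/// Copy cost of each of `n` consecutive puts into an array of capacity `cap`.
fn copy_costs(cap: usize, n: usize) -> Vec<usize> {
    let mut a = DoublingArray::with_capacity(cap);
    (0..n).map(|i| a.put(i as u64)).collect()
}
```

`copy_costs(4, 6)` returns `[0, 0, 0, 0, 4, 0]`: most puts are free, but the one that crosses the capacity pays for every existing element. A small batch that happens to trigger the resize on a large collection therefore looks far more expensive than the average put.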

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 133_828 | 6_960_077_358 | 61_987_732 | 287_469 | 5_515_887_135 | 308_972 |
| triemap | 135_316 | 11_431_084_368 | 74_216_052 | 222_768 | 547_650 | 538_998 |
| rbtree | 136_114 | 5_979_229_531 | 57_995_940 | 88_900 | 268_568 | 278_334 |
| splay | 131_868 | 11_568_250_397 | 53_995_876 | 551_921 | 581_659 | 810_215 |
| btree | 176_459 | 8_224_241_532 | 31_103_892 | 277_537 | 384_166 | 429_036 |
| zhenya_hashmap | 141_704 | 2_633_117_435 | 65_987_480 | 65_339 | 80_153 | 94_758 |
| btreemap_rs | 413_478 | 1_649_709_879 | 13_762_560 | 66_814 | 112_263 | 81_263 |
| imrc_hashmap_rs | 413_588 | 2_385_702_121 | 122_454_016 | 32_846 | 162_715 | 98_494 |
| hashmap_rs | 406_096 | 392_593_368 | 36_536_320 | 16_498 | 20_863 | 19_973 |
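The rows above are comparable because, as stated earlier, every collection is filled from the same random number generator with a fixed seed. A minimal sketch of that setup, assuming a xorshift generator and a seed of 42; both are stand-ins chosen for illustration, not the generator or seed the benchmarks use:

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal xorshift64 generator (illustrative stand-in).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The seed must be non-zero, otherwise the sequence stays at zero.
    fn new(seed: u64) -> Self {
        assert!(seed != 0, "xorshift needs a non-zero seed");
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Hypothetical fixed seed shared by every benchmark run.
const SEED: u64 = 42;

/// Feed `n` pseudo-random key/value pairs to `insert`, always from SEED,
/// so every caller sees exactly the same sequence.
fn generate(n: usize, mut insert: impl FnMut(u64, u64)) {
    let mut rng = XorShift64::new(SEED);
    for _ in 0..n {
        let k = rng.next_u64();
        insert(k, k);
    }
}

/// Fill two different collections with identical contents.
fn build_both(n: usize) -> (HashMap<u64, u64>, BTreeMap<u64, u64>) {
    let mut hash = HashMap::new();
    let mut tree = BTreeMap::new();
    generate(n, |k, v| {
        hash.insert(k, v);
    });
    generate(n, |k, v| {
        tree.insert(k, v);
    });
    (hash, tree)
}
```

Because the generator is restarted from the same seed for each collection, differences in the measured cost come from the data structure rather than from the input.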

Priority queue

binary_size heapify 1m max mem pop_min 50 put 50
heap 127_748 4_684_517_789 29_995_836 511_494 186_460 487_201
heap_rs 403_925 123_102_482 9_109_504 53_320 18_138 53_543

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|---:|---:|---:|---:|---:|---:|
| buffer | 135_462 | 2_082_618 | 65_508 | 73_087 | 671_512 | 127_587 |
| vector | 133_901 | 1_728_566 | 24_764 | 121_214 | 163_942 | 161_604 |
| vec_rs | 402_670 | 265_904 | 655_360 | 12_824 | 25_253 | 21_016 |

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---:|---:|---:|---:|---:|
| Motoko | 170_112 | 264_156_344 | 235_099_564 | 35_144 | 23_250 |
| Rust | 490_873 | 82_512_107 | 56_526_045 | 42_397 | 41_597 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|---:|---:|---:|---:|---:|
| Motoko | 162_416 | 18_579_897_273 | 3_429_924 | 2_209_304 | 327_765 |
| Rust | 433_845 | 6_206_795_630 | 1_081_344 | 984_814 | 288_834 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
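The contrast between specialized and dynamic decoding in the notes above can be illustrated with a toy wire format. This sketch is neither Candid nor serde; the tag scheme and all names are assumptions made up for illustration. The dynamic decoder inspects every tag at run time and builds a generic value list, while the static decoder is written for one fixed shape:

```rust
/// Toy wire format (not Candid): each value is a tag byte followed by its
/// payload. Tag 0 = u32 (4 bytes, little endian), tag 1 = bool (1 byte).
#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

/// Read a little-endian u32 and return it with the remaining bytes.
fn read_u32(bytes: &[u8]) -> Option<(u32, &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    let (p, rest) = bytes.split_at(4);
    Some((u32::from_le_bytes([p[0], p[1], p[2], p[3]]), rest))
}

/// Dynamic style: dispatch on each tag found on the wire.
fn decode_dynamic(mut bytes: &[u8]) -> Option<Vec<Value>> {
    let mut out = Vec::new();
    while let Some((&tag, rest)) = bytes.split_first() {
        bytes = match tag {
            0 => {
                let (n, rest) = read_u32(rest)?;
                out.push(Value::U32(n));
                rest
            }
            1 => {
                let (&b, rest) = rest.split_first()?;
                out.push(Value::Bool(b != 0));
                rest
            }
            _ => return None,
        };
    }
    Some(out)
}

/// Static style: the expected shape (u32, bool) is fixed in the code, so the
/// decoder only checks the tags and reads straight into typed fields.
fn decode_static(bytes: &[u8]) -> Option<(u32, bool)> {
    let (&t0, rest) = bytes.split_first()?;
    if t0 != 0 {
        return None;
    }
    let (n, rest) = read_u32(rest)?;
    let (&t1, rest) = rest.split_first()?;
    if t1 != 1 {
        return None;
    }
    let (&b, rest) = rest.split_first()?;
    if !rest.is_empty() {
        return None;
    }
    Some((n, b != 0))
}
```

For the bytes `[0, 7, 0, 0, 0, 1, 1]`, `decode_dynamic` returns `[U32(7), Bool(true)]` and `decode_static` returns `(7, true)`. The dynamic path allocates and branches per value, which hints at why per-call overhead grows with the amount of data decoded.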

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 225_805 | 37_493 | 16_270 | 12_656 | 14_127 |
| Rust | 704_886 | 471_865 | 86_470 | 104_617 | 115_765 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 766_710 | 125_034 | 324_482 | 77_116 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 118_909 | 3_751 |
| Rust | 23_699 | 474 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 434_848 | 43_540 | 7_683 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 1_012_258_524 | 59_396_776 | 50 | 50 | 50 |
| copying | 1_012_258_474 | 59_396_776 | 1_012_236_033 | 1_012_303_043 | 1_012_240_270 |
| compacting | 1_675_009_912 | 59_396_776 | 1_292_955_487 | 1_532_273_628 | 1_558_502_973 |
| generational | 2_517_025_054 | 59_405_240 | 977_578_942 | 1_052_786 | 967_410 |
| incremental | 32_320_741 | 1_136_153_832 | 290_257_785 | 292_951_006 | 292_977_552 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---:|---:|---:|---:|
| Map | 254_076 | 638_613 | 4_449 | 4_909 |