dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

bump ic-repl #102

Closed chenyan-dfinity closed 9 months ago

github-actions[bot] commented 9 months ago

Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.
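The percentages in the tables below read as relative changes against the main-branch baseline. The report does not state the exact formula the bot uses, so the following is only a minimal sketch under the assumption that it is `(new - base) / base * 100`. The helper names `parse_count` and `percent_change` and the sample values are hypothetical.

```rust
/// Parse a benchmark figure written with `_` digit separators, e.g. "160_407".
fn parse_count(s: &str) -> Option<u64> {
    s.replace('_', "").parse().ok()
}

/// Relative change of `new` against `base`, in percent.
/// Returns None when the baseline is zero, because the ratio is undefined.
fn percent_change(base: u64, new: u64) -> Option<f64> {
    if base == 0 {
        return None;
    }
    Some((new as f64 - base as f64) / base as f64 * 100.0)
}

fn main() {
    // Hypothetical baseline and new values, not taken from the tables.
    let base = parse_count("1_000_000").unwrap();
    let new = parse_count("1_250_000").unwrap();
    let pct = percent_change(base, new).unwrap();
    println!("{:+.2}%", pct); // prints "+25.00%"
}
```

A positive value corresponds to a red entry (more instructions or bytes than the baseline) and a negative value to a green one.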

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| hashmap | 160_407 ($\textcolor{red}{0.12\%}$) | 8_550_261_900 ($\textcolor{red}{22.43\%}$) | 61_987_852 | 349_808 ($\textcolor{red}{21.18\%}$) | 6_641_222_848 ($\textcolor{red}{19.95\%}$) | 377_790 ($\textcolor{red}{21.79\%}$) | 11_273_695_870 ($\textcolor{red}{23.50\%}$) |
| triemap | 163_606 ($\textcolor{red}{0.08\%}$) | 15_268_684_428 ($\textcolor{red}{33.19\%}$) | 74_216_172 | 301_412 ($\textcolor{red}{35.21\%}$) | 737_508 ($\textcolor{red}{34.23\%}$) | 722_627 ($\textcolor{red}{33.77\%}$) | 17_302_961_961 ($\textcolor{red}{32.33\%}$) |
| rbtree | 158_225 ($\textcolor{red}{0.05\%}$) | 7_665_886_603 ($\textcolor{red}{28.21\%}$) | 57_996_060 | 115_637 ($\textcolor{red}{30.07\%}$) | 346_274 ($\textcolor{red}{28.93\%}$) | 370_932 ($\textcolor{red}{33.26\%}$) | 7_231_296_002 ($\textcolor{red}{25.28\%}$) |
| splay | 160_055 ($\textcolor{red}{0.06\%}$) | 15_388_999_493 ($\textcolor{red}{33.03\%}$) | 53_995_996 | 738_550 ($\textcolor{red}{33.79\%}$) | 777_813 ($\textcolor{red}{33.70\%}$) | 1_077_553 ($\textcolor{red}{32.98\%}$) | 4_832_227_514 ($\textcolor{red}{29.81\%}$) |
| btree | 188_004 ($\textcolor{red}{0.06\%}$) | 11_040_703_048 ($\textcolor{red}{34.25\%}$) | 31_104_012 | 376_209 ($\textcolor{red}{35.55\%}$) | 519_468 ($\textcolor{red}{35.22\%}$) | 577_253 ($\textcolor{red}{34.54\%}$) | 3_162_844_776 ($\textcolor{red}{25.61\%}$) |
| zhenya_hashmap | 160_784 ($\textcolor{red}{0.17\%}$) | 2_838_455_412 ($\textcolor{red}{28.93\%}$) | 22_773_100 | 66_353 ($\textcolor{red}{36.45\%}$) | 84_294 ($\textcolor{red}{36.31\%}$) | 96_614 ($\textcolor{red}{36.32\%}$) | 3_353_572_476 ($\textcolor{red}{24.42\%}$) |
| btreemap_rs | 478_653 ($\textcolor{green}{-0.03\%}$) | 1_797_168_137 ($\textcolor{red}{8.81\%}$) | 27_590_656 | 75_573 ($\textcolor{red}{13.03\%}$) | 125_437 ($\textcolor{red}{11.52\%}$) | 86_323 ($\textcolor{red}{13.23\%}$) | 2_941_065_080 ($\textcolor{red}{10.53\%}$) |
| imrc_hashmap_rs | 482_783 ($\textcolor{green}{-0.03\%}$) | 2_584_652_919 ($\textcolor{red}{8.01\%}$) | 244_973_568 | 38_439 ($\textcolor{red}{17.32\%}$) | 179_288 ($\textcolor{red}{9.83\%}$) | 115_588 ($\textcolor{red}{17.47\%}$) | 5_796_576_276 ($\textcolor{red}{11.65\%}$) |
| hashmap_rs | 469_075 ($\textcolor{green}{-0.04\%}$) | 433_765_793 ($\textcolor{red}{7.56\%}$) | 73_138_176 | 21_856 ($\textcolor{red}{29.70\%}$) | 26_958 ($\textcolor{red}{24.35\%}$) | 25_223 ($\textcolor{red}{24.48\%}$) | 1_293_479_144 ($\textcolor{red}{12.98\%}$) |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---|---|---|---|---|---|---|
| heap | 147_908 ($\textcolor{red}{0.18\%}$) | 6_393_606_585 ($\textcolor{red}{36.48\%}$) | 29_995_956 | 701_835 ($\textcolor{red}{37.21\%}$) | 258_841 ($\textcolor{red}{38.81\%}$) | 668_731 ($\textcolor{red}{37.25\%}$) | 3_337_783_210 ($\textcolor{red}{25.69\%}$) |
| heap_rs | 462_827 ($\textcolor{green}{-0.03\%}$) | 138_669_875 ($\textcolor{red}{14.04\%}$) | 18_284_544 | 57_564 ($\textcolor{red}{11.43\%}$) | 23_346 ($\textcolor{red}{27.96\%}$) | 57_696 ($\textcolor{red}{11.38\%}$) | 511_961_256 ($\textcolor{red}{16.16\%}$) |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---|---|---|---|---|---|---|
| buffer | 151_225 ($\textcolor{red}{0.15\%}$) | 2_811_683 ($\textcolor{red}{35.01\%}$) | 65_644 | 109_527 ($\textcolor{red}{49.85\%}$) | 884_994 ($\textcolor{red}{31.79\%}$) | 178_027 ($\textcolor{red}{39.53\%}$) | 3_206_695 ($\textcolor{red}{29.58\%}$) |
| vector | 152_845 ($\textcolor{red}{0.19\%}$) | 2_228_708 ($\textcolor{red}{40.32\%}$) | 24_580 | 146_173 ($\textcolor{red}{38.96\%}$) | 212_398 ($\textcolor{red}{41.66\%}$) | 203_951 ($\textcolor{red}{37.72\%}$) | 4_810_168 ($\textcolor{red}{25.12\%}$) |
| vec_rs | 460_616 ($\textcolor{green}{-0.03\%}$) | 289_093 ($\textcolor{red}{8.81\%}$) | 1_310_720 | 17_300 ($\textcolor{red}{32.93\%}$) | 30_623 ($\textcolor{red}{20.74\%}$) | 25_759 ($\textcolor{red}{21.24\%}$) | 3_171_850 ($\textcolor{red}{15.60\%}$) |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| btreemap_rs | 478_653 ($\textcolor{green}{-0.03\%}$) | 76_392_041 ($\textcolor{red}{9.09\%}$) | 2_555_904 | 65_131 ($\textcolor{red}{13.90\%}$) | 97_327 ($\textcolor{red}{12.52\%}$) | 85_332 ($\textcolor{red}{13.31\%}$) | 126_467_483 ($\textcolor{red}{11.09\%}$) |
| btreemap_stable_rs | 488_655 ($\textcolor{green}{-0.04\%}$) | 5_255_681_694 ($\textcolor{red}{14.05\%}$) | 2_031_616 | 3_167_281 ($\textcolor{red}{13.99\%}$) | 5_784_578 ($\textcolor{red}{14.08\%}$) | 9_888_420 ($\textcolor{red}{14.48\%}$) | 731_129 ($\textcolor{red}{11.91\%}$) |
| heap_rs | 462_827 ($\textcolor{green}{-0.03\%}$) | 7_001_771 ($\textcolor{red}{14.04\%}$) | 2_293_760 | 50_073 ($\textcolor{red}{12.87\%}$) | 23_594 ($\textcolor{red}{27.69\%}$) | 50_045 ($\textcolor{red}{12.85\%}$) | 26_819_762 ($\textcolor{red}{15.86\%}$) |
| heap_stable_rs | 450_750 ($\textcolor{green}{-0.04\%}$) | 319_633_530 ($\textcolor{red}{14.86\%}$) | 458_752 | 2_674_744 ($\textcolor{red}{14.80\%}$) | 278_372 ($\textcolor{red}{15.87\%}$) | 2_654_546 ($\textcolor{red}{14.80\%}$) | 731_276 ($\textcolor{red}{11.91\%}$) |
| vec_rs | 460_616 ($\textcolor{green}{-0.03\%}$) | 3_079_437 ($\textcolor{red}{7.41\%}$) | 2_228_224 | 17_300 ($\textcolor{red}{32.93\%}$) | 18_473 ($\textcolor{red}{30.89\%}$) | 18_000 ($\textcolor{red}{31.29\%}$) | 24_772_373 ($\textcolor{red}{16.58\%}$) |
| vec_stable_rs | 448_692 ($\textcolor{green}{-0.05\%}$) | 75_145_339 ($\textcolor{red}{15.28\%}$) | 458_752 | 70_326 ($\textcolor{red}{18.82\%}$) | 91_522 ($\textcolor{red}{18.27\%}$) | 93_718 ($\textcolor{red}{17.77\%}$) | 731_289 ($\textcolor{red}{11.91\%}$) |

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 173_412 ($\textcolor{red}{0.11\%}$) | 297_725_367 ($\textcolor{red}{20.30\%}$) | 281_729_738 ($\textcolor{red}{23.55\%}$) | 37_735 ($\textcolor{red}{25.71\%}$) | 27_166 ($\textcolor{red}{30.86\%}$) |
| Rust | 478_873 ($\textcolor{green}{-0.02\%}$) | 82_788_111 ($\textcolor{red}{0.33\%}$) | 56_793_155 ($\textcolor{red}{0.47\%}$) | 49_455 ($\textcolor{red}{16.59\%}$) | 52_124 ($\textcolor{red}{17.35\%}$) |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 206_652 ($\textcolor{red}{0.08\%}$) | 5_254_991_528 ($\textcolor{red}{19.70\%}$) | 3_430_044 | 623_666 ($\textcolor{red}{20.00\%}$) | 436_318 ($\textcolor{red}{33.12\%}$) | 274_614_372 ($\textcolor{red}{21.97\%}$) |
| Rust | 503_545 ($\textcolor{green}{-0.02\%}$) | 6_309_529_790 ($\textcolor{red}{1.78\%}$) | 2_228_224 | 1_004_189 ($\textcolor{red}{2.10\%}$) | 301_445 ($\textcolor{red}{4.56\%}$) | 5_926_705_911 ($\textcolor{red}{1.89\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 237_314 ($\textcolor{red}{0.19\%}$) | 511_299 ($\textcolor{red}{2.76\%}$) | 22_949 ($\textcolor{red}{41.21\%}$) | 19_127 ($\textcolor{red}{50.89\%}$) | 20_312 ($\textcolor{red}{43.89\%}$) | 157_833 ($\textcolor{red}{22.53\%}$) |
| Rust | 782_495 ($\textcolor{green}{-0.03\%}$) | 613_885 ($\textcolor{red}{12.03\%}$) | 101_922 ($\textcolor{red}{17.68\%}$) | 126_043 ($\textcolor{red}{18.91\%}$) | 137_819 ($\textcolor{red}{16.89\%}$) | 1_825_018 ($\textcolor{red}{12.35\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 195_621 ($\textcolor{red}{0.25\%}$) | 481_132 ($\textcolor{red}{1.88\%}$) | 29_869 ($\textcolor{red}{33.60\%}$) | 8_896 ($\textcolor{red}{88.12\%}$) | 89_994 ($\textcolor{red}{25.69\%}$) |
| Rust | 804_148 ($\textcolor{green}{-0.03\%}$) | 240_775 ($\textcolor{red}{10.84\%}$) | 369_710 ($\textcolor{red}{13.55\%}$) | 91_916 ($\textcolor{red}{17.62\%}$) | 2_014_186 ($\textcolor{red}{12.08\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 123_885 ($\textcolor{red}{0.15\%}$) | 23_310 ($\textcolor{red}{215.04\%}$) |
| Rust | 23_843 ($\textcolor{red}{0.02\%}$) | 546 ($\textcolor{green}{-30.45\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 130_153 ($\textcolor{red}{0.14\%}$) | 52_177 ($\textcolor{red}{242.66\%}$) | 4_653 ($\textcolor{red}{176.31\%}$) |
| Rust | 423_499 ($\textcolor{green}{-0.03\%}$) | 68_210 ($\textcolor{red}{56.91\%}$) | 11_167 ($\textcolor{red}{47.54\%}$) |

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_171_382_327 ($\textcolor{red}{32.20\%}$) | 51_991_392 | 119 ($\textcolor{red}{138.00\%}$) | 119 ($\textcolor{red}{138.00\%}$) | 119 ($\textcolor{red}{138.00\%}$) |
| copying | 1_171_382_209 ($\textcolor{red}{32.20\%}$) | 51_991_392 | 1_171_104_158 ($\textcolor{red}{32.18\%}$) | 1_171_195_718 ($\textcolor{red}{32.18\%}$) | 1_171_106_971 ($\textcolor{red}{32.18\%}$) |
| compacting | 1_672_114_728 ($\textcolor{red}{14.12\%}$) | 51_991_392 | 1_290_052_240 ($\textcolor{red}{13.99\%}$) | 1_533_417_914 ($\textcolor{red}{14.62\%}$) | 1_564_512_328 ($\textcolor{red}{14.69\%}$) |
| generational | 2_529_073_463 ($\textcolor{red}{15.76\%}$) | 51_999_856 | 999_515_223 ($\textcolor{red}{16.81\%}$) | 1_232_308 ($\textcolor{red}{16.50\%}$) | 1_103_421 ($\textcolor{red}{16.40\%}$) |
| incremental | 29_503_121 ($\textcolor{red}{3.45\%}$) | 985_885_652 | 333_756_008 ($\textcolor{red}{14.98\%}$) | 336_886_802 ($\textcolor{red}{14.98\%}$) | 336_875_628 ($\textcolor{red}{14.98\%}$) |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 261_945 ($\textcolor{red}{0.11\%}$) | 715_767 ($\textcolor{red}{9.36\%}$) | 16_296 ($\textcolor{red}{265.46\%}$) | 16_803 ($\textcolor{red}{241.59\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 145_023 ($\textcolor{red}{0.18\%}$) | 131_856 ($\textcolor{red}{0.17\%}$) | 28_786 ($\textcolor{red}{96.48\%}$) | 11_976 ($\textcolor{red}{41.63\%}$) | 23_098 ($\textcolor{red}{119.17\%}$) | 6_439 ($\textcolor{red}{75.50\%}$) |
| Rust | 458_294 ($\textcolor{green}{-0.03\%}$) | 511_783 ($\textcolor{green}{-0.02\%}$) | 69_854 ($\textcolor{red}{35.30\%}$) | 42_859 ($\textcolor{red}{24.53\%}$) | 92_892 ($\textcolor{red}{24.86\%}$) | 52_112 ($\textcolor{red}{18.41\%}$) |

github-actions[bot] commented 9 months ago

Note: The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Library names with the _rs suffix are written in Rust; the rest are written in Motoko. The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:
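On the fixed-seed setup mentioned in the paragraph above: the report does not show the generator the benchmarks actually use, so the following is only an illustrative sketch. It assumes a small linear congruential generator (the hypothetical `Lcg` type, with Knuth's MMIX constants) and a hypothetical seed of 42, and shows that two different collections filled from the same seed end up with the same key set.

```rust
use std::collections::{BTreeMap, HashMap};

/// Tiny linear congruential generator (constants from Knuth's MMIX).
/// Illustrative stand-in; not the generator used by the benchmarks.
struct Lcg(u64);

impl Lcg {
    fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Produce `n` keys from a generator seeded with `seed`.
fn keys(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = Lcg::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    const SEED: u64 = 42; // hypothetical seed
    let mut tree: BTreeMap<u64, u64> = BTreeMap::new();
    let mut hash: HashMap<u64, u64> = HashMap::new();
    for k in keys(SEED, 1000) {
        tree.insert(k, k);
    }
    for k in keys(SEED, 1000) {
        hash.insert(k, k);
    }
    // Both collections were fed the same sequence, so they hold the same keys.
    assert_eq!(tree.len(), hash.len());
    assert!(tree.keys().all(|k| hash.contains_key(k)));
    println!("{} identical keys in both collections", tree.len());
}
```

Because every library sees the same inserts and the same lookups, differences in the measured cost can be attributed to the data structure rather than to the workload.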

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get a better cycle cost by using a different serialization format. Another source of slowdown in Rust is that ic-stable-structures tends to be slower than region memory in Motoko.
  • Different libraries persist data during upgrades in different ways. There are three main categories:
    • Use stable variable directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterparts because the stable memory API is much more expensive. The benefit is a fast upgrade, although the upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which provides the same map as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hash map in Rust.
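To illustrate the deterministic-hasher point in the notes above, here is a minimal, std-only sketch of plugging a fixed hash function into `std::collections::HashMap`. It does not use the fxhash crate: the hand-written FNV-1a hasher (`Fnv1a`), the `DetMap` alias, and the `hash_of` helper are all hypothetical stand-ins chosen so the example runs without dependencies.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hand-written 64-bit FNV-1a. Used only as an example of an unseeded
/// hash function; this is not the algorithm implemented by fxhash.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap whose hasher has no per-process random seed.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hash a single value with a freshly initialised hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = Fnv1a::default();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let mut m: DetMap<u64, &str> = DetMap::default();
    m.insert(1, "one");
    m.insert(2, "two");
    assert_eq!(m.get(&2), Some(&"two"));
    // With no random seed, hashing the same value always gives the same result.
    assert_eq!(hash_of(&42u64), hash_of(&42u64));
}
```

The default `RandomState` hasher in the standard library is seeded randomly per process, so bucket layout, and with it the instruction count, can vary between runs. Fixing the hasher removes that source of variation.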

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| hashmap | 160_407 | 8_550_261_900 | 61_987_852 | 349_808 | 6_641_222_848 | 377_790 | 11_273_695_870 |
| triemap | 163_606 | 15_268_684_428 | 74_216_172 | 301_412 | 737_508 | 722_627 | 17_302_961_961 |
| rbtree | 158_225 | 7_665_886_603 | 57_996_060 | 115_637 | 346_274 | 370_932 | 7_231_296_002 |
| splay | 160_055 | 15_388_999_493 | 53_995_996 | 738_550 | 777_813 | 1_077_553 | 4_832_227_514 |
| btree | 188_004 | 11_040_703_048 | 31_104_012 | 376_209 | 519_468 | 577_253 | 3_162_844_776 |
| zhenya_hashmap | 160_784 | 2_838_455_412 | 22_773_100 | 66_353 | 84_294 | 96_614 | 3_353_572_476 |
| btreemap_rs | 478_653 | 1_797_168_137 | 27_590_656 | 75_573 | 125_437 | 86_323 | 2_941_065_080 |
| imrc_hashmap_rs | 482_783 | 2_584_652_919 | 244_973_568 | 38_439 | 179_288 | 115_588 | 5_796_576_276 |
| hashmap_rs | 469_075 | 433_765_793 | 73_138_176 | 21_856 | 26_958 | 25_223 | 1_293_479_144 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|---|---|---|---|---|---|---|---|
| heap | 147_908 | 6_393_606_585 | 29_995_956 | 701_835 | 258_841 | 668_731 | 3_337_783_210 |
| heap_rs | 462_827 | 138_669_875 | 18_284_544 | 57_564 | 23_346 | 57_696 | 511_961_256 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|---|---|---|---|---|---|---|---|
| buffer | 151_225 | 2_811_683 | 65_644 | 109_527 | 884_994 | 178_027 | 3_206_695 |
| vector | 152_845 | 2_228_708 | 24_580 | 146_173 | 212_398 | 203_951 | 4_810_168 |
| vec_rs | 460_616 | 289_093 | 1_310_720 | 17_300 | 30_623 | 25_759 | 3_171_850 |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|---|---|---|---|---|---|---|---|
| btreemap_rs | 478_653 | 76_392_041 | 2_555_904 | 65_131 | 97_327 | 85_332 | 126_467_483 |
| btreemap_stable_rs | 488_655 | 5_255_681_694 | 2_031_616 | 3_167_281 | 5_784_578 | 9_888_420 | 731_129 |
| heap_rs | 462_827 | 7_001_771 | 2_293_760 | 50_073 | 23_594 | 50_045 | 26_819_762 |
| heap_stable_rs | 450_750 | 319_633_530 | 458_752 | 2_674_744 | 278_372 | 2_654_546 | 731_276 |
| vec_rs | 460_616 | 3_079_437 | 2_228_224 | 17_300 | 18_473 | 18_000 | 24_772_373 |
| vec_stable_rs | 448_692 | 75_145_339 | 458_752 | 70_326 | 91_522 | 93_718 | 731_289 |

Environment

  • dfx 0.15.2-beta.1
  • Motoko compiler 0.10.0 (source a3ywvw0a-p5a03qy6-vscbl9j8-qxszbxa6)
  • rustc 1.73.0 (cc66ad468 2023-10-03)
  • ic-repl 0.6.0
  • ic-wasm 0.7.0

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 173_412 | 297_725_367 | 281_729_738 | 37_735 | 27_166 |
| Rust | 478_873 | 82_788_111 | 56_793_155 | 49_455 | 52_124 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 206_652 | 5_254_991_528 | 3_430_044 | 623_666 | 436_318 | 274_614_372 |
| Rust | 503_545 | 6_309_529_790 | 2_228_224 | 1_004_189 | 301_445 | 5_926_705_911 |


Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|---|---|---|---|---|---|---|
| Motoko | 237_314 | 511_299 | 22_949 | 19_127 | 20_312 | 157_833 |
| Rust | 782_495 | 613_885 | 101_922 | 126_043 | 137_819 | 1_825_018 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|---|---|---|---|---|---|
| Motoko | 195_621 | 481_132 | 29_869 | 8_896 | 89_994 |
| Rust | 804_148 | 240_775 | 369_710 | 91_916 | 2_014_186 |


Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 123_885 | 23_310 |
| Rust | 23_843 | 546 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 130_153 | 52_177 | 4_653 |
| Rust | 423_499 | 68_210 | 11_167 |


Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_171_382_327 | 51_991_392 | 119 | 119 | 119 |
| copying | 1_171_382_209 | 51_991_392 | 1_171_104_158 | 1_171_195_718 | 1_171_106_971 |
| compacting | 1_672_114_728 | 51_991_392 | 1_290_052_240 | 1_533_417_914 | 1_564_512_328 |
| generational | 2_529_073_463 | 51_999_856 | 999_515_223 | 1_232_308 | 1_103_421 |
| incremental | 29_503_121 | 985_885_652 | 333_756_008 | 336_886_802 | 336_875_628 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 261_945 | 715_767 | 16_296 | 16_803 |


Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 145_023 | 131_856 | 28_786 | 11_976 | 23_098 | 6_439 |
| Rust | 458_294 | 511_783 | 69_854 | 42_859 | 92_892 | 52_112 |
