dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

test moc inlining #77

chenyan-dfinity closed 11 months ago

chenyan-dfinity commented 11 months ago

base: old metering, without wasm-opt

github-actions[bot] commented 11 months ago

Note: Diffing the performance results against the published results from the main branch. Unchanged benchmarks are omitted.
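To make the percentages in the tables below easier to read, here is a minimal sketch of how a relative change could be computed. It assumes the figure is `(new - base) / base * 100` with a positive value meaning an increase; the function names are hypothetical and this is not the bot's actual code.

```rust
/// Relative change of `new` against `base`, in percent.
/// Returns `None` when `base` is zero, since the ratio is undefined.
/// Assumption: a positive result means the new measurement is larger.
fn percent_change(base: u64, new: u64) -> Option<f64> {
    if base == 0 {
        return None;
    }
    Some((new as f64 - base as f64) / base as f64 * 100.0)
}

/// Formats a measurement together with its change, e.g. "1051 (+5.10%)".
/// Falls back to the bare number when no baseline is available.
fn format_with_change(base: u64, new: u64) -> String {
    match percent_change(base, new) {
        Some(p) => format!("{} ({:+.2}%)", new, p),
        None => format!("{}", new),
    }
}
```

With a hypothetical baseline of 1000 and a new value of 1051, `format_with_change(1000, 1051)` yields `1051 (+5.10%)`.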

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 167_316 ($\textcolor{red}{5.10\%}$) | 9_230_374_182 ($\textcolor{red}{21.63\%}$) | 61_987_732 | 382_826 ($\textcolor{red}{20.03\%}$) | 7_246_862_420 ($\textcolor{red}{20.62\%}$) | 411_340 ($\textcolor{red}{20.49\%}$) |
| triemap | 171_390 ($\textcolor{red}{5.75\%}$) | 16_198_090_366 ($\textcolor{red}{28.16\%}$) | 74_216_052 | 309_052 ($\textcolor{red}{23.42\%}$) | 782_396 ($\textcolor{red}{28.03\%}$) | 769_026 ($\textcolor{red}{27.97\%}$) |
| rbtree | 171_502 ($\textcolor{red}{5.60\%}$) | 7_930_546_839 ($\textcolor{red}{28.72\%}$) | 57_995_940 | 137_477 ($\textcolor{red}{18.31\%}$) | 357_040 ($\textcolor{red}{28.78\%}$) | 401_036 ($\textcolor{red}{32.72\%}$) |
| splay | 166_269 ($\textcolor{red}{5.43\%}$) | 15_696_985_410 ($\textcolor{red}{27.77\%}$) | 53_995_876 | 752_855 ($\textcolor{red}{27.79\%}$) | 790_594 ($\textcolor{red}{27.59\%}$) | 1_106_735 ($\textcolor{red}{27.12\%}$) |
| btree | 243_933 ($\textcolor{red}{13.73\%}$) | 11_541_962_095 ($\textcolor{red}{26.28\%}$) | 31_103_892 | 400_549 ($\textcolor{red}{27.48\%}$) | 543_692 ($\textcolor{red}{26.54\%}$) | 612_045 ($\textcolor{red}{27.29\%}$) |
| zhenya_hashmap | 184_431 ($\textcolor{red}{9.49\%}$) | 3_538_128_781 ($\textcolor{red}{25.37\%}$) | 65_987_480 | 93_396 ($\textcolor{red}{27.84\%}$) | 112_641 ($\textcolor{red}{25.56\%}$) | 129_949 ($\textcolor{red}{19.80\%}$) |
| btreemap_rs | 446_267 ($\textcolor{red}{0.02\%}$) | 1_797_752_179 ($\textcolor{red}{9.79\%}$) | 13_762_560 | 74_544 ($\textcolor{red}{13.42\%}$) | 126_136 ($\textcolor{red}{12.52\%}$) | 92_839 ($\textcolor{red}{13.30\%}$) |
| imrc_hashmap_rs | 446_166 ($\textcolor{red}{0.01\%}$) | 2_571_892_333 ($\textcolor{red}{7.66\%}$) | 122_454_016 | 38_956 ($\textcolor{red}{17.34\%}$) | 179_095 ($\textcolor{red}{9.64\%}$) | 115_561 ($\textcolor{red}{18.60\%}$) |
| hashmap_rs | 439_346 ($\textcolor{red}{0.01\%}$) | 447_664_894 ($\textcolor{red}{8.11\%}$) | 36_536_320 | 22_228 ($\textcolor{red}{29.80\%}$) | 27_664 ($\textcolor{red}{24.45\%}$) | 25_290 ($\textcolor{red}{24.44\%}$) |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 |
|---|---|---|---|---|---|
| heap | 158_536 ($\textcolor{red}{3.98\%}$) | 6_553_514_370 ($\textcolor{red}{26.01\%}$) | 29_995_836 | 720_429 ($\textcolor{red}{25.49\%}$) | 265_478 ($\textcolor{red}{27.04\%}$) |
| heap_rs | 437_278 ($\textcolor{red}{0.02\%}$) | 142_914_793 ($\textcolor{red}{14.02\%}$) | 9_109_504 | 59_850 ($\textcolor{red}{11.13\%}$) | 23_726 ($\textcolor{red}{28.08\%}$) |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|---|---|---|---|---|---|
| buffer | 171_769 ($\textcolor{red}{6.32\%}$) | 2_839_066 ($\textcolor{red}{23.62\%}$) | 65_508 | 108_809 ($\textcolor{red}{29.15\%}$) | 896_597 ($\textcolor{red}{21.08\%}$) | 186_809 ($\textcolor{red}{31.33\%}$) |
| vector | 172_652 ($\textcolor{red}{7.35\%}$) | 2_430_538 ($\textcolor{red}{28.59\%}$) | 24_764 | 170_991 ($\textcolor{red}{24.46\%}$) | 232_826 ($\textcolor{red}{29.81\%}$) | 231_638 ($\textcolor{red}{27.68\%}$) |
| vec_rs | 435_834 ($\textcolor{red}{0.01\%}$) | 290_143 ($\textcolor{red}{8.96\%}$) | 655_360 | 17_605 ($\textcolor{red}{33.16\%}$) | 31_014 ($\textcolor{red}{20.92\%}$) | 25_400 ($\textcolor{red}{22.22\%}$) |

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 230_070 ($\textcolor{red}{17.26\%}$) | 326_935_518 ($\textcolor{red}{14.46\%}$) | 298_093_814 ($\textcolor{red}{16.60\%}$) | 41_503 ($\textcolor{red}{19.45\%}$) | 29_780 ($\textcolor{red}{25.71\%}$) |
| Rust | 528_234 ($\textcolor{red}{0.01\%}$) | 82_789_387 ($\textcolor{red}{0.34\%}$) | 56_794_263 ($\textcolor{red}{0.47\%}$) | 50_651 ($\textcolor{red}{16.93\%}$) | 53_532 ($\textcolor{red}{17.71\%}$) |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|---|---|---|---|---|
| Motoko | 232_943 ($\textcolor{red}{13.19\%}$) | 5_792_324_783 ($\textcolor{red}{14.70\%}$) | 3_429_924 | 687_589 ($\textcolor{red}{14.91\%}$) | 445_626 ($\textcolor{red}{23.36\%}$) |
| Rust | 469_955 ($\textcolor{red}{0.02\%}$) | 6_359_442_714 ($\textcolor{red}{2.03\%}$) | 1_081_344 | 1_012_174 ($\textcolor{red}{2.36\%}$) | 305_119 ($\textcolor{red}{4.88\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 301_252 ($\textcolor{red}{8.53\%}$) | 50_473 ($\textcolor{red}{27.10\%}$) | 24_519 ($\textcolor{red}{37.88\%}$) | 20_380 ($\textcolor{red}{47.35\%}$) | 21_890 ($\textcolor{red}{40.23\%}$) |
| Rust | 763_017 ($\textcolor{red}{0.02\%}$) | 552_075 ($\textcolor{red}{14.02\%}$) | 105_203 ($\textcolor{red}{18.14\%}$) | 128_753 ($\textcolor{red}{19.48\%}$) | 139_539 ($\textcolor{red}{17.55\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 244_819 ($\textcolor{red}{6.47\%}$) | 19_098 ($\textcolor{red}{44.50\%}$) | 31_854 ($\textcolor{red}{33.96\%}$) | 9_565 ($\textcolor{red}{81.60\%}$) |
| Rust | 828_238 ($\textcolor{red}{0.02\%}$) | 146_257 ($\textcolor{red}{13.96\%}$) | 380_260 ($\textcolor{red}{14.39\%}$) | 93_763 ($\textcolor{red}{18.36\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 145_761 ($\textcolor{red}{2.57\%}$) | 20_036 ($\textcolor{red}{157.63\%}$) |
| Rust | 25_650 ($\textcolor{red}{0.02\%}$) | 1_179 ($\textcolor{red}{151.92\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 154_792 ($\textcolor{red}{3.67\%}$) | 54_057 ($\textcolor{red}{228.20\%}$) | 4_942 ($\textcolor{red}{167.57\%}$) |
| Rust | 470_693 ($\textcolor{red}{0.02\%}$) | 69_727 ($\textcolor{red}{56.66\%}$) | 11_405 ($\textcolor{red}{47.87\%}$) |

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_338_231_405 ($\textcolor{red}{32.20\%}$) | 59_396_776 | 118 ($\textcolor{red}{136.00\%}$) | 118 ($\textcolor{red}{136.00\%}$) | 118 ($\textcolor{red}{136.00\%}$) |
| copying | 1_338_231_287 ($\textcolor{red}{32.20\%}$) | 59_396_776 | 1_337_913_569 ($\textcolor{red}{32.17\%}$) | 1_338_002_371 ($\textcolor{red}{32.17\%}$) | 1_337_919_144 ($\textcolor{red}{32.17\%}$) |
| compacting | 1_911_420_608 ($\textcolor{red}{14.11\%}$) | 59_396_776 | 1_473_824_186 ($\textcolor{red}{13.99\%}$) | 1_756_485_066 ($\textcolor{red}{14.63\%}$) | 1_787_369_954 ($\textcolor{red}{14.69\%}$) |
| generational | 2_891_818_643 ($\textcolor{red}{15.76\%}$) | 59_405_240 | 1_141_865_993 ($\textcolor{red}{16.81\%}$) | 1_217_376 ($\textcolor{red}{16.50\%}$) | 1_117_840 ($\textcolor{red}{16.39\%}$) |
| incremental | 33_436_719 ($\textcolor{red}{3.45\%}$) | 1_136_155_048 | 333_734_166 ($\textcolor{red}{14.98\%}$) | 336_829_512 ($\textcolor{red}{14.98\%}$) | 336_860_690 ($\textcolor{red}{14.98\%}$) |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 311_868 ($\textcolor{red}{4.76\%}$) | 818_044 ($\textcolor{red}{13.63\%}$) | 16_950 ($\textcolor{red}{249.20\%}$) | 17_406 ($\textcolor{red}{229.16\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 172_768 ($\textcolor{red}{3.59\%}$) | 155_860 ($\textcolor{red}{2.64\%}$) | 29_874 ($\textcolor{red}{95.04\%}$) | 12_492 ($\textcolor{red}{42.64\%}$) | 23_937 ($\textcolor{red}{116.78\%}$) | 6_816 ($\textcolor{red}{76.03\%}$) |
| Rust | 511_870 ($\textcolor{red}{0.02\%}$) | 565_407 ($\textcolor{red}{0.01\%}$) | 71_728 ($\textcolor{red}{35.43\%}$) | 44_318 ($\textcolor{red}{24.84\%}$) | 95_767 ($\textcolor{red}{25.08\%}$) | 53_941 ($\textcolor{red}{18.82\%}$) |

github-actions[bot] commented 11 months ago

Note: The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:
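To illustrate the fixed-seed setup, here is a minimal sketch in Rust. The xorshift generator, the `generate`, `fill_btree`, and `fill_hash` helpers, and the seed value are all hypothetical stand-ins; the benchmark's actual generator is not shown in this thread.

```rust
use std::collections::{BTreeMap, HashMap};

/// Tiny xorshift64 generator. A hypothetical stand-in for the
/// benchmark's real generator, which is not shown here.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed nonzero value.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        XorShift64 { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produces `n` (key, value) pairs; the same seed always yields the same pairs.
fn generate(seed: u64, n: usize) -> Vec<(u64, u64)> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| (rng.next_u64(), rng.next_u64())).collect()
}

/// Fills an ordered map from the seeded stream.
fn fill_btree(seed: u64, n: usize) -> BTreeMap<u64, u64> {
    generate(seed, n).into_iter().collect()
}

/// Fills a hash map from the same seeded stream.
fn fill_hash(seed: u64, n: usize) -> HashMap<u64, u64> {
    generate(seed, n).into_iter().collect()
}
```

Because both maps are filled from `generate` with the same seed, they hold identical entries, and a query stream drawn from a second seeded generator would be identical across collections as well.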

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
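The note on amortized growth can be made concrete with a small sketch. It is a generic doubling array, not the Motoko hashmap implementation, and the `DoublingArray` and `push_costs` names are hypothetical. It counts how many existing elements each insertion has to copy.

```rust
/// Array that doubles its backing storage when full.
/// Generic illustration only; not the Motoko `hashmap` code.
struct DoublingArray {
    slots: Vec<u64>, // length of `slots` is the current capacity
    len: usize,      // number of slots actually in use
}

impl DoublingArray {
    fn with_capacity(capacity: usize) -> Self {
        DoublingArray { slots: vec![0; capacity.max(1)], len: 0 }
    }

    /// Appends `value` and returns how many existing elements were copied.
    fn push(&mut self, value: u64) -> usize {
        let mut copied = 0;
        if self.len == self.slots.len() {
            // Capacity reached: allocate twice the space and copy everything over.
            let mut bigger = vec![0; self.slots.len() * 2];
            bigger[..self.len].copy_from_slice(&self.slots[..self.len]);
            copied = self.len;
            self.slots = bigger;
        }
        self.slots[self.len] = value;
        self.len += 1;
        copied
    }
}

/// Copy count for each of `n` consecutive pushes.
fn push_costs(initial_capacity: usize, n: usize) -> Vec<usize> {
    let mut array = DoublingArray::with_capacity(initial_capacity);
    (0..n).map(|i| array.push(i as u64)).collect()
}
```

Most pushes copy nothing, but the one that crosses the capacity boundary copies every element stored so far. A small batch that happens to include that push therefore pays for the whole resize, which is the effect the note describes for batch_put 50.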

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 167_316 | 9_230_374_182 | 61_987_732 | 382_826 | 7_246_862_420 | 411_340 |
| triemap | 171_390 | 16_198_090_366 | 74_216_052 | 309_052 | 782_396 | 769_026 |
| rbtree | 171_502 | 7_930_546_839 | 57_995_940 | 137_477 | 357_040 | 401_036 |
| splay | 166_269 | 15_696_985_410 | 53_995_876 | 752_855 | 790_594 | 1_106_735 |
| btree | 243_933 | 11_541_962_095 | 31_103_892 | 400_549 | 543_692 | 612_045 |
| zhenya_hashmap | 184_431 | 3_538_128_781 | 65_987_480 | 93_396 | 112_641 | 129_949 |
| btreemap_rs | 446_267 | 1_797_752_179 | 13_762_560 | 74_544 | 126_136 | 92_839 |
| imrc_hashmap_rs | 446_166 | 2_571_892_333 | 122_454_016 | 38_956 | 179_095 | 115_561 |
| hashmap_rs | 439_346 | 447_664_894 | 36_536_320 | 22_228 | 27_664 | 25_290 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | |
|---|---|---|---|---|---|---|
| heap | 158_536 | 6_553_514_370 | 29_995_836 | 720_429 | 265_478 | 686_570 |
| heap_rs | 437_278 | 142_914_793 | 9_109_504 | 59_850 | 23_726 | 60_072 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|---|---|---|---|---|---|
| buffer | 171_769 | 2_839_066 | 65_508 | 108_809 | 896_597 | 186_809 |
| vector | 172_652 | 2_430_538 | 24_764 | 170_991 | 232_826 | 231_638 |
| vec_rs | 435_834 | 290_143 | 655_360 | 17_605 | 31_014 | 25_400 |

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|---|---|---|---|---|
| Motoko | 230_070 | 326_935_518 | 298_093_814 | 41_503 | 29_780 |
| Rust | 528_234 | 82_789_387 | 56_794_263 | 50_651 | 53_532 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|---|---|---|---|---|
| Motoko | 232_943 | 5_792_324_783 | 3_429_924 | 687_589 | 445_626 |
| Rust | 469_955 | 6_359_442_714 | 1_081_344 | 1_012_174 | 305_119 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 301_252 | 50_473 | 24_519 | 20_380 | 21_890 |
| Rust | 763_017 | 552_075 | 105_203 | 128_753 | 139_539 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 244_819 | 19_098 | 31_854 | 9_565 |
| Rust | 828_238 | 146_257 | 380_260 | 93_763 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 145_761 | 20_036 |
| Rust | 25_650 | 1_179 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 154_792 | 54_057 | 4_942 |
| Rust | 470_693 | 69_727 | 11_405 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_338_231_405 | 59_396_776 | 118 | 118 | 118 |
| copying | 1_338_231_287 | 59_396_776 | 1_337_913_569 | 1_338_002_371 | 1_337_919_144 |
| compacting | 1_911_420_608 | 59_396_776 | 1_473_824_186 | 1_756_485_066 | 1_787_369_954 |
| generational | 2_891_818_643 | 59_405_240 | 1_141_865_993 | 1_217_376 | 1_117_840 |
| incremental | 33_436_719 | 1_136_155_048 | 333_734_166 | 336_829_512 | 336_860_690 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 311_868 | 818_044 | 16_950 | 17_406 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 172_768 | 155_860 | 29_874 | 12_492 | 23_937 | 6_816 |
| Rust | 511_870 | 565_407 | 71_728 | 44_318 | 95_767 | 53_941 |