dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

[DO NOT MERGE] perf from moc artifact #51

Open chenyan-dfinity opened 1 year ago

chenyan-dfinity commented 1 year ago

Test https://github.com/dfinity/motoko/pull/3955

github-actions[bot] commented 1 year ago

Note Diffing the performance result against the published result from main branch. Unchanged benchmarks are omitted.
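
As a rough illustration of how the percentages in the tables below can be read, here is a minimal sketch that computes the relative change of a new measurement against a baseline and formats it like a report cell. The function names (`percent_change`, `group_digits`, `format_cell`) and the sample baselines are hypothetical; this is not the CI's actual diffing code.

```rust
/// Relative change of `new` against `baseline`, in percent.
/// Negative means the new measurement is smaller (an improvement).
fn percent_change(baseline: u64, new: u64) -> f64 {
    (new as f64 - baseline as f64) / baseline as f64 * 100.0
}

/// Group decimal digits with underscores, e.g. 196227 -> "196_227".
fn group_digits(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, c) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(c);
    }
    out
}

/// Format one cell: the value alone if unchanged, otherwise the value
/// followed by the percentage change in parentheses.
fn format_cell(baseline: u64, new: u64) -> String {
    if baseline == new {
        return group_digits(new);
    }
    format!("{} ({:.2}%)", group_digits(new), percent_change(baseline, new))
}

fn main() {
    // Hypothetical baselines, chosen only for illustration.
    println!("{}", format_cell(10_000, 9_000)); // 9_000 (-10.00%)
    println!("{}", format_cell(1_000, 1_250)); // 1_250 (25.00%)
    println!("{}", format_cell(5_324, 5_324)); // 5_324
}
```

Under this reading, a green negative percentage means the value dropped relative to the main-branch baseline, and a red positive one means it grew.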

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 196_227 | 2_387_017_574 | 9_102_052 | 1_293_015 | 689_295_883 | 1_224_704 |
| triemap | 201_786 | 2_286_509_386 | 9_715_900 | 891_925 | 2_111_535 | 1_187_671 |
| rbtree | 198_958 | 2_024_735_614 | 8_902_160 | 787_768 | 1_839_147 | 991_630 |
| splay | 197_868 | 2_305_505_782 | 8_702_096 | 1_258_328 | 2_175_053 | 1_259_845 |
| btree | 235_291 ($\textcolor{green}{-0.02\%}$) | 2_116_952_097 ($\textcolor{green}{-0.25\%}$) | 7_556_172 | 936_630 | 1_948_829 ($\textcolor{green}{-0.31\%}$) | 991_671 |
| zhenya_hashmap | 190_491 | 1_855_281_618 | 9_301_800 | 745_902 | 1_651_310 | 752_198 |
| btreemap_rs | 514_775 | 115_994_744 | 1_638_400 | 59_433 | 137_855 | 61_795 |
| hashmap_rs | 502_772 | 53_333_947 | 1_835_008 | 21_070 | 63_601 | 22_484 |

Priority queue

Note Same as main branch, skipping.

MoVM

Note Same as main branch, skipping.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_432 ($\textcolor{green}{-0.02\%}$) | 44_505 ($\textcolor{red}{0.13\%}$) | 20_039 ($\textcolor{red}{0.58\%}$) | 14_077 ($\textcolor{green}{-0.01\%}$) | 16_739 ($\textcolor{red}{0.35\%}$) |
| Rust | 940_461 | 541_441 | 102_463 | 125_485 | 137_030 |

DIP721 NFT

Note Same as main branch, skipping.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 156_821 | 5_324 ($\textcolor{green}{-40.70\%}$) |
| Rust | 35_608 | 1_127 ($\textcolor{red}{91.99\%}$) |

Timer

Note Same as main branch, skipping.

github-actions[bot] commented 1 year ago

Note The flamegraph link only works after you merge. Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.
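
The amortized-cost note above can be illustrated with a small sketch. `DoublingArray` is a hypothetical, simplified model of a backing array that doubles when full. It is not the Motoko hashmap implementation; it only shows why a single insertion that crosses the capacity boundary pays for copying every existing element, while the other insertions stay cheap.

```rust
/// A growable array that doubles its capacity when full and reports how
/// many existing elements each insertion had to copy.
struct DoublingArray {
    slots: Vec<u64>,
    capacity: usize,
}

impl DoublingArray {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        DoublingArray {
            slots: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Appends `value` and returns the number of elements copied
    /// because of a resize (0 when no resize was needed).
    fn push(&mut self, value: u64) -> usize {
        let mut copied = 0;
        if self.slots.len() == self.capacity {
            // Capacity reached: allocate a bigger array and copy everything.
            self.capacity *= 2;
            let mut bigger = Vec::with_capacity(self.capacity);
            bigger.extend_from_slice(&self.slots);
            copied = self.slots.len();
            self.slots = bigger;
        }
        self.slots.push(value);
        copied
    }
}

fn main() {
    let mut arr = DoublingArray::with_capacity(4);
    let costs: Vec<usize> = (0..9).map(|i| arr.push(i)).collect();
    // Most pushes copy nothing; the pushes that hit the capacity copy it all.
    println!("{:?}", costs); // [0, 0, 0, 0, 4, 0, 0, 0, 8]
}
```

A batch of 50 insertions that happens to trigger such a resize on a 50k-element collection would be dominated by the copy, which is consistent with the outlier in the hashmap row's batch_put 50 column.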

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 196_227 | 2_387_017_574 | 9_102_052 | 1_293_015 | 689_295_883 | 1_224_704 |
| triemap | 201_786 | 2_286_509_386 | 9_715_900 | 891_925 | 2_111_535 | 1_187_671 |
| rbtree | 198_958 | 2_024_735_614 | 8_902_160 | 787_768 | 1_839_147 | 991_630 |
| splay | 197_868 | 2_305_505_782 | 8_702_096 | 1_258_328 | 2_175_053 | 1_259_845 |
| btree | 235_291 | 2_116_952_097 | 7_556_172 | 936_630 | 1_948_829 | 991_671 |
| zhenya_hashmap | 190_491 | 1_855_281_618 | 9_301_800 | 745_902 | 1_651_310 | 752_198 |
| btreemap_rs | 514_775 | 115_994_744 | 1_638_400 | 59_433 | 137_855 | 61_795 |
| hashmap_rs | 502_772 | 53_333_947 | 1_835_008 | 21_070 | 63_601 | 22_484 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
|---|---:|---:|---:|---:|---:|---:|
| heap | 181_726 | 793_253_862 | 1_400_024 | 385_321 | 822_756 | 386_887 |
| heap_rs | 473_458 | 5_041_433 | 819_200 | 53_243 | 22_092 | 53_420 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 196_227 | 477_464_161 | 1_820_844 | 1_291_042 | 138_897_096 | 1_222_118 |
| hashmap_rs | 502_772 | 10_984_247 | 950_272 | 20_385 | 62_907 | 21_374 |
| imrc_hashmap_rs | 513_980 | 19_919_391 | 1_572_864 | 31_519 | 120_207 | 37_618 |
| movm_rs | 2_092_441 | 1_017_324_475 | 2_654_208 | 2_494_635 | 6_477_172 | 5_106_080 |
| movm_dynamic_rs | 2_295_206 | 496_274_407 | 2_129_920 | 1_951_981 | 2_709_572 | 1_950_006 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_432 | 44_505 | 20_039 | 14_077 | 16_739 |
| Rust | 940_461 | 541_441 | 102_463 | 125_485 | 137_030 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 244_622 | 13_379 | 24_678 | 5_358 |
| Rust | 1_005_637 | 144_162 | 375_896 | 94_757 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 156_821 | 5_324 |
| Rust | 35_608 | 1_127 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 164_347 | 19_476 | 1_907 |
| Rust | 524_361 | 55_152 | 10_417 |

ggreif commented 1 year ago

@chenyan-dfinity

> against the published result from main branch

It is unclear whether this refers to this repo's main branch or dfinity/motoko's master branch. If it is the former, I would have expected the second run (baseline 0.8.8) to show less improvement.

chenyan-dfinity commented 1 year ago

I just reran the CI, and it looks good now. The diff is against this repo's main branch. The main branch needs some time to update the report after a PR is merged, so you probably triggered the CI too soon.