dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

Allow CI to comment on forked PR #39

Closed chenyan-dfinity closed 1 year ago

github-actions[bot] commented 1 year ago

Download the artifacts for this pull request:

github-actions[bot] commented 1 year ago

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 147_123 | 8_284 ($\textcolor{green}{-30.61\%}$) |
| Rust | 35_650 | 1_127 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 162_940 | 34_639 | 1_923 |
| Rust | 525_666 | 55_806 | 10_541 |

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
| triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
| rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
| splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
| btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
| zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
| btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
| hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---:|---:|---:|---:|---:|
| heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 |
| heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
| hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
| imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
| movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
| movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
| Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_066 | 44_646 ($\textcolor{red}{0.13\%}$) | 19_956 ($\textcolor{green}{-0.62\%}$) | 14_282 ($\textcolor{red}{0.01\%}$) | 16_944 ($\textcolor{red}{0.37\%}$) |
| Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
| Rust | 998_075 | 147_146 | 381_217 | 95_471 |

Motoko Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

github-actions[bot] commented 1 year ago

Heartbeat / Timer

Measures the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 147_123 | 8_284 |
| Rust | 35_650 | 1_127 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 162_940 | 34_639 | 1_923 |
| Rust | 525_666 | 55_806 | 10_541 |

Collection libraries

Measures different collection libraries written in Motoko and Rust. Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Hopefully, once deterministic time slicing is ready, we can measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so batch_put 50 costs far more than it does for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.
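The batch_put 50 spike for hashmap can be illustrated with a toy model. The Python sketch below is not the Motoko HashMap implementation: it assumes an array-backed table that doubles its capacity when full, and it only counts how many elements each resize copies. The class `GrowableTable`, its starting capacity, and the batch sizes are hypothetical choices made for illustration.

```python
class GrowableTable:
    """Toy array-backed table that doubles when full and counts element copies."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.size = 0
        self.copies = 0  # total elements moved by all resizes so far

    def put(self):
        """Insert one element; return how many elements this call had to copy."""
        copied = 0
        if self.size == self.capacity:
            copied = self.size      # every existing element moves to the new array
            self.capacity *= 2
            self.copies += copied
        self.size += 1
        return copied


def batch_cost(table, n):
    """Total copies triggered by n consecutive puts."""
    return sum(table.put() for _ in range(n))


table = GrowableTable(capacity=4)
fill_cost = batch_cost(table, 64)    # grow from empty until the table is exactly full
spike_cost = batch_cost(table, 50)   # this batch crosses the capacity boundary
quiet_cost = batch_cost(table, 10)   # this batch fits in the enlarged array

print(fill_cost, spike_cost, quiet_cost)
```

In this model the batch that crosses the boundary pays for copying the whole table, while the following batch pays nothing. Averaged over all inserts the copy cost stays bounded, which is why a single batch measurement can look far worse than the amortized behavior.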

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
| triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
| rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
| splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
| btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
| zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
| btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
| hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
|---|---:|---:|---:|---:|---:|---:|
| heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 | 422_542 |
| heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 | 53_738 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
| hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
| imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
| movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
| movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
| Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but maintaining the ergonomics provided by serde would be a challenge.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_066 | 44_646 | 19_956 | 14_282 | 16_944 |
| Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
| Rust | 998_075 | 147_146 | 381_217 | 95_471 |

Motoko Garbage Collection

Measures Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle costs reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

github-actions[bot] commented 1 year ago

Note: Diffing the performance results against the published results from the main branch.
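The percentage annotations in the tables are consistent with a plain relative change against a baseline value. The Python sketch below reproduces that arithmetic; it is not the CI script itself, and the helper names `parse_count`, `percent_change`, and `format_diff` are hypothetical. The baselines used in the usage lines (1_127 and 11_938) are assumptions taken from the heartbeat figures reported in the other comments of this thread.

```python
def parse_count(text):
    """Parse an underscore-grouped count such as '11_938' into an int."""
    return int(text.replace("_", ""))


def percent_change(current, baseline):
    """Relative change of current against baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def format_diff(current_text, baseline_text):
    """Render a cell as value plus colored percentage; unchanged cells stay bare."""
    current = parse_count(current_text)
    baseline = parse_count(baseline_text)
    if current == baseline:
        return current_text
    change = percent_change(current, baseline)
    color = "green" if change < 0 else "red"  # lower cost is an improvement
    return f"{current_text} ($\\textcolor{{{color}}}{{{change:.2f}\\%}}$)"


print(format_diff("587", "1_127"))     # Rust heartbeat, assumed baseline 1_127
print(format_diff("8_284", "11_938"))  # Motoko heartbeat, assumed baseline 11_938
print(format_diff("1_923", "1_923"))   # unchanged value, no annotation
```

With those assumed baselines, the two annotated lines come out as -47.91% and -30.61%, matching the heartbeat annotations shown in this thread.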

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 147_123 | 11_938 |
| Rust | 35_650 | 587 ($\textcolor{green}{-47.91\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 162_940 | 34_639 | 1_923 |
| Rust | 525_666 | 55_806 | 10_541 |

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
| triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
| rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
| splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
| btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
| zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
| btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
| hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---:|---:|---:|---:|---:|
| heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 |
| heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
| hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
| imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
| movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
| movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
| Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_066 | 44_646 ($\textcolor{red}{0.13\%}$) | 19_838 ($\textcolor{green}{-1.21\%}$) | 14_282 ($\textcolor{red}{0.01\%}$) | 16_944 ($\textcolor{red}{0.37\%}$) |
| Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
| Rust | 998_075 | 147_146 | 381_217 | 95_471 |

Motoko Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

github-actions[bot] commented 1 year ago

Warning: The flamegraph links only work after you merge.

Heartbeat / Timer

Measures the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 147_123 | 11_938 |
| Rust | 35_650 | 587 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 162_940 | 34_639 | 1_923 |
| Rust | 525_666 | 55_806 | 10_541 |

Collection libraries

Measures different collection libraries written in Motoko and Rust. Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurements of each column in the table:

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Hopefully, once deterministic time slicing is ready, we can measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so batch_put 50 costs far more than it does for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.
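The reproducibility argument (same generator, fixed seed, identical contents in every collection) can be sketched in a few lines of Python. This is only an illustration of the idea, not the benchmark's generator: the seed value, the key range, and the helper names `generate_keys` and `fill` are all hypothetical.

```python
import random
from collections import OrderedDict


def generate_keys(seed, count, upper=2**32):
    """Return `count` pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randrange(upper) for _ in range(count)]


def fill(container_factory, keys):
    """Insert the same key sequence into any dict-like container."""
    container = container_factory()
    for key in keys:
        container[key] = key
    return container


SEED = 42  # hypothetical; the benchmark's actual seed is not stated here

keys_a = generate_keys(SEED, 1000)
keys_b = generate_keys(SEED, 1000)

plain = fill(dict, keys_a)
ordered = fill(OrderedDict, keys_b)

print(keys_a == keys_b)                  # same seed, same sequence
print(sorted(plain) == sorted(ordered))  # both containers hold the same elements
```

Because every collection is fed from an identically seeded generator, differences in the measured costs come from the data structures rather than from the inputs.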

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
| triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
| rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
| splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
| btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
| zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
| btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
| hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
|---|---:|---:|---:|---:|---:|---:|
| heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 | 422_542 |
| heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 | 53_738 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
| hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
| imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
| movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
| movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
| Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but maintaining the ergonomics provided by serde would be a challenge.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
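To put a number on the cost difference, the Rust and Motoko rows of the DIP721 NFT table in this comment can be compared column by column. The Python sketch below only does that arithmetic on the reported figures; the helper names `parse_count` and `cost_ratios` are hypothetical, and the ratios say nothing about how much of the gap is serialization as opposed to other work.

```python
def parse_count(text):
    """Parse an underscore-grouped count such as '13_379' into an int."""
    return int(text.replace("_", ""))


def cost_ratios(columns, motoko_row, rust_row):
    """Map each column name to the Rust cost divided by the Motoko cost."""
    return {
        column: parse_count(rust) / parse_count(motoko)
        for column, motoko, rust in zip(columns, motoko_row, rust_row)
    }


# Figures copied from the DIP721 NFT table in this comment.
columns = ["binary_size", "init", "mint_token", "transfer_token"]
motoko = ["235_243", "13_379", "24_678", "5_357"]
rust = ["998_075", "147_146", "381_217", "95_471"]

nft_ratios = cost_ratios(columns, motoko, rust)
for column, ratio in nft_ratios.items():
    print(f"{column}: {ratio:.1f}x")
```

On these figures the Rust method calls cost roughly 11 to 18 times as much as their Motoko counterparts, while the binary is about 4 times larger.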

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 291_066 | 44_646 | 19_838 | 14_282 | 16_944 |
| Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
| Rust | 998_075 | 147_146 | 381_217 | 95_471 |

Motoko Garbage Collection

Measures Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle costs reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |