chenyan-dfinity closed this pull request 1 year ago
| | binary_size | heartbeat |
---|---|---|
Motoko | 147_123 | 8_284 ($\textcolor{green}{-30.61\%}$) |
Rust | 35_650 | 1_127 |
| | binary_size | setTimer | cancelTimer |
---|---|---|---|
Motoko | 162_940 | 34_639 | 1_923 |
Rust | 525_666 | 55_806 | 10_541 |
| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |
| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
---|---|---|---|---|---|
heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 |
heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 |
| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |
| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
---|---|---|---|---|---|---|
Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |
| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
---|---|---|---|---|---|
Motoko | 291_066 | 44_646 ($\textcolor{red}{0.13\%}$) | 19_956 ($\textcolor{green}{-0.62\%}$) | 14_282 ($\textcolor{red}{0.01\%}$) | 16_944 ($\textcolor{red}{0.37\%}$) |
Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |
| | binary_size | init | mint_token | transfer_token |
---|---|---|---|---|
Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
Rust | 998_075 | 147_146 | 381_217 | 95_471 |
| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|
default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |
Measure the cost of empty heartbeat and timer job.

- `setTimer` measures both the `setTimer(0)` method and the execution of empty job.
- […] `setTimer` and `canister_global_timer` function. If it's not there, we may need to adjust the script.

| | binary_size | heartbeat |
---|---|---|
Motoko | 147_123 | 8_284 |
Rust | 35_650 | 1_127 |
| | binary_size | setTimer | cancelTimer |
---|---|---|---|
Motoko | 162_940 | 34_639 | 1_923 |
Rust | 525_666 | 55_806 | 10_541 |
Measure different collection libraries written in both Motoko and Rust. The library names with the `_rs` suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

- max mem: for Motoko, it reports `rts_max_live_size` after the `generate` call; for Rust, it reports the Wasm's memory page * 32Kb.

[…] an `O(10000 nlogn)` algorithm hitting the limit, while an `O(n^2)` algorithm runs just fine.

Note
- The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
- Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on a larger memory footprint.
- `hashmap` uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of `batch_put 50` is much higher than for other data structures.
- `hashmap_rs` uses the `fxhash` crate, which is the same as `std::collections::HashMap`, but with a deterministic hasher. This ensures reproducible results.
- `btree` comes from Byron Becker's stable BTreeMap library.
- `zhenya_hashmap` comes from Zhenya Usenko's stable HashMap library.
- The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
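The note about `hashmap` copying the whole array once its capacity is reached can be illustrated with a toy model. The sketch below is not the Motoko `hashmap` implementation; `GrowArray`, `copies_for_batch`, and the doubling policy are assumptions made only for illustration. It counts how many elements get copied when a small batch of inserts happens to cross the capacity boundary.

```rust
/// Toy growable array that doubles its capacity when full and counts
/// how many elements were copied during resizes.
/// Hypothetical model for illustration; not the benchmarked library.
struct GrowArray {
    data: Vec<u32>,
    capacity: usize,
    copies: u64,
}

impl GrowArray {
    fn with_capacity(capacity: usize) -> Self {
        GrowArray { data: Vec::with_capacity(capacity), capacity, copies: 0 }
    }

    fn push(&mut self, value: u32) {
        if self.data.len() == self.capacity {
            // Full: allocate a buffer twice as large and copy every element over.
            let new_capacity = (self.capacity * 2).max(1);
            let mut bigger = Vec::with_capacity(new_capacity);
            bigger.extend_from_slice(&self.data);
            self.copies += self.data.len() as u64;
            self.data = bigger;
            self.capacity = new_capacity;
        }
        self.data.push(value);
    }
}

/// Number of element copies caused by pushing `batch` more elements
/// onto an array that already holds `n` elements.
fn copies_for_batch(initial_capacity: usize, n: usize, batch: usize) -> u64 {
    let mut arr = GrowArray::with_capacity(initial_capacity);
    for i in 0..n {
        arr.push(i as u32);
    }
    let before = arr.copies;
    for i in 0..batch {
        arr.push(i as u32);
    }
    arr.copies - before
}
```

In this model, a batch of 50 inserts into an array that is exactly full pays for copying every existing element, while the same batch into a half-empty array copies nothing. That is the shape of the `batch_put 50` outlier in the `hashmap` row.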
| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |
| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
---|---|---|---|---|---|---|
heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 | 422_542 |
heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 | 53_738 |
| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |
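The collection benchmarks above rely on a fixed-seed random number generator so that every library sees the same keys. A minimal sketch of that idea follows. The xorshift32 generator and the names `XorShift32` and `generate_keys` are assumptions chosen for illustration; the report does not say which generator the benchmark uses.

```rust
/// Minimal xorshift32 generator. A stand-in for whatever generator
/// the benchmark actually uses (assumption for illustration).
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // xorshift must not start from zero, so map 0 to a fixed non-zero value.
        XorShift32 { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Produce the `n` keys that a `generate`-style call would insert.
/// The same seed always yields the same sequence.
fn generate_keys(seed: u32, n: usize) -> Vec<u32> {
    let mut rng = XorShift32::new(seed);
    (0..n).map(|_| rng.next_u32()).collect()
}
```

Calling `generate_keys(42, 50_000)` once per collection gives each of them an identical workload, which is what makes the per-library numbers comparable.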
Measure the cost of inter-canister calls from the Publisher & Subscriber example.
| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
---|---|---|---|---|---|---|
Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |
Measure the performance of some typical dapps:

- […] `heartbeat` disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.

Note
- The cost difference is mainly due to the Candid serialization cost.
- Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use `serde` to dynamically deserialize data based on data on the wire.
- We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by `serde`.
- For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
---|---|---|---|---|---|
Motoko | 291_066 | 44_646 | 19_956 | 14_282 | 16_944 |
Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |
| | binary_size | init | mint_token | transfer_token |
---|---|---|---|---|
Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
Rust | 998_075 | 147_146 | 381_217 | 95_471 |
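To put a number on the cost difference discussed in the note above, the sketch below divides the Rust instruction counts by the Motoko ones for the first of the two dapp tables (the one with `submit_proposal` and `vote_proposal`). The figures are copied from that table; the helper names `cost_ratio` and `ratios` are hypothetical.

```rust
/// Instruction counts in the column order of the table:
/// init, transfer_token, submit_proposal, vote_proposal.
const MOTOKO: [u64; 4] = [44_646, 19_956, 14_282, 16_944];
const RUST: [u64; 4] = [541_999, 102_465, 125_877, 138_810];

/// How many times more instructions the Rust canister needs than the Motoko one.
fn cost_ratio(rust: u64, motoko: u64) -> f64 {
    rust as f64 / motoko as f64
}

/// Ratio for every method, in column order.
fn ratios() -> Vec<f64> {
    RUST.iter()
        .zip(MOTOKO.iter())
        .map(|(&r, &m)| cost_ratio(r, m))
        .collect()
}
```

With these numbers the ratios fall between roughly 5x and 12x, with `init` at the high end.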
Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports `rts_max_live_size` after the `generate` call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit.

- default: […] `generate` will trigger the copying GC. The rest of the methods will not trigger GC.
- copying: […] `--force-gc --copying-gc`.
- compacting: […] `--force-gc --compacting-gc`.
- generational: […] `--force-gc --generational-gc`.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|
default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |
Note: Diffing the performance result against the published result from the main branch.
| | binary_size | heartbeat |
---|---|---|
Motoko | 147_123 | 11_938 |
Rust | 35_650 | 587 ($\textcolor{green}{-47.91\%}$) |
| | binary_size | setTimer | cancelTimer |
---|---|---|---|
Motoko | 162_940 | 34_639 | 1_923 |
Rust | 525_666 | 55_806 | 10_541 |
| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |
| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
---|---|---|---|---|---|
heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 |
heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 |
| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |
| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
---|---|---|---|---|---|---|
Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |
| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
---|---|---|---|---|---|
Motoko | 291_066 | 44_646 ($\textcolor{red}{0.13\%}$) | 19_838 ($\textcolor{green}{-1.21\%}$) | 14_282 ($\textcolor{red}{0.01\%}$) | 16_944 ($\textcolor{red}{0.37\%}$) |
Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |
| | binary_size | init | mint_token | transfer_token |
---|---|---|---|---|
Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
Rust | 998_075 | 147_146 | 381_217 | 95_471 |
| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|
default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |
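The colored percentages in the diff tables above appear to be relative changes against the main-branch value. The sketch below assumes the formula `(new - base) / base * 100`, which the report does not state explicitly; `percent_change` and `format_change` are hypothetical helper names. Under that assumption, the two heartbeat entries in this thread are reproduced from the values 11_938 to 8_284 (Motoko) and 1_127 to 587 (Rust).

```rust
/// Relative change of `new` against `base`, in percent.
/// Assumption: the report computes (new - base) / base * 100.
fn percent_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// Format the change with two decimals, the way the tables show it.
fn format_change(base: u64, new: u64) -> String {
    format!("{:.2}%", percent_change(base, new))
}
```

For example, `format_change(11_938, 8_284)` yields `-30.61%` and `format_change(1_127, 587)` yields `-47.91%`, matching the two heartbeat cells.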
Warning: The flamegraph link only works after you merge.
Measure the cost of empty heartbeat and timer job.

- `setTimer` measures both the `setTimer(0)` method and the execution of empty job.
- […] `setTimer` and `canister_global_timer` function. If it's not there, we may need to adjust the script.

| | binary_size | heartbeat |
---|---|---|
Motoko | 147_123 | 11_938 |
Rust | 35_650 | 587 |
| | binary_size | setTimer | cancelTimer |
---|---|---|---|
Motoko | 162_940 | 34_639 | 1_923 |
Rust | 525_666 | 55_806 | 10_541 |
Measure different collection libraries written in both Motoko and Rust. The library names with the `_rs` suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

- max mem: for Motoko, it reports `rts_max_live_size` after the `generate` call; for Rust, it reports the Wasm's memory page * 32Kb.

[…] an `O(10000 nlogn)` algorithm hitting the limit, while an `O(n^2)` algorithm runs just fine.

Note
- The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
- Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on a larger memory footprint.
- `hashmap` uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of `batch_put 50` is much higher than for other data structures.
- `hashmap_rs` uses the `fxhash` crate, which is the same as `std::collections::HashMap`, but with a deterministic hasher. This ensures reproducible results.
- `btree` comes from Byron Becker's stable BTreeMap library.
- `zhenya_hashmap` comes from Zhenya Usenko's stable HashMap library.
- The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
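The `hashmap_rs` note says the benchmark swaps the default hasher of `std::collections::HashMap` for a deterministic one. The sketch below shows that mechanism using only the standard library. The hand-written FNV-1a hasher is a stand-in chosen to keep the example self-contained; it is not `fxhash`, and the names `Fnv1a`, `DeterministicMap`, `hash_of`, and `build_map` are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hand-written FNV-1a hasher, used only as a self-contained stand-in
/// for the `fxhash` crate; it is not the hasher the benchmark uses.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap whose hasher has no random per-process seed.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hash a value with the deterministic hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = BuildHasherDefault::<Fnv1a>::default().build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Build a map by inserting `keys` in order, mapping each key to itself.
fn build_map(keys: &[u32]) -> DeterministicMap<u32, u32> {
    let mut map = DeterministicMap::default();
    for &k in keys {
        map.insert(k, k);
    }
    map
}
```

Because the hasher carries no random state, the same sequence of inserts produces the same table layout on every run, which is what makes instruction counts repeatable. The default `RandomState` hasher would instead pick fresh keys per process.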
| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 2_387_017_574 | 9_102_052 | 1_293_415 | 689_196_283 | 1_225_104 |
triemap | 201_396 | 2_289_521_095 | 9_716_008 | 893_026 | 2_115_311 | 1_191_446 |
rbtree | 199_580 | 2_117_694_679 | 10_102_184 | 823_600 | 1_917_436 | 1_081_941 |
splay | 197_628 | 2_359_847_674 | 9_302_108 | 1_305_478 | 2_225_547 | 1_306_435 |
btree | 235_285 | 2_169_842_886 | 8_157_968 | 1_006_805 | 2_000_086 | 1_089_548 |
zhenya_hashmap | 189_128 | 1_855_331_619 | 9_301_800 | 746_302 | 1_651_710 | 752_598 |
btreemap_rs | 516_125 | 123_800_095 | 1_638_400 | 59_721 | 140_267 | 62_087 |
hashmap_rs | 504_191 | 53_234_034 | 1_835_008 | 21_361 | 63_796 | 22_778 |
| | binary_size | heapify 50k | mem | pop_min 50 | put 50 | |
---|---|---|---|---|---|---|
heap | 180_400 | 796_409_317 | 1_400_024 | 420_789 | 834_415 | 422_542 |
heap_rs | 475_167 | 5_041_620 | 819_200 | 53_561 | 22_281 | 53_738 |
| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|---|
hashmap | 195_614 | 477_464_161 | 1_820_844 | 1_291_442 | 138_877_496 | 1_222_518 |
hashmap_rs | 504_191 | 10_964_340 | 950_272 | 20_676 | 63_102 | 21_668 |
imrc_hashmap_rs | 516_559 | 19_861_761 | 1_572_864 | 31_820 | 120_208 | 37_919 |
movm_rs | 2_035_228 | 1_098_781_054 | 2_654_208 | 2_743_966 | 6_943_650 | 5_416_733 |
movm_dynamic_rs | 2_251_739 | 555_576_815 | 2_129_920 | 2_186_795 | 3_010_179 | 2_166_068 |
Measure the cost of inter-canister calls from the Publisher & Subscriber example.
| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
---|---|---|---|---|---|---|
Motoko | 171_754 | 156_908 | 19_642 | 9_145 | 15_536 | 4_001 |
Rust | 564_006 | 696_272 | 63_407 | 43_190 | 89_378 | 50_543 |
Measure the performance of some typical dapps:

- […] `heartbeat` disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.

Note
- The cost difference is mainly due to the Candid serialization cost.
- Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use `serde` to dynamically deserialize data based on data on the wire.
- We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by `serde`.
- For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
---|---|---|---|---|---|
Motoko | 291_066 | 44_646 | 19_838 | 14_282 | 16_944 |
Rust | 944_429 | 541_999 | 102_465 | 125_877 | 138_810 |
| | binary_size | init | mint_token | transfer_token |
---|---|---|---|---|
Motoko | 235_243 | 13_379 | 24_678 | 5_357 |
Rust | 998_075 | 147_146 | 381_217 | 95_471 |
Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports `rts_max_live_size` after the `generate` call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit.

- default: […] `generate` will trigger the copying GC. The rest of the methods will not trigger GC.
- copying: […] `--force-gc --copying-gc`.
- compacting: […] `--force-gc --compacting-gc`.
- generational: […] `--force-gc --generational-gc`.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
---|---|---|---|---|---|
default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |
Download the artifacts for this pull request: