dfinity / canister-profiling

Collection of canister performance benchmarks
Apache License 2.0

Add NFT benchmark #27

Closed: chenyan-dfinity closed this 1 year ago

github-actions[bot] commented 1 year ago

Warning

The flamegraph link only works after you merge.

Collection libraries

Measure different collection libraries written in both Motoko and Rust. The library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and receive exactly the same queries. Each column in the tables below measures a different operation on the collection.
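As a sketch of that setup, a fixed-seed generator makes every run produce an identical key sequence, so each collection is benchmarked on exactly the same workload. (The `Lcg` type and its constants below are illustrative, not the repo's actual RNG.)

```rust
/// Tiny deterministic linear congruential generator; a stand-in for the
/// benchmark's fixed-seed RNG, using Knuth's MMIX constants.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    // Two runs with the same seed yield the same keys, so differences in the
    // measured cycle counts come from the data structures, not the workload.
    let keys_a: Vec<u64> = {
        let mut r = Lcg(42);
        (0..5).map(|_| r.next()).collect()
    };
    let keys_b: Vec<u64> = {
        let mut r = Lcg(42);
        (0..5).map(|_| r.next()).collect()
    };
    assert_eq!(keys_a, keys_b);
    println!("reproducible: {}", keys_a == keys_b);
}
```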

💎 Takeaways

Note

  • The Candid interface of the benchmark is minimal; therefore, the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance on larger memory footprints.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, i.e., std::collections::HashMap with a deterministic hasher, which ensures reproducible results.
  • rbtree's remove method only performs a logical removal: the removed elements still reside in memory but are no longer reachable from the map. A complete implementation of remove would cost a bit more than reported here.
  • The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
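The amortized-growth effect behind the `batch_put 50` spike can be sketched as follows. This is a toy model, not the Motoko `hashmap` implementation: total copying stays linear, but the single insert that triggers a resize pays to copy the entire backing array, and a 50-element batch that lands on that insert looks dramatically more expensive.

```rust
/// Toy model of an amortized, array-backed table: capacity doubles when full,
/// copying every stored element. Returns (total elements copied, elements
/// copied by the single most expensive insert).
fn growth_cost(inserts: usize, init_cap: usize) -> (usize, usize) {
    let mut cap = init_cap;
    let mut len = 0;
    let mut total_copies = 0;
    let mut worst = 0;
    for _ in 0..inserts {
        if len == cap {
            total_copies += len; // grow: copy the whole array
            worst = worst.max(len);
            cap *= 2;
        }
        len += 1;
    }
    (total_copies, worst)
}

fn main() {
    let (total, worst) = growth_cost(50_000, 4);
    // Copying stays linear overall (amortized O(1) per insert)...
    assert!(total < 2 * 50_000);
    // ...but the worst insert copies roughly the whole table, which is what a
    // 50-element batch can hit, inflating `batch_put 50` for `hashmap`.
    println!("total copies: {total}, worst single insert: {worst}");
}
```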

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 203_891 | 2_456_108_447 | 9_102_052 | 1_319_582 | 710_191_522 | 1_248_981 |
| triemap | 208_030 | 2_422_807_277 | 9_716_008 | 920_103 | 2_236_198 | 1_271_468 |
| rbtree | 200_177 | 2_322_981_599 | 10_102_164 | 844_569 | 2_120_544 | 998_470 |
| splay | 205_437 | 2_528_971_585 | 9_302_108 | 1_450_431 | 2_398_808 | 1_449_735 |
| btreemap_rs | 526_971 | 123_797_849 | 1_638_400 | 59_755 | 140_301 | 62_121 |
| hashmap_rs | 515_644 | 53_134_200 | 1_835_008 | 21_395 | 63_730 | 22_812 |

Priority queue

| | binary_size | heapify 50k | max mem | pop_min 50 | put 50 | pop_min 50 |
|---|---|---|---|---|---|---|
| heap | 188_543 | 814_736_944 | 1_400_024 | 482_960 | 862_276 | 485_051 |
| heap_rs | 485_570 | 5_041_733 | 819_200 | 53_595 | 22_315 | 53_772 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 203_891 | 491_293_274 | 1_820_844 | 1_317_651 | 143_128_343 | 1_245_345 |
| hashmap_rs | 515_644 | 10_944_500 | 950_272 | 20_710 | 63_036 | 21_702 |
| imrc_hashmap_rs | 526_570 | 19_861_874 | 1_572_864 | 31_854 | 120_242 | 37_953 |
| movm_rs | 2_089_325 | 1_131_930_057 | 2_654_208 | 2_831_694 | 7_116_074 | 5_565_647 |
| movm_dynamic_rs | 2_323_019 | 576_798_134 | 2_129_920 | 2_249_781 | 3_116_104 | 2_208_382 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 156_504 | 12_264 |
| Rust | 35_604 | 1_127 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 172_061 | 33_849 | 1_949 |
| Rust | 534_970 | 55_858 | 10_463 |

Motoko Garbage Collection

Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 3_905_834_633 | 15_539_984 | 926_657 | 2_258_879 | 1_299_907 |
| copying | 3_905_834_583 | 15_539_984 | 266_740_591 | 268_236_330 | 267_277_474 |
| compacting | 4_052_409_880 | 15_539_984 | 308_612_579 | 350_383_601 | 353_312_004 |
| generational | 4_276_930_230 | 15_540_260 | 987_663 | 3_659_944 | 2_370_099 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe | publish |
|---|---|---|---|---|
| Motoko | 175_721 | 165_324 | caller (20_084) / callee (6_291) | caller (16_044) / callee (3_947) |
| Rust | 575_366 | 706_521 | caller (63_608) / callee (43_332) | caller (89_685) / callee (50_643) |

Sample Dapps

Measure the performance of some typical dapps:

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to dynamically deserialize data based on the data on the wire.
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data to each endpoint, which makes the Candid overhead in Rust tolerable.
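The static-vs-dynamic distinction above can be illustrated in plain Rust, with no Candid or serde involved (`Field`, `Count`, `decode_static`, and `decode_dynamic` are made-up names for this sketch): the generic version is monomorphized per concrete type, roughly what Motoko's per-method specialization achieves, while the trait-object version dispatches through a vtable at run time, closer in spirit to a generic deserializer.

```rust
// A minimal "decodable value" interface standing in for a deserialization target.
trait Field {
    fn read(&mut self, byte: u8);
}

// Toy target type: just counts the bytes it consumes.
struct Count(u32);
impl Field for Count {
    fn read(&mut self, _b: u8) {
        self.0 += 1;
    }
}

// Static: the compiler emits a specialized copy of this function for each
// concrete T, so every `read` call can be inlined.
fn decode_static<T: Field>(target: &mut T, wire: &[u8]) {
    for &b in wire {
        target.read(b);
    }
}

// Dynamic: one shared function; every `read` call goes through a vtable.
fn decode_dynamic(target: &mut dyn Field, wire: &[u8]) {
    for &b in wire {
        target.read(b);
    }
}

fn main() {
    let wire = [1u8, 2, 3, 4];
    let (mut a, mut b) = (Count(0), Count(0));
    decode_static(&mut a, &wire);
    decode_dynamic(&mut b, &wire);
    // Same result either way; only the dispatch strategy (and thus the
    // optimization opportunity) differs.
    assert_eq!(a.0, b.0);
    println!("decoded {} bytes either way", a.0);
}
```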

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 290_538 | 46_273 | 21_192 | 15_307 | 18_160 |
| Rust | 954_334 | 542_761 | 102_643 | 126_129 | 139_098 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 237_757 | 13_559 | 24_393 | 5_407 |
| Rust | 1_017_218 | 147_558 | 380_454 | 92_674 |