Warning
The flamegraph link only works after you merge.
Collection libraries
Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.
We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:
generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, it usually triggers the GC; the rest of the column are not likely to trigger GC.
max mem. For Motoko, it reports rts_max_live_size after generate call; For Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.
💎 Takeaways
The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an O(10000 nlogn) algorithm hitting the limit, while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
Note
The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.
hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
rbtree's remove method only performs logical removal of the elements. The removed elements still reside in memory, but not reachable from the map. A complete implementation of remove would cost a bit more than reported here.
The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.
Measure the cost of empty heartbeat and timer job.
setTimer measures both the setTimer(0) method and the execution of empty job.
It is not easy to reliably capture the above events in one flamegraph, as the implementation detail
of the replica can affect how we measure this. Typically, a correct flamegraph contains both setTimer and canister_global_timer function. If it's not there, we may need to adjust the script.
Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after generate call.
default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
copying. Compile with --force-gc --copying-gc.
compacting. Compile with --force-gc --compacting-gc.
generational. Compile with --force-gc --generational-gc.
The cost difference is mainly due to the Candid serialization cost.
Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
Measure different collection libraries written in both Motoko and Rust. The library names with
_rs
suffix are written in Rust; the rest are written in Motoko.We use the same random number generator with fixed seed to ensure that all collections contain the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:
rts_max_live_size
aftergenerate
call; For Rust, it reports the Wasm's memory page * 32Kb.💎 Takeaways
O(10000 nlogn)
algorithm hitting the limit, while anO(n^2)
algorithm runs just fine.Map
Priority queue
MoVM
Heartbeat / Timer
Measure the cost of empty heartbeat and timer job.
setTimer
measures both thesetTimer(0)
method and the execution of empty job.setTimer
andcanister_global_timer
function. If it's not there, we may need to adjust the script.Heartbeat
Timer
Motoko Garbage Collection
Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports
rts_max_live_size
aftergenerate
call.generate
will trigger the copying GC. The rest of the methods will not trigger GC.--force-gc --copying-gc
.--force-gc --compacting-gc
.--force-gc --generational-gc
.Publisher & Subscriber
Measure the cost of inter-canister calls from the Publisher & Subscriber example.
Sample Dapps
Measure the performance of some typical dapps:
heartbeat
disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.Basic DAO
DIP721 NFT