DataHighway Node. A blockchain being built with Substrate to become a parachain on the Polkadot network. Planned features include a decentralized LPWAN roaming hub for LoRaWAN IoT devices and network operator roaming agreements, participative mining, an inter-chain data market, and DAO governance. http://www.datahighway.com
Based on my review of a previous discussion between Alan S, Basti, and Sergei in Element's "Parachain Technical" room, Alan S shared how he profiled their parachain's block authoring execution time for benchmarking and stack analysis with trace debugging, as follows:
- run your node with the flags --dev, -lsync=trace, -lsub-libp2p=trace
- run perf record -F 999 -p <pid_of_your_node> --call-graph dwarf
- wait for a block to be produced by your node, then press Ctrl+C to stop perf (you can keep the node running to repeat this later)
- generate the perf script with perf script --no-inline > perf.script.data
- open the output at https://www.speedscope.app to view the execution profile (e.g. perf.basti-cache-runtime-fix.data from PR #9611, shared in Element's "Parachain Technical" room)
They were using the default cumulus authorship deadline of 500ms (i.e. 12000 × (1/24) = SLOT_DURATION × block_proposal_slot_portion), where SLOT_DURATION equals their MILLISECS_PER_BLOCK.
But for DataHighway's Westlake we're currently using 4320 for MILLISECS_PER_BLOCK, so the default authorship deadline would be much lower, at 180ms (4320 × 1/24). If we also want roughly a 500ms cumulus authorship deadline (and ~750ms maximum), i.e. proportions of about 500/4320 and 750/4320, we may need to change it to the following:
// With SLOT_DURATION = 4320ms this gives ~540ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 8f32),
// And a maximum of ~720ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 6f32)),
Note that in the polkadot repo (https://github.com/paritytech/polkadot), both millau and rialto use 6000 for MILLISECS_PER_BLOCK, together with block_proposal_slot_portion: SlotProportion::new(2f32 / 3f32) and max_block_proposal_slot_portion: None.
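To make the relationship between these settings concrete, here is a minimal sketch of the arithmetic (my own illustration, not code from any of these repos) showing how the cumulus authorship deadline falls out of SLOT_DURATION × SlotProportion for the configurations mentioned above:

```rust
/// Minimal sketch: authorship deadline = SLOT_DURATION * slot proportion.
/// The (slot_duration, proportion) pairs below are the figures discussed above.
fn deadline_ms(slot_duration_ms: u64, proportion: f32) -> f32 {
    slot_duration_ms as f32 * proportion
}

fn main() {
    // Default cumulus-style settings: 12000ms blocks with 1/24 and 1/16 portions.
    println!("default proposing: {:.0}ms", deadline_ms(12_000, 1.0 / 24.0)); // 500ms
    println!("default max:       {:.0}ms", deadline_ms(12_000, 1.0 / 16.0)); // 750ms

    // DataHighway Westlake: 4320ms blocks with the default 1/24 portion.
    println!("westlake default:  {:.0}ms", deadline_ms(4_320, 1.0 / 24.0)); // 180ms

    // Proposed portions for Westlake to get close to 500ms / 750ms.
    println!("westlake 1/8:      {:.0}ms", deadline_ms(4_320, 1.0 / 8.0)); // 540ms
    println!("westlake 1/6:      {:.0}ms", deadline_ms(4_320, 1.0 / 6.0)); // 720ms

    // Millau/Rialto: 6000ms blocks with a 2/3 portion.
    println!("millau/rialto:     {:.0}ms", deadline_ms(6_000, 2.0 / 3.0)); // 4000ms
}
```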
Alan S discovered that their 500ms was split up as follows:
- 500ms - parachain block authoring
  - 140ms - reserved for initialization/finalization (i.e. sc_basic_authorship::basic_authorship)
    - 65% - block production (including verifying extrinsic signatures for inclusion)
    - 35% - block finalization
  - 360ms - applying extrinsics and overhead (apply_extrinsic)
    - 25% - overhead of retrieving runtime_code() from the storage cache (i.e. sc_client_db::storage_cache) (only if there is no new runtime code, otherwise it is fetched from the TrieBackend)
    - 50% - overhead of blake2-related runtime_code() execution before each extrinsic is applied (apply_extrinsic_call_at...contextual_call/runtime_code with blake2; this overhead is absent when running the node with --dev)
    - 25% - applying the extrinsics themselves, i.e. extrinsic.check (ecdsa signature verification), which takes ~100ms for 100 extrinsics using system::remark
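To make those percentages concrete, here is a small arithmetic sketch (my own, assuming each percentage is a fraction of its 140ms or 360ms parent bucket) converting them into absolute times:

```rust
/// Rough arithmetic on the ~500ms authoring budget reported above.
/// Assumption: each percentage is a fraction of its parent bucket.
fn main() {
    let init_finalize_ms = 140.0_f64;
    let apply_extrinsics_ms = 360.0_f64;

    // 140ms initialization/finalization bucket.
    println!("block production:          {:.0}ms", init_finalize_ms * 0.65); // ~91ms
    println!("block finalization:        {:.0}ms", init_finalize_ms * 0.35); // ~49ms

    // 360ms apply_extrinsic bucket.
    println!("runtime_code() from cache: {:.0}ms", apply_extrinsics_ms * 0.25); // ~90ms
    println!("runtime_code() blake2:     {:.0}ms", apply_extrinsics_ms * 0.50); // ~180ms
    println!("extrinsic.check etc.:      {:.0}ms", apply_extrinsics_ms * 0.25); // ~90ms

    // Total matches the ~500ms parachain block authoring deadline.
    println!("total: {:.0}ms", init_finalize_ms + apply_extrinsics_ms); // 500ms
}
```

The ~90ms attributed to extrinsic.check is consistent with the reported ~100ms for 100 system::remark extrinsics.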
I believe we need to profile our parachain using perf, as described above, with the kinds of extrinsics we'll actually be using, to undertake benchmarking and stack analysis of the block authoring execution time, and use trace debugging to determine whether we need to:
- increase the block proposal cumulus deadline (i.e. block_proposal_slot_portion) to compensate for production overhead (see https://github.com/paritytech/substrate/pull/9611, which Basti created as a result of this discussion and which increased the number of basic extrinsics per block by ~3x, from a maximum of ~180 tx/block to ~450 tx/block)
- re-evaluate the ExtrinsicBaseWeight we are using in the fork of Substrate that we depend on
- check whether we need to change the leniency strategy used with block_proposal_slot_portion in the fork of Substrate we depend on (i.e. change sc_consensus_slots::SlotLenienceType from Exponential to Linear in sc_consensus_slots::proposing_remaining_duration)

Note: one user in the room mentioned that "transactions take progressively longer the later they go into a block in a linear way".

Here are extracts of relevant parts of the codebases that we should consider for possible changes in our 'ilya/parachain-update' branch:
pub const MILLISECS_PER_BLOCK: u64 = 12000;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;
// We got around 500ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 24f32),
// And a maximum of 750ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 16f32)),
...
/// We assume that ~10% of the block weight is consumed by `on_initialize` handlers.
/// This is used to limit the maximal weight of a single extrinsic.
const AVERAGE_ON_INITIALIZE_RATIO: Perbill = Perbill::from_percent(10);
/// We allow `Normal` extrinsics to fill up the block up to 75%, the rest can be used
/// by Operational extrinsics.
const NORMAL_DISPATCH_RATIO: Perbill = Perbill::from_percent(75);
/// We allow for 0.5 of a second of compute with a 12 second average block time.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
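For context, here is a rough sketch of the block weight budget these constants imply (my own arithmetic; it assumes WEIGHT_PER_SECOND = 10^12, i.e. 10^12 weight units per second of reference-hardware compute, as in the Substrate version these extracts come from) and how it compares with the authorship deadlines discussed above:

```rust
/// Rough sketch of the block weight budget implied by the constants above.
/// Assumption: WEIGHT_PER_SECOND = 10^12, i.e. ~1s of compute on reference
/// hardware (as in the Substrate version these extracts come from).
const WEIGHT_PER_SECOND: u64 = 1_000_000_000_000;

fn main() {
    // MAXIMUM_BLOCK_WEIGHT = WEIGHT_PER_SECOND / 2, i.e. 0.5s of compute per block.
    let maximum_block_weight = WEIGHT_PER_SECOND / 2;

    // NORMAL_DISPATCH_RATIO: 75% of the block may be filled by `Normal` extrinsics.
    let normal_dispatch_budget = maximum_block_weight * 75 / 100;

    // AVERAGE_ON_INITIALIZE_RATIO: ~10% assumed consumed by `on_initialize`.
    let on_initialize_budget = maximum_block_weight * 10 / 100;

    println!("max block weight:      {} (~500ms)", maximum_block_weight);
    println!("normal dispatch:       {} (~375ms)", normal_dispatch_budget);
    println!("on_initialize reserve: {} (~50ms)", on_initialize_budget);
}
```

Whether that ~0.5s compute budget is actually usable within our cumulus authorship deadline (180ms today, ~540ms with the proposed 1/8 portion) is something the perf profiling above should help us determine.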