DataHighway-DHX / node

DataHighway Node. A blockchain being built with Substrate to become a parachain on the Polkadot network. Planned features include a decentralized LPWAN roaming hub for LoRaWAN IoT devices and network operator roaming agreements, participative mining, an inter-chain data market, and DAO governance. http://www.datahighway.com
GNU General Public License v3.0

benchmark parachain and standalone chain #232

Open ltfschoen opened 3 years ago

ltfschoen commented 3 years ago

based on my review of a previous discussion between Alan S, Basti and Sergei in Element's Parachain Technical room, Alan S shared how he profiled their parachain's block authoring execution time, for benchmarking and stack analysis with trace debugging, as follows:

they were using the default cumulus authorship deadline of 500ms (i.e. 12000 * (1/24) = SLOT_DURATION * block_proposal_slot_portion), where SLOT_DURATION equals their MILLISECS_PER_BLOCK.
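The deadline arithmetic can be checked with a small Rust sketch. It uses plain integers rather than the real `SlotProportion` type, and the helper name `proposal_budget_ms` is made up for illustration.

```rust
/// Authoring budget in milliseconds: slot duration multiplied by the slot portion.
/// The portion is passed as `numerator / denominator` so the arithmetic stays
/// in integers (hypothetical helper, not part of cumulus or substrate).
fn proposal_budget_ms(slot_duration_ms: u64, numerator: u64, denominator: u64) -> u64 {
    slot_duration_ms * numerator / denominator
}
```

For a 12000ms slot, `proposal_budget_ms(12_000, 1, 24)` gives 500 and `proposal_budget_ms(12_000, 1, 16)` gives 750.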

but for the DataHighway's Westlake, we're currently using 4320 for MILLISECS_PER_BLOCK, so with the same 1/24 portion our authorship deadline is much less at 180ms (i.e. 4320 * (1/24)). so maybe we need to change it to the following (i.e. roughly 500/4320 and 750/4320) if we want 500ms as our cumulus authorship deadline too:

// We got around 500ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 8f32),
// And a maximum of 750ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 6f32)),
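As a rough check of these fractions, the sketch below computes the exact portion a target deadline would need and the deadline that a `1/n` portion gives for a 4320ms slot. The helper names are hypothetical, and plain numbers stand in for `SlotProportion`.

```rust
/// Exact fraction of the slot that a target authoring deadline would need
/// (hypothetical helper for illustration).
fn required_portion(target_ms: u64, slot_duration_ms: u64) -> f32 {
    target_ms as f32 / slot_duration_ms as f32
}

/// Deadline in milliseconds given by a `1 / denominator` slot portion
/// (hypothetical helper for illustration).
fn deadline_for_denominator_ms(slot_duration_ms: u64, denominator: u64) -> u64 {
    slot_duration_ms / denominator
}
```

With a 4320ms slot, 1/8 gives 540ms and 1/6 gives 720ms, so both values only approximate the 500ms and 750ms targets (500/4320 is about 0.116 and 750/4320 is about 0.174).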

Note that in the polkadot repo https://github.com/paritytech/polkadot, both millau and rialto use 6000 for MILLISECS_PER_BLOCK, together with block_proposal_slot_portion: SlotProportion::new(2f32 / 3f32) (i.e. 6000 * (2/3) = 4000ms) and max_block_proposal_slot_portion: None.

Alan S said they discovered that their 500ms was split up as follows:

* 500ms - parachain block authoring
  * 140ms - reserved for initialization/finalization (i.e. sc_basic_authorship::basic_authorship)
    * 65% - block production (i.e. including verifying extrinsic signatures for inclusion)
    * 35% - block finalization
  * 360ms - applying extrinsics and overhead (apply_extrinsic)
    * 25% - overhead of retrieving runtime_code() from the storage cache (i.e. sc_client_db::storage_cache runtime_code(); only if there is no new runtime code, otherwise it is fetched from TrieBackend)
    * 50% - overhead of runtime_code() execution, blake2 related, before each extrinsic is applied
      * apply_extrinsic_call_at...contextual_call/runtime_code with blake2 (when running the node with --dev there isn't this overhead)
    * 25% - apply extrinsics
      * extrinsic.check (i.e. ecdsa signature verification) (requires ~100ms for 100 extrinsics using system::remark)
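To put absolute numbers on those percentages, here is a small Rust sketch that converts each share into milliseconds. It assumes the percentages apply to the 140ms and 360ms parents as listed. The function names and labels are only for illustration.

```rust
/// Share of a millisecond budget, with the share given in whole percent.
fn share_ms(budget_ms: u64, percent: u64) -> u64 {
    budget_ms * percent / 100
}

/// Leaf entries of the reported breakdown as (label, milliseconds).
/// The 140ms / 360ms split and the percentages come from the discussion;
/// everything else here is an illustrative assumption.
fn authoring_breakdown() -> Vec<(&'static str, u64)> {
    let init_final_ms = 140; // initialization/finalization
    let apply_ms = 360; // applying extrinsics and overhead
    vec![
        ("block production", share_ms(init_final_ms, 65)),
        ("block finalization", share_ms(init_final_ms, 35)),
        ("runtime_code() retrieval", share_ms(apply_ms, 25)),
        ("runtime_code() blake2 overhead", share_ms(apply_ms, 50)),
        ("apply extrinsics", share_ms(apply_ms, 25)),
    ]
}
```

This gives roughly 91ms, 49ms, 90ms, 180ms and 90ms, which adds back up to 500ms. The 90ms for applying extrinsics is in the same range as the ~100ms quoted for 100 system::remark extrinsics.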

then Basti created this PR https://github.com/paritytech/substrate/pull/9611 that improved the maximum throughput for basic extrinsics from 180tx/block to 450tx/block

i believe we need to:

note: some user mentioned that "transactions take progressively longer the later they go into a block in a linear way"

here are extracts of relevant parts of codebases that we should consider in possible changes in our 'ilya/parachain-update' branch:

pub const MILLISECS_PER_BLOCK: u64 = 12000;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;

// We got around 500ms for proposing
block_proposal_slot_portion: SlotProportion::new(1f32 / 24f32),
// And a maximum of 750ms if slots are skipped
max_block_proposal_slot_portion: Some(SlotProportion::new(1f32 / 16f32)),

...

/// We assume that ~10% of the block weight is consumed by `on_initialize` handlers.
/// This is used to limit the maximal weight of a single extrinsic.
const AVERAGE_ON_INITIALIZE_RATIO: Perbill = Perbill::from_percent(10);
/// We allow `Normal` extrinsics to fill up the block up to 75%, the rest can be used
/// by `Operational` extrinsics.
const NORMAL_DISPATCH_RATIO: Perbill = Perbill::from_percent(75);
/// We allow for 0.5 of a second of compute with a 12 second average block time.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
pub const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
pub const WEIGHT_PER_MILLIS: Weight = WEIGHT_PER_SECOND / 1000; // 1_000_000_000
pub const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_MILLIS / 1000; // 1_000_000

/// Executing 10,000 System remarks (no-op) txs takes ~1.26 seconds -> ~125 µs per tx
pub const ExtrinsicBaseWeight: Weight = 125 * WEIGHT_PER_MICROS;
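As a rough sanity check on these constants, the sketch below estimates how many extrinsics at base weight would fit into the `Normal` share of a block. It copies the values above into plain `u64` constants (the upper-case names and the function are illustrative, not the runtime's own definitions). It ignores per-call weight, `on_initialize` and block length limits, so the result is an upper bound only.

```rust
type Weight = u64;

const WEIGHT_PER_SECOND: Weight = 1_000_000_000_000;
const WEIGHT_PER_MICROS: Weight = WEIGHT_PER_SECOND / 1_000_000;
/// 0.5 seconds of compute per block, as in the extract above.
const MAXIMUM_BLOCK_WEIGHT: Weight = WEIGHT_PER_SECOND / 2;
/// ~125 microseconds per no-op extrinsic, as in the extract above.
const EXTRINSIC_BASE_WEIGHT: Weight = 125 * WEIGHT_PER_MICROS;
/// `Normal` extrinsics may use up to 75% of the block weight.
const NORMAL_DISPATCH_PERCENT: u64 = 75;

/// Upper bound on base-weight extrinsics that fit into the `Normal` share.
fn max_normal_base_extrinsics() -> u64 {
    let normal_budget = MAXIMUM_BLOCK_WEIGHT / 100 * NORMAL_DISPATCH_PERCENT;
    normal_budget / EXTRINSIC_BASE_WEIGHT
}
```

With these values the bound is 3000 extrinsics per block, i.e. 375ms of the 500ms weight budget spent on base weight alone.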