Closed mnaamani closed 1 year ago
What runtime upgrade are you talking about here?
1. `Mainnet` => `Ephesus`
2. `Ephesus` => `Nara`
This would be Ephesus to Nara, spec version 2001 -> 2002. (In the indexer logs, however, we see the spec version update to 3000, which was the originally planned version; we have since reverted to 2002.)
Upon further investigation, the failure to decode storage values at particular blocks affects the one or more blocks that were being fetched just before the node goes through the runtime upgrade and the polkadot.js API detects it. These blocks always precede the runtime upgrade block.
The indexer uses an `ApiDecorated<'promise'>` to query storage at a specific block. This is done by calling `.at(blockHash)` on an `ApiPromise` (the main API connection to the node being indexed). The decorated API also fetches the runtime types for the runtime version at that particular block, so that it knows how to decode the values. This is stored internally in a versioned block registry.
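As a rough illustration of the access pattern described above, the sketch below resolves a block hash, decorates the API for that block with `.at(blockHash)`, and reads `system.events` through the block-scoped instance. The `NodeApi` and `BlockScopedApi` interfaces and the `eventsAtHeight` helper are hypothetical stand-ins that mirror only the small part of the polkadot-js surface used here; this is not the indexer's actual code.

```typescript
// Hypothetical structural types: a minimal subset of what a polkadot-js
// ApiPromise and its block-scoped decoration are assumed to expose.
interface BlockScopedApi {
  query: { system: { events: () => Promise<unknown[]> } };
}

interface NodeApi {
  rpc: { chain: { getBlockHash: (height: number) => Promise<string> } };
  at: (blockHash: string) => Promise<BlockScopedApi>;
}

// Read the events stored at a given block height.
// Decoding relies on the types the decoration was built with for that block.
async function eventsAtHeight(api: NodeApi, height: number): Promise<unknown[]> {
  const blockHash = await api.rpc.chain.getBlockHash(height);
  const apiAt = await api.at(blockHash); // block-scoped (decorated) API
  return apiAt.query.system.events();
}
```

With a real connection, the `api` argument would be the `ApiPromise` instance (possibly with a cast, since the interfaces above are simplified).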
While the indexer is happily indexing blocks and the node runtime is upgraded, the polkadot-js API detects this internally through metadata subscription updates. It then updates the default registry with the new metadata.
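To make the moment of the upgrade visible from the indexer's side, one option is to watch the runtime version and log every spec version transition. The sketch below is an assumption-laden illustration: `VersionSource` and `watchSpecVersion` are hypothetical names, and `specVersion` is modelled as a plain number (with polkadot-js, the value delivered by `api.rpc.state.subscribeRuntimeVersion` is a codec object that would need converting, e.g. with `.toNumber()`).

```typescript
// Hypothetical minimal shapes; not the polkadot-js types themselves.
interface RuntimeVersionLike {
  specVersion: number;
}
type Unsubscribe = () => void;
interface VersionSource {
  subscribeRuntimeVersion: (
    cb: (version: RuntimeVersionLike) => void
  ) => Promise<Unsubscribe>;
}

// Invoke onUpgrade(from, to) whenever the reported spec version changes.
// The first notification only establishes the baseline.
async function watchSpecVersion(
  source: VersionSource,
  onUpgrade: (from: number, to: number) => void
): Promise<Unsubscribe> {
  let current: number | undefined;
  return source.subscribeRuntimeVersion((version) => {
    if (current !== undefined && version.specVersion !== current) {
      onUpgrade(current, version.specVersion);
    }
    current = version.specVersion;
  });
}
```

Correlating these transitions with the block heights being fetched at that moment could help confirm which in-flight blocks are affected.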
I believe a data race is happening that, for some reason, causes the decoration for the failing blocks to be either overridden/corrupted or simply not created for the correct version while the API is updating the default registry. (I think it is because the default registry was also being used for the decorated API.)
I can't pinpoint it but the polkadot-js core maintainers might be able to find it more easily.
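One way to test the hypothesis above would be to check, for a block known to precede the upgrade, which spec version the decorated API was actually built with. The sketch below assumes the decoration exposes `runtimeVersion.specVersion` (modelled here as a plain number); `DecoratedLike`, `DecoratingApi` and `decorateExpecting` are hypothetical names used for illustration only.

```typescript
// Hypothetical minimal shapes for the decorated API and its parent.
interface DecoratedLike {
  runtimeVersion: { specVersion: number };
}
interface DecoratingApi<A extends DecoratedLike> {
  at: (blockHash: string) => Promise<A>;
}

// Decorate the API for a block and fail loudly if the decoration was built
// for a different spec version than the caller expects for that block.
async function decorateExpecting<A extends DecoratedLike>(
  api: DecoratingApi<A>,
  blockHash: string,
  expectedSpecVersion: number
): Promise<A> {
  const apiAt = await api.at(blockHash);
  const actual = apiAt.runtimeVersion.specVersion;
  if (actual !== expectedSpecVersion) {
    throw new Error(
      `Block ${blockHash}: decorated with spec ${actual}, expected ${expectedSpecVersion}`
    );
  }
  return apiAt;
}
```

For the upgrade discussed here, a pre-upgrade block would be expected to report 2001; a decoration reporting 2002 for such a block would support the race-condition theory.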
A few things I tried:
- `api.getBlockRegistry(..)` and call `.at(blockHash)` again, to nudge the polkadot-js API to re-fetch the correct metadata for the block.

Related issues:
- https://github.com/polkadot-js/api/issues/4596
- https://github.com/polkadot-js/api/issues/4518
- https://github.com/polkadot-js/api/issues/4557
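The re-decoration attempt listed above can be sketched as a small retry wrapper that calls `.at(blockHash)` afresh before each read. This is only an illustration under assumptions: `Redecoratable` and `readWithRedecorate` are hypothetical names, and polkadot-js may serve the decoration from its internal block registry cache, so a plain retry like this is not guaranteed to yield different types.

```typescript
// Hypothetical minimal shape: anything that can produce a block-scoped API.
interface Redecoratable<A> {
  at: (blockHash: string) => Promise<A>;
}

// Try a read against a freshly requested decoration, re-requesting the
// decoration on each failure. Rethrows the last error when attempts run out.
// maxAttempts is assumed to be at least 1.
async function readWithRedecorate<A, T>(
  api: Redecoratable<A>,
  blockHash: string,
  read: (apiAt: A) => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const apiAt = await api.at(blockHash); // ask for the decoration again
      return await read(apiAt);
    } catch (err) {
      lastError = err; // e.g. a storage decoding failure
    }
  }
  throw lastError;
}
```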
Problem
When running the runtime upgrade integration tests, the runtime upgrade itself succeeds, but the test fails at the step that waits for the runtime upgrade proposal status to go from "ProposalStatusGracing" to "ProposalStatusExecuted". The QN does not seem to properly process the runtime upgrade block containing the "ProposalExecuted" event, and the processor stops processing events altogether.
The first few lines of the indexer logs after the runtime upgrade show the RPC-CORE error, which comes from polkadot-js/api, immediately after the upgrade.
If we allow the indexer to keep running, eventually:
Recovery
Although the indexer is observed to make progress fetching additional blocks, the processor behaves correctly, i.e. it doesn't make progress. That is a good thing (we wouldn't get inconsistent state).
When the indexer restarts, it continues processing from a couple of blocks before the runtime upgrade block, logging a few `index-builder:indexer Block N has already been indexed` lines until it finally makes progress. The indexer can also be restarted manually (without resetting the db), and it "recovers" in the same way.

Some questions:
Relevant runtime type changes that are at the root cause of this behavior?
Some background about what type changes in the substrate runtime are possibly causing this.
Note that each block has a `timestamp.set` extrinsic added by the block producer. Therefore each block produced will have a `system.extrinsicSuccess` event, and therefore a `DispatchInfo` type.

Before Upgrade
After Upgrade
Another change in the new Substrate, probably not relevant but added just in case:
I think it just affects the storage_info annotation and how benchmarks handle counting reading/writing to the storage key. The change in new substrate is that this is now an unbounded (git diff):
Indexer Logs
Runtime upgrade at block 607