Closed liuchengxu closed 1 year ago
There is a similar TODO left for fraud proof: https://github.com/subspace/subspace/blob/96fd56ec2a75a963327a01f49c7d42d80b5b847a/domains/client/domain-executor/src/aux_schema.rs#L329-L339
I finally figured it out after digging into it. The root cause is the receipts are tracked in the local storage using the domain_hash
as the key which can be problematic for the core domains (the system domain is not affected by this issue due to the injected primary block hash digest): assuming no core domain bundles in the primary block, then two primary fork blocks 100a and 100b will produce one identical core domain block 100 (they are based on the same domain parent block and have the same block content), two primary blocks means two receipts, but they share one key in terms of tracking the receipts, now the latter receipt will override the former one due to the collision, causing always submitting the receipt from the fork.
2023-04-06 11:00:44.515 DEBUG tokio-runtime-worker domain_client_executor::core_bundle_processor: [CoreDomain] Building a new domain block from primary block #34,0x8611…5367 on top of #33,0xc1f6…8d2d
2023-04-06 11:00:44.519 DEBUG tokio-runtime-worker domain_client_executor::domain_block_processor: [CoreDomain] Built new domain block #34,0xf260…8b6a from primary block #34,0x8611…5367
2023-04-06 11:00:45.316 DEBUG tokio-runtime-worker domain_client_executor::core_bundle_processor: [CoreDomain] Building a new domain block from primary block #34,0xd574…0c4e on top of #33,0xc1f6…8d2d
2023-04-06 11:00:45.318 DEBUG tokio-runtime-worker domain_client_executor::domain_block_processor: [CoreDomain] Built new domain block #34,0xf260…8b6a from primary block #34,0xd574…0c4e
The solution is to use primary_hash
as the key to track the receipts as mentioned before.
The idea is to find the right block at height max_allowed, which can be done by traversing back from the current tip to the height max_allowed, and then we can ensure the collected receipts are on the same branch of the canonical chain.
I haven't been able to reproduce this scenario (wrong receipt caused by using the block number) in test, thus it can likely be okay. Will leave it as is for now and only change the storage key of receipts in the aux db for this issue.
Problem
It's observed that a bundle may contain a receipt that points to a block that is not on the canonical primary chain. Restarting the node does not help, it runs into the same error again.
When reviewing the code, I realized that we didn't handle the primary chain forks properly when collecting the receipts. Basically, we first calculate the range of receipts we need to include in the new bundle, which is represented by the block number
[head_receipt_number, max_allowed]
, the problem isclient.hash(block_number)
likely returns a block hash which is not on the canonical chain, the exact scenario we are running into is there is primary chain fork #100a, #100b, #100a becomes the canonical chain and #100b is dropped later, but the executor still loads the receipt of #100b for block 100. https://github.com/subspace/subspace/blob/ef3a6fa29ab77af629e0a1114c62216d83979350/domains/client/domain-executor/src/domain_bundle_proposer.rs#L162-L169Solution
The idea is to find the right block at height
max_allowed
, which can be done by traversing back from the current tip to the heightmax_allowed
, and then we can ensure the collected receipts are on the same branch of the canonical chain.It reminds me of https://github.com/subspace/subspace/issues/1254#issuecomment-1473990921 in which we think it makes sense to use
primary_block_hash
instead ofdomain_block_hash
as the key to track the receipt, still dwelling on whether it should be done with this issue together.BTW, the fix should have a company test as it's expected that now we are able to mock the primary chain forks.