autonomys / subspace

Subspace Network reference implementation
https://subspace.network
384 stars 244 forks source link

Bundle may incorrectly include the receipt from the primary chain fork #1353

Closed liuchengxu closed 1 year ago

liuchengxu commented 1 year ago

Problem

It's observed that a bundle may contain a receipt that points to a block that is not on the canonical primary chain. Restarting the node does not help, it runs into the same error again.

2023-04-03 20:10:31.065 ERROR tokio-runtime-worker runtime::domain-registry: [SystemDomain] Receipt of DomainId(1) #65555,0xe5c4b68acbc006ef33d4eb575a5566f8209560c8388650c29b307e1f6d3eead7 points to an unknown primary block, expected: #65555,Some(0xb949660c8dd821a1baffd520b4758ba2b069adc38266728f69acede4025df6c0)

When reviewing the code, I realized that we didn't handle the primary chain forks properly when collecting the receipts. Basically, we first calculate the range of receipts we need to include in the new bundle, which is represented by the block number [head_receipt_number, max_allowed], the problem is client.hash(block_number) likely returns a block hash which is not on the canonical chain, the exact scenario we are running into is there is primary chain fork #100a, #100b, #100a becomes the canonical chain and #100b is dropped later, but the executor still loads the receipt of #100b for block 100. https://github.com/subspace/subspace/blob/ef3a6fa29ab77af629e0a1114c62216d83979350/domains/client/domain-executor/src/domain_bundle_proposer.rs#L162-L169

Solution

The idea is to find the right block at height max_allowed, which can be done by traversing back from the current tip to the height max_allowed, and then we can ensure the collected receipts are on the same branch of the canonical chain.

It reminds me of https://github.com/subspace/subspace/issues/1254#issuecomment-1473990921 in which we think it makes sense to use primary_block_hash instead of domain_block_hash as the key to track the receipt, still dwelling on whether it should be done with this issue together.

BTW, the fix should have a company test as it's expected that now we are able to mock the primary chain forks.

NingLin-P commented 1 year ago

There is a similar TODO left for fraud proof: https://github.com/subspace/subspace/blob/96fd56ec2a75a963327a01f49c7d42d80b5b847a/domains/client/domain-executor/src/aux_schema.rs#L329-L339

liuchengxu commented 1 year ago

I finally figured it out after digging into it. The root cause is the receipts are tracked in the local storage using the domain_hash as the key which can be problematic for the core domains (the system domain is not affected by this issue due to the injected primary block hash digest): assuming no core domain bundles in the primary block, then two primary fork blocks 100a and 100b will produce one identical core domain block 100 (they are based on the same domain parent block and have the same block content), two primary blocks means two receipts, but they share one key in terms of tracking the receipts, now the latter receipt will override the former one due to the collision, causing always submitting the receipt from the fork.

2023-04-06 11:00:44.515 DEBUG tokio-runtime-worker domain_client_executor::core_bundle_processor: [CoreDomain] Building a new domain block from primary block #34,0x8611…5367 on top of #33,0xc1f6…8d2d
2023-04-06 11:00:44.519 DEBUG tokio-runtime-worker domain_client_executor::domain_block_processor: [CoreDomain] Built new domain block #34,0xf260…8b6a from primary block #34,0x8611…5367
2023-04-06 11:00:45.316 DEBUG tokio-runtime-worker domain_client_executor::core_bundle_processor: [CoreDomain] Building a new domain block from primary block #34,0xd574…0c4e on top of #33,0xc1f6…8d2d
2023-04-06 11:00:45.318 DEBUG tokio-runtime-worker domain_client_executor::domain_block_processor: [CoreDomain] Built new domain block #34,0xf260…8b6a from primary block #34,0xd574…0c4e

The solution is to use primary_hash as the key to track the receipts as mentioned before.

The idea is to find the right block at height max_allowed, which can be done by traversing back from the current tip to the height max_allowed, and then we can ensure the collected receipts are on the same branch of the canonical chain.

I haven't been able to reproduce this scenario (wrong receipt caused by using the block number) in test, thus it can likely be okay. Will leave it as is for now and only change the storage key of receipts in the aux db for this issue.