filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Post-upgrade: export range fails due to CID not found in the blockstore #10492

Open hsanjuan opened 1 year ago

hsanjuan commented 1 year ago

Lotus Version

Daemon:  1.21.0-dev+mainnet+git.5707732+api1.5.0
Local: lotus version 1.21.0-dev+mainnet+git.5707732

Repro Steps

lotus chain export-range --internal --messages --receipts --stateroots --workers 50 --head "2684160" --tail "2681280" --write-buffer=5000000 export.car

Describe the Bug

I am running Lotus at the commit where the 1.20 branch was merged to master, since that is when the export-range functionality became available.

After many days of running fine, archival snapshots started failing with unretrievable blocks on the day after the network upgrade (2023-03-14). This has never happened before in the history of the chain.

ERROR: exporting chain range: writing object to car, bs.Get: ipld: could not find bafy2bzacecdlvh4udaxpx3motumzj46nms32atnmuxuhusfxcaqzazdrskoso

The export process traverses and tries to save all blocks, plus every DagCBOR CID referenced from each block's Messages, ParentMessageReceipts and ParentStateRoot (when the blocks are within the specified heights).

Apparently bafy2bzacecdlvh4udaxpx3motumzj46nms32atnmuxuhusfxcaqzazdrskoso is nowhere to be found in the Lotus datastore, despite being referenced from somewhere.
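To illustrate why a single dangling link aborts the whole export, here is a toy sketch of such a traversal in Python. It is not Lotus code: the blockstore is modelled as a plain dict from a CID string to the list of CIDs it links to, and every name in it (`walk`, `NotFound`, the sample keys) is hypothetical.

```python
class NotFound(Exception):
    """Raised when a CID is referenced but absent from the blockstore."""


def walk(blockstore, root, visit):
    """Depth-first walk from root over a dict of CID -> linked CIDs."""
    seen = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        if cid not in blockstore:
            # The link exists, but the object behind it was never stored.
            raise NotFound(f"ipld: could not find {cid}")
        visit(cid)
        stack.extend(blockstore[cid])
    return seen


# Hypothetical toy data: the receipts link to an object missing from the store.
store = {
    "block": ["messages", "receipts", "stateroot"],
    "messages": [],
    "receipts": ["events-root"],
    "stateroot": [],
}

exported = []
try:
    walk(store, "block", exported.append)
    error = None
except NotFound as exc:
    error = str(exc)

print(error)
```

In this model the walk stops at the first unresolvable link, so nothing reachable only after that point is written out.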

Can someone shed some light as to what that might be?

Logging Information

Above

hsanjuan commented 1 year ago

It seems that the CID is part of receipts hanging from: bafy2bzacecjkbtzobezlbh2lzqgqlrmxnsh2qqbg6qjy47pl2ybfj7s2desre

hsanjuan commented 1 year ago

$ lotus chain getblock bafy2bzacecjkbtzobezlbh2lzqgqlrmxnsh2qqbg6qjy47pl2ybfj7s2desre
...
  "ParentReceipts": [
...
    {
      "ExitCode": 0,
      "Return": "QA==",
      "GasUsed": 20218807,
      "EventsRoot": {
        "/": "bafy2bzacecdlvh4udaxpx3motumzj46nms32atnmuxuhusfxcaqzazdrskoso"
      }
    },

What is EventsRoot?
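For anyone checking other blocks, a minimal Python sketch for listing the EventsRoot CIDs in saved `lotus chain getblock` output could look like the following. It relies only on the JSON shape shown above; the helper name `events_roots`, the second sample receipt, and the assumption that a receipt without events carries a null EventsRoot are all illustrative.

```python
import json


def events_roots(getblock_json):
    """Return the EventsRoot CIDs found in a block's ParentReceipts."""
    block = json.loads(getblock_json)
    roots = []
    for receipt in block.get("ParentReceipts") or []:
        link = receipt.get("EventsRoot")
        if link:  # assumed to be null/absent when no events were emitted
            roots.append(link["/"])
    return roots


# Trimmed sample: the first receipt is from the output above,
# the second one is hypothetical.
sample = """
{
  "ParentReceipts": [
    {"ExitCode": 0, "Return": "QA==", "GasUsed": 20218807,
     "EventsRoot": {"/": "bafy2bzacecdlvh4udaxpx3motumzj46nms32atnmuxuhusfxcaqzazdrskoso"}},
    {"ExitCode": 0, "Return": null, "GasUsed": 1, "EventsRoot": null}
  ]
}
"""

for cid in events_roots(sample):
    print(cid)
```

Each printed CID can then be passed to `lotus chain read-obj` to see whether the node can resolve it.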

Stebalien commented 1 year ago

Yes. We should be filtering those from the snapshot. I'll take a look.

Stebalien commented 1 year ago

Ah, I think the difference here is that you're including receipts. Yeah, I'll add another flag there for including events.

hsanjuan commented 1 year ago

Blocks are not decoded though. Is there a way that Lotus stores events so that these links are resolvable? It sounds like events would be a thing to "archive".

Stebalien commented 1 year ago

> Is there a way that Lotus stores events so that these links are resolvable? It sounds like events would be a thing to "archive".

We don't by default because we don't charge for storing them (unlike receipts).

> Yeah, I'll add another flag there for including events.

I made an attempt, but it required manually decoding the receipts AMT, which turned out to be kind of annoying and likely a performance issue. I'd prefer to just say "exporting receipts requires storing events" for now.

Stebalien commented 1 year ago

Hm. Yeah, it would require re-architecting the export system.

hsanjuan commented 1 year ago

I enabled the ETHRPC API and the problematic CID is now there (block at height 2684164).

root@filcryo:~# lotus chain read-obj bafy2bzacecdlvh4udaxpx3motumzj46nms32atnmuxuhusfxcaqzazdrskoso
840500018344010000008081821a001f806684840362743118555820ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef840362743218555820000000000000000000000000000000000000000000000000000000000000000084036274331855582000000000000000000000000049ea66943431c59e57b0e15c40080a34d76593218403627434185558200000000000000000000000000000000000000000000000000000000000000285
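The dump above is CBOR. As a rough illustration of what it holds, the Python sketch below decodes the first four-element array that appears in it. The decoder is a hand-written minimal one (unsigned integers, byte strings, text strings and arrays only), not a Lotus API, and reading the four fields as flags, key, codec and value is an assumption about the event entry layout rather than something confirmed in this thread.

```python
def decode(data, pos=0):
    """Decode one CBOR item starting at data[pos]; return (value, next_pos)."""
    head = data[pos]
    major, info = head >> 5, head & 0x1F
    pos += 1
    if info < 24:
        arg = info  # argument stored directly in the head byte
    elif info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError(f"unsupported additional info {info}")
    if major == 0:  # unsigned integer
        return arg, pos
    if major == 2:  # byte string
        return data[pos:pos + arg], pos + arg
    if major == 3:  # text string
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported major type {major}")


# First inner array copied from the read-obj output above.
entry_hex = (
    "840362743118555820"
    "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
)
entry, end = decode(bytes.fromhex(entry_hex))
flags, key, codec, value = entry  # field names are an assumption
print(flags, key, hex(codec), value.hex())
```

Under that reading, the entry has key "t1", codec 0x55 and a 32-byte value.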

However, there are others a few epochs later that are not there, and things keep failing (height 2685714):

$ lotus chain getblock bafy2bzacec2yagsbtnmq4mvt5rvvtx6sqing6somajskofkp4xc5ifrdouzng
...
    {
      "ExitCode": 0,
      "Return": "WIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADeC2s6dkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD5a1w==",
      "GasUsed": 68158461,
      "EventsRoot": {
        "/": "bafy2bzacebooa7l2gvwsgi26ban7xxhnx7rvjkqzzfvokowifvrbwhjpybw4c"
      }
    },
...

root@filcryo:~# lotus chain read-obj bafy2bzacebooa7l2gvwsgi26ban7xxhnx7rvjkqzzfvokowifvrbwhjpybw4c
ERROR: blockstore get: ipld: could not find bafy2bzacebooa7l2gvwsgi26ban7xxhnx7rvjkqzzfvokowifvrbwhjpybw4c

Why?