filecoin-project / lily

capturing on-chain state for the filecoin network

Error getting messages from execution trace #1093

Closed davidgasquez closed 1 year ago

davidgasquez commented 1 year ago

The tasks internal_parsed_messages, internal_messages, and vm_messages are returning some errors when running lily job walk on some archival snapshots.

Steps to Reproduce:

  1. Obtain an Archival Snapshot: aws s3 cp "s3://sentinel-backfill/historical-exports/snapshot_40320_43202_1666948032.car.zst" .
  2. Extract it: unzstd snapshot_40320_43202_1666948032.car.zst -o /tmp/snapshot.car
  3. Initialize Lily: lily init --config=config.toml --import-snapshot /tmp/snapshot.car
  4. Start Lily Daemon: nohup lily daemon --config=config.toml --bootstrap=false &> lily.log &
  5. Run a walk that covers epochs 41280 to 41285: lily job run --storage=CSV walk --from 41280 --to 41285
  6. Check Visor Processing Reports: cat visor_processing_reports.csv | grep ERROR

This is the config.toml file:

[Storage]
    [Storage.File]
        [Storage.File.CSV]
            Format = "CSV"
            Path = "/tmp/data"
            OmitHeader = false
            FilePattern = "{table}.csv"

The visor_processing_reports.csv has the following error:

41282,bafy2bzaceajkwiwjw5hdwgp73u7f4nl6ayihxaxashcmtaqgxamww6xvufhpi,walk_1668676799,internal_parsed_messages,2022-11-17T09:20:12.251Z,2022-11-17T09:20:12.258Z,ERROR,,"{""Error"":""getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli""}"
41282,bafy2bzaceajkwiwjw5hdwgp73u7f4nl6ayihxaxashcmtaqgxamww6xvufhpi,walk_1668676799,vm_messages,2022-11-17T09:20:12.251Z,2022-11-17T09:20:12.31Z,ERROR,,"{""Error"":""getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli""}"
41282,bafy2bzaceajkwiwjw5hdwgp73u7f4nl6ayihxaxashcmtaqgxamww6xvufhpi,walk_1668676799,internal_messages,2022-11-17T09:20:12.251Z,2022-11-17T09:20:12.258Z,ERROR,,"{""Error"":""getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli""}"
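
To see at a glance which tasks failed at which heights, the report can be summarized with a short script. This is only a sketch: the column names below are assumptions inferred from the order of the fields in the rows above, and `failed_tasks` is a hypothetical helper, not part of Lily.

```python
import csv
import json
from collections import defaultdict

# Assumed column order, inferred from the rows shown above.
COLUMNS = [
    "height", "state_root", "reporter", "task", "started_at",
    "completed_at", "status", "status_information", "errors_detected",
]


def failed_tasks(lines):
    """Map height -> {task: error message} for every ERROR row.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    A header row is skipped naturally because its status is not "ERROR".
    """
    failures = defaultdict(dict)
    for row in csv.reader(lines):
        if len(row) != len(COLUMNS):
            continue  # ignore malformed or blank rows
        record = dict(zip(COLUMNS, row))
        if record["status"] != "ERROR":
            continue
        raw = record["errors_detected"]
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        # The error column holds a JSON object with an "Error" key in the
        # rows above; fall back to the raw text for anything else.
        message = parsed.get("Error", raw) if isinstance(parsed, dict) else raw
        failures[int(record["height"])][record["task"]] = message
    return failures
```

For example, `failed_tasks(open("/tmp/data/visor_processing_reports.csv", newline=""))` would return a dictionary keyed by height, assuming the file lives under the `Path` set in the config.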

And the lily.log file has these lines:

{"level":"debug","ts":"2022-11-17T09:20:12.258Z","logger":"lily/integrated/tipset","caller":"tipset/tipset.go:182","msg":"task report","height":"41282","task":"internal_parsed_messages","reporter":"walk_1668676799","status":"ERROR","duration":0.007188872}
{"level":"debug","ts":"2022-11-17T09:20:12.259Z","logger":"lily/integrated/tipset","caller":"tipset/tipset.go:182","msg":"task report","height":"41282","task":"internal_messages","reporter":"walk_1668676799","status":"ERROR","duration":0.007197542}
{"level":"warn","ts":"2022-11-17T09:20:12.259Z","logger":"lily/index/manager","caller":"integrated/manager.go:122","msg":"task failed","height":"41282","reporter":"walk_1668676799","task":"internal_parsed_messages","status":"ERROR","errors":{"Error":"getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli"},"info":""}
{"level":"warn","ts":"2022-11-17T09:20:12.259Z","logger":"lily/index/manager","caller":"integrated/manager.go:122","msg":"task failed","height":"41282","reporter":"walk_1668676799","task":"internal_messages","status":"ERROR","errors":{"Error":"getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli"},"info":""}
{"level":"debug","ts":"2022-11-17T09:20:12.310Z","logger":"lily/integrated/tipset","caller":"tipset/tipset.go:182","msg":"task report","height":"41282","task":"vm_messages","reporter":"walk_1668676799","status":"ERROR","duration":0.059018215}
{"level":"warn","ts":"2022-11-17T09:20:12.310Z","logger":"lily/index/manager","caller":"integrated/manager.go:122","msg":"task failed","height":"41282","reporter":"walk_1668676799","task":"vm_messages","status":"ERROR","errors":{"Error":"getting messages executions for tipset: failed to compute execution trace for tipset {bafy2bzacecn46qn2rsxx5frvaofn2qg4nuykxwnslzhhp6zgfl4gz5cs5x6l2}: error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli"},"info":""}

Lily Version: v0.12.0+6-g8d3c4b7

frrist commented 1 year ago

Hmm, interesting: these epochs are around an upgrade boundary: https://github.com/filecoin-project/lotus/blob/master/build/params_mainnet.go#L33

I am able to compute state for all epochs around said boundary except the one following it, 41281; these tasks are failing to read the state root at that height:

frrist@oak ~/W/s/g/f/lily (master) > ./lily chain state-compute -e=41280
frrist@oak ~/W/s/g/f/lily (master)> ./lily chain state-compute -e=41281
error handling state forks: loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli
frrist@oak ~/W/s/g/f/lily (master)> ./lily chain state-compute -e=41282

will continue to share context as I debug.

frrist commented 1 year ago

As expected (due to this being an upgrade boundary), there is a migration that needs to be run, and it's failing:

2022-11-29T13:06:43.805-0800    WARN    statemgr        stmgr/forks.go:176      STARTING migration      {"height": "41280", "from": "bafy2bzaceb67cfbcdy6xzljkk62qvqcrnwomfp2til7fd5bjzlqvk5n27in5o"}
2022-11-29T13:06:43.805-0800    ERROR   statetree       state/statetree.go:283  failed to load state tree: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli
2022-11-29T13:06:43.805-0800    ERROR   statemgr        stmgr/forks.go:182      FAILED migration        {"height": "41280", "from": "bafy2bzaceb67cfbcdy6xzljkk62qvqcrnwomfp2til7fd5bjzlqvk5n27in5o", "error": "loading state tree failed: load state tree: failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli: failed to load hamt node: ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli", "errorVerbose": "loading state tree failed:\n    github.com/filecoin-project/lotus/chain/consensus/filcns.UpgradeFaucetBurnRecovery\n        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/consensus/filcns/upgrades.go:255\n  - load state tree:\n    github.com/filecoin-project/lotus/chain/stmgr.(*StateManager).ParentState\n        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/stmgr/read.go:28\n  - failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli:\n    github.com/filecoin-project/lotus/chain/state.LoadStateTree\n        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/state/statetree.go:284\n  - failed to load hamt node:\n    github.com/filecoin-project/specs-actors/actors/util/adt.AsMap\n        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/specs-actors@v0.9.15/actors/util/adt/map.go:41\n  - ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli"}
2022-11-29T13:06:43.805-0800    WARN    rpc     go-jsonrpc@v0.1.8/handler.go:329        error in RPC call to 'Filecoin.StateCompute': error handling state forks:
    github.com/filecoin-project/lotus/chain/consensus/filcns.(*TipSetExecutor).ApplyBlocks
        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/consensus/filcns/compute_state.go:161
  - loading state tree failed:
    github.com/filecoin-project/lotus/chain/consensus/filcns.UpgradeFaucetBurnRecovery
        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/consensus/filcns/upgrades.go:255
  - load state tree:
    github.com/filecoin-project/lotus/chain/stmgr.(*StateManager).ParentState
        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/stmgr/read.go:28
  - failed to load state tree bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli:
    github.com/filecoin-project/lotus/chain/state.LoadStateTree
        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/lotus@v1.18.0/chain/state/statetree.go:284
  - failed to load hamt node:
    github.com/filecoin-project/specs-actors/actors/util/adt.AsMap
        /Users/frrist/Workspace/pkg/mod/github.com/filecoin-project/specs-actors@v0.9.15/actors/util/adt/map.go:41
  - ipld: could not find bafy2bzacecnw4ml2ylcb7bqyjqvfeearfuzj5vdg6ayc3uzzi3sppmnsz5lli

This leads me to believe this relates to our version/dependency on lotus; I wonder if it needs to look back further than we have the state for.

frrist commented 1 year ago

Okay, found the CID we are failing to fetch: https://filscan.io/tipset/chain?hash=bafy2bzacecug765yoalzwmmf4hml6wlusy7bah7abcgw3lenosgoo4jutlz2q (see the Parent Stateroot field). It's in the block at height 32000, a height this export lacks, unfortunately.

frrist commented 1 year ago

The lookback epoch for the migration is out of range for this snapshot: https://github.com/filecoin-project/lotus/blob/master/chain/consensus/filcns/upgrades.go#L227
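
The condition can be sketched as a simple range check. This is an illustration only: `Migration` and `uncovered_migrations` are hypothetical names, the heights 41280 and 32000 come from the logs and the block lookup in this thread, and the snapshot range 40320 to 43202 is assumed from the snapshot filename. It also assumes a snapshot holds state for every epoch in its range and nothing earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    name: str
    upgrade_height: int  # epoch at which the migration runs
    lookback_epoch: int  # oldest epoch whose state the migration reads


def uncovered_migrations(first_epoch, last_epoch, migrations):
    """Return migrations that run inside [first_epoch, last_epoch] but
    need state from before first_epoch, which the snapshot cannot supply."""
    return [
        m for m in migrations
        if first_epoch <= m.upgrade_height <= last_epoch
        and m.lookback_epoch < first_epoch
    ]


# Values taken from this thread, not read from lotus at runtime.
FAUCET_BURN_RECOVERY = Migration("UpgradeFaucetBurnRecovery", 41280, 32000)

# Snapshot range assumed from the filename snapshot_40320_43202_...
problems = uncovered_migrations(40320, 43202, [FAUCET_BURN_RECOVERY])
```

With these inputs the migration is reported as uncovered, because 32000 is earlier than 40320; a snapshot starting at or before 32000 would pass the check.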

davidgasquez commented 1 year ago

Thanks for taking a look! So, that means there is no way to export the message-related tasks for 41282 with the current snapshots. If we want to do that, we should have a .repo that covers from 32000 to >41282, right?

frrist commented 1 year ago

Correct. I don't think it would be too hard to write some code that concatenates car files together, and I suspect we might hit issues like this again around other upgrade epochs, but I'm not sure. Having such a tool ready ahead of time could help mitigate these problems in the future.
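
A minimal sketch of what such a concatenation could look like, assuming CARv1 inputs: a CARv1 file is a varint-length-prefixed header followed by a sequence of varint-length-prefixed blocks, so the block sections can be appended after the first file's header. The function names are hypothetical. The sketch keeps only the first file's roots, does not remove duplicate blocks, and does not verify CIDs.

```python
import shutil


def read_varint(stream):
    """Read one unsigned LEB128 varint from a binary stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("unexpected end of stream while reading varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def skip_header(stream):
    """Advance past the CARv1 header (varint length + header bytes)."""
    length = read_varint(stream)
    header = stream.read(length)
    if len(header) != length:
        raise ValueError("truncated CAR header")


def concat_carv1(inputs, output):
    """Write the first CAR unchanged, then append only the block
    sections of the remaining CARs. All streams are binary file objects."""
    for index, source in enumerate(inputs):
        if index > 0:
            skip_header(source)  # the output keeps only the first header
        shutil.copyfileobj(source, output)
```

Usage would be along the lines of opening each snapshot with `open(path, "rb")`, the destination with `open(path, "wb")`, and passing them to `concat_carv1`. Whether the result imports cleanly depends on the first file's roots being the ones the importer expects, which is untested here.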

davidgasquez commented 1 year ago

Update: @hsanjuan worked on a tool to cat CARv1 files.

https://github.com/hsanjuan/carcat

Closing this one as the tool should do the trick for now.