filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

v.1.18.0 - Duplicate Storage Paths in `storage.json` lead to Miner Crash on `lotus-miner run` #9739

Open TippyFlitsUK opened 1 year ago

TippyFlitsUK commented 1 year ago

Lotus Version

1.18.0+mainnet+git.bd10bdf99

Describe the Bug

We are seeing a few reports from storage providers (SPs) of miner crashes after upgrading from version 1.16.x or 1.17.x to v1.18.0.

SPs attempting to start their miner process after upgrading to v1.18.0 see only the following output. The miner then crashes entirely and no subsequent logs are written:

2022-11-19T13:44:39.818-0800    INFO    paramfetch      go-paramfetch@v0.0.4/paramfetch.go:233  parameter and key-fetching complete
2022-11-19T13:44:40.024-0800    INFO    stores  paths/index.go:181      New sector storage: 3264ded3-7cac-4ac7-abbb-55a6967bd5ee

The immediate issue can be resolved quickly by removing the duplicate storage paths from the miner/worker `storage.json` files.

It is not clear how the duplicates arose in the first place. Lotus does not edit these files directly, so the most likely cause is a manual user configuration error.

This additional check was added in https://github.com/filecoin-project/lotus/pull/9032 because it is now possible to manipulate paths at runtime in many more ways.

The error messaging for this failure is not working as expected: the logs that SPs are sharing appear to be truncated.

Logging Information

2022-11-19T13:44:39.818-0800    INFO    paramfetch      go-paramfetch@v0.0.4/paramfetch.go:233  parameter and key-fetching complete
2022-11-19T13:44:40.024-0800    INFO    stores  paths/index.go:181      New sector storage: 3264ded3-7cac-4ac7-abbb-55a6967bd5ee

lbj2004032 commented 1 year ago

I don't have a duplicate path, but my miner still won't start.

What should I do? I have switched to v1.18.1 and it still cannot start.

lbj2004032 commented 1 year ago

This issue has not been resolved yet.

Many nodes are stuck at the "New sector storage" log line.

When I edit storage.json and change the Path to a new value (/mnt/lotus/mainData/), then initialize that path by re-running `lotus-miner storage attach --init --store /mnt/lotus/mainData/`, the miner starts successfully.

Next, I run `lotus-miner storage attach /mnt/netdisk/mainData`, where /mnt/netdisk/mainData is the old path value. The program hangs.

I have to start with 1.17.0 first; the miner only starts successfully on 1.17.0. With 1.17.2, the program still cannot be started.