filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

[Proving Issue] WindowPost failed for single deadline #6533

Open Shekelme opened 3 years ago

Shekelme commented 3 years ago

Tonight one of our partitions failed to pass the WindowPoSt.

Miner: f0187709
deadline  partitions  sectors (faults)  proven partitions
0         1           2349 (0)          0
1         1           2349 (0)          0
2         1           2349 (0)          0
3         1           2349 (0)          0
4         1           2349 (2345)       0
5         1           2349 (0)          0
6         1           2349 (0)          0
...

The proof computation took about 1862 seconds (roughly 31 minutes):

2021-06-19T23:46:54.584+0300 INFO storageminer storage/wdpost_run.go:600 computing window post {"batch": 0, "elapsed": 1862.536490038}

There was no unusual activity on the storage pools at that moment, no scrubs running, etc.

Some info from miner log (not sure if it is related to this problem though):

2021-06-19T23:15:52.048+0300    INFO    storageminer    storage/wdpost_run.go:584       running window post     {"chain-random": "mBZUrEumH8IW+4PofIzICg8KSqPv4xQN/WbHjMywE4A=", "deadline": {"CurrentEpoch":860894,"PeriodStart":860664,"Index":4,"Open":860904,"Close":860964,"Challenge":860884,"FaultCutoff":860834,"WPoStPeriodDeadlines":48,"WPoStProvingPeriod":2880,"WPoStChallengeWindow":60,"WPoStChallengeLookback":20,"FaultDeclarationCutoff":70}, "height": "860894", "skipped": 0}
2021-06-19T23:18:34.476+0300    WARN    vm      vm/runtime.go:332       Abortf: failed to process post submission for deadline 20: partition already proven: {{[84] false} map[] map[]}
2021-06-19T23:18:34.477+0300    WARN    vm      vm/runtime.go:145       VM.Call failure: failed to process post submission for deadline 20: partition already proven: {{[84] false} map[] map[]} (RetCode=16):
2021-06-19T23:18:34.477+0300    WARN    vm      vm/vm.go:532    Send actor error        {"from": "f3wscha6fiz6v7clhe3fshqcumjig6mxwitjsf7baxygepsvr5ohpycob4dvadumlw4wr6gbmoks5yhn2chiea", "to": "f0442371", "nonce": 3596, "method": "5", "height": "860917", "error": "failed to process post submission for deadline 20: partition already proven: {{[84] false} map[] map[]} (RetCode=16):\n    github.com/filecoin-project/specs-actors/v4/actors/builtin.RequireNoErr\n        /home/admfc/go/pkg/mod/github.com/filecoin-project/specs-actors/v4@v4.0.0/actors/builtin/shared.go:75"}

1hr log: https://disk.yandex.ru/d/4OmaoUJDLjlsBQ

Version

lotus-miner version 1.9.0+mainnet+git.ada7f97ba

Setup

Ryzen 9 3950X, 128 GB RAM (DDR4-3200), RTX 3090. The NAS is attached via two 10 Gbit Ethernet links. This is the first time we have encountered this problem, but others say it happens from time to time and that the cause cannot be determined from the logs.

Proving status

lotus-miner proving info
Miner: f0187709
Current Epoch:           862367
Proving Period Boundary: 2424
Proving Period Start:    860664 (14 hours 11 minutes ago)
Next Period Start:       863544 (in 9 hours 48 minutes)

Faults:      2347 (5.48%)
Recovering:  0
Deadline Index:       28
Deadline Sectors:     0
Deadline Open:        862344 (11 minutes 30 seconds ago)
Deadline Close:       862404 (in 18 minutes 30 seconds)
Deadline Challenge:   862324 (21 minutes 30 seconds ago)
Deadline FaultCutoff: 862274 (46 minutes 30 seconds ago)

Lotus miner diagnostic info

Please collect the following diagnostic information, and share a link here

Shekelme commented 3 years ago

Fixed URL for allinfo data.

Shekelme commented 3 years ago

Two more deadlines (5 and 10) failed tonight.