filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

proof validation failed, sector not found in sector set after cron #8694

Closed beck-8 closed 2 years ago

beck-8 commented 2 years ago

Lotus Version

Daemon:  1.15.2+mainnet+git.518dc962e+api1.5.0
Local: lotus-miner version 1.15.2+mainnet+git.518dc962e

Describe the Bug

After this happens, the log keeps reporting errors. The C2 proof message is in an OK state on chain, but running lotus state sector xx xx does not return any sector information. I have tried manually resubmitting the C2 message, and I have tried changing the sector state so that it resubmits the C2 message by itself, but neither works. This approach used to solve the problem before v1.14.1.
I describe one way to reproduce this problem below, but I do not think that is what caused it this time.

I think the way to solve this problem is to find out why the on-chain message executes successfully but the sector status is not updated.

old issues Don't close issues until the issue is resolved !!!

Logging Information

I changed the message ID here for privacy, so it cannot be found on chain.
34.     2022-05-21 18:05:36 +0800 CST:  [event;sealing.SectorSubmitCommitAggregate]       {"User":{}}
35.     2022-05-21 18:26:20 +0800 CST:  [event;sealing.SectorCommitAggregateSent] {"User":{"Message":{"/":"bafy2bzacecknj3d3adws37foncmjc7fpbkx2g4dzrhqraodqnsgurfivz"}}}
36.     2022-05-21 18:29:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
37.     2022-05-21 18:30:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
38.     2022-05-21 18:30:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
39.     2022-05-21 18:31:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
40.     2022-05-21 18:31:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
41.     2022-05-21 18:32:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
42.     2022-05-21 18:32:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
43.     2022-05-21 18:33:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
44.     2022-05-21 18:33:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
45.     2022-05-21 18:34:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
46.     2022-05-21 18:34:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
        proof validation failed, sector not found in sector set after cron
47.     2022-05-21 18:35:30 +0800 CST:  [event;sealing.SectorRetryCommitWait]     {"User":{}}
48.     2022-05-21 18:35:30 +0800 CST:  [event;sealing.SectorCommitFailed]        {"User":{}}
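
The log above shows the sector cycling between SectorCommitFailed and SectorRetryCommitWait roughly once a minute. As a rough aid for spotting such loops in a longer sector event log, here is a minimal Python sketch that counts how often each sealing event appears. The function name count_sealing_events and the sample text are illustrative assumptions, not part of Lotus; the regular expression only assumes the "[event;sealing.<Name>]" format seen in the log above.

```python
import re
from collections import Counter

# Matches the event name in lines such as
# "36. 2022-05-21 18:29:30 +0800 CST: [event;sealing.SectorCommitFailed] {...}"
EVENT_RE = re.compile(r"\[event;sealing\.(\w+)\]")


def count_sealing_events(log_text: str) -> Counter:
    """Count how often each sealing event name appears in a sector event log."""
    return Counter(EVENT_RE.findall(log_text))


# Hypothetical excerpt in the same format as the log in this report.
sample = """\
36. 2022-05-21 18:29:30 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
    proof validation failed, sector not found in sector set after cron
37. 2022-05-21 18:30:30 +0800 CST: [event;sealing.SectorRetryCommitWait] {"User":{}}
38. 2022-05-21 18:30:30 +0800 CST: [event;sealing.SectorCommitFailed] {"User":{}}
    proof validation failed, sector not found in sector set after cron
"""

counts = count_sealing_events(sample)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

A high and steadily growing SectorCommitFailed count with a matching SectorRetryCommitWait count suggests the sector is stuck in the retry loop described here.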

Repro Steps

  1. Pledge a large number of sectors, enable the miner's CollateralFromMinerBalance option, and set AvailableBalanceBuffer.
  2. Let the pledged sectors run until they reach the state where C2 messages are being sent continuously.
  3. Withdraw all of the available miner balance.
  4. The problem then occurs in some sectors; see the error 'proof validation failed...set after cron'.

TippyFlitsUK commented 2 years ago

Hi @beck-8

Thanks for the report. It looks like you are running Lotus with custom code or adjustments (.dirty).

Please upgrade to stock Lotus (make clean all). If the issue persists, leave a comment here with new logs and repro steps.

Thank you !

beck-8 commented 2 years ago

The dirty flag is expected and is not a problem. The issue still exists; please follow up. Thank you.

TippyFlitsUK commented 2 years ago

Please revert to a stock build @beck-8 and add to this ticket if you still see issues.

Many thanks!

beck-8 commented 2 years ago
Daemon:  1.15.2+mainnet+git.518dc962e+api1.5.0
Local: lotus-miner version 1.15.2+mainnet+git.518dc962e

@TippyFlitsUK It has been restored, but there is still a problem, please track the problem

TippyFlitsUK commented 2 years ago

Many thanks @beck-8

Please let us know if this issue recurs sealing new sectors with the stock build. 🙏

beck-8 commented 2 years ago

sealing new sectors with the stock build

Sorry, I don't understand what you mean @TippyFlitsUK

beck-8 commented 2 years ago

It has been restored, but there is still a problem, please track the problem

Yes, there is still a problem.

TippyFlitsUK commented 2 years ago

We need to determine if this problem is still present using the stock Lotus build.

Now that you are running 1.15.2+mainnet+git.518dc962e please start sealing some new sectors and let us know if you still see the issue.

beck-8 commented 2 years ago

I ran the test in another environment and the problem is still there. That environment runs 1.15.2+mainnet+git.518dc962e.

Please believe me, please don't waste time repeatedly confirming information.

@TippyFlitsUK

beck-8 commented 2 years ago

I think the way to solve this problem is to find out why the on-chain message executes successfully but the sector status is not updated. Please pay attention!

TippyFlitsUK commented 2 years ago

I very much doubt you have managed to seal a new sector in the 30 minutes since you reverted to stock @beck-8!!

I am asking you to perform one simple task in order to help us get to the root of this issue. If you are unwilling or unable to complete this task, I will simply close this ticket.

I am taking time on my weekend to try and help you with this issue. I also value my time!!

beck-8 commented 2 years ago

No, I did not seal a new sector. My approach was to switch the environment I'm using to the official version for testing, since I have a large number of sectors submitting C2. Sorry, I didn't express myself clearly.

beck-8 commented 2 years ago

Does this method meet your requirements?

TippyFlitsUK commented 2 years ago

So to clarify, you are seeing this issue on 2 completely separate miners with individual miner IDs. One is running dirty and the other is running stock?

beck-8 commented 2 years ago

yes

TippyFlitsUK commented 2 years ago

Thank you for the clarification @beck-8!

Could you please provide the miner IDs of the 2 servers? We will need to examine the on-chain data.

It would also be very helpful if you could please provide the miner and sector logs of a SectorCommitFailed sector from the server that is running stock Lotus.

Many thanks!

beck-8 commented 2 years ago

I can provide one of them, but not publicly. Is there another channel where I can send you a private message with the details?

TippyFlitsUK commented 2 years ago

Sure, you can DM them to me on Slack at @TippyFlits.

Is there a specific reason you are unable to provide both IDs? It would greatly help our investigations if we were able to compare.

beck-8 commented 2 years ago

I DMed you on Slack. I can't provide both because my boss won't let me, hhhhh

Derek-zd commented 2 years ago

I have the same problem, and I see other people have the same problem.

TippyFlitsUK commented 2 years ago

Hey @Derek-zd

Are you able to provide any additional information with regards to the error you are seeing? What version of Lotus are you running?

Many thanks!!

beck-8 commented 2 years ago

I found the reason in the daemon log: the submitted proof did not pass verification.

{"level":"warn","ts":"2022-05-21T21:12:04.384+0800","logger":"vm","caller":"vm/syscalls.go:342","msg":"seal verify in batch failed","miner":"xx","sectorNumber":"xx","err":"invalidproof"}
{"level":"info","ts":"2022-05-21T21:12:04.513+0800","logger":"actors","caller":"vm/runtime.go:624","msg":"a proof failed from miner xx"}
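
To find every affected sector rather than a single one, the daemon log can be filtered for this warning. The following is a minimal Python sketch that assumes the daemon writes one JSON object per line, as in the two lines above, and that the fields are named msg, miner, sectorNumber and err as shown there. The function name failed_seal_verifications and the sample list are illustrative assumptions, not part of Lotus.

```python
import json


def failed_seal_verifications(lines):
    """Yield (miner, sectorNumber, err) for each 'seal verify in batch failed' entry."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("msg") == "seal verify in batch failed":
            yield entry.get("miner"), entry.get("sectorNumber"), entry.get("err")


# The two daemon log lines quoted in this comment, with the IDs redacted as "xx".
sample = [
    '{"level":"warn","ts":"2022-05-21T21:12:04.384+0800","logger":"vm",'
    '"caller":"vm/syscalls.go:342","msg":"seal verify in batch failed",'
    '"miner":"xx","sectorNumber":"xx","err":"invalidproof"}',
    '{"level":"info","ts":"2022-05-21T21:12:04.513+0800","logger":"actors",'
    '"caller":"vm/runtime.go:624","msg":"a proof failed from miner xx"}',
]

failures = list(failed_seal_verifications(sample))
for miner, sector, err in failures:
    print(f"miner={miner} sector={sector} err={err}")
```

In practice the function could be fed an open log file instead of the sample list, since it only iterates over lines.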