froghub-io / filecoin-sealer-recover

Filecoin sector recover
https://www.froghub.io
Apache License 2.0

Sector (No.) , running PreCommit2 error: sealed cid mismatching!!! #16

Open Shekelme opened 2 years ago

Shekelme commented 2 years ago

The following error occurs on some recovered sectors:

2021-12-25T12:02:35.621 INFO storage_proofs_core::data > dropping data /media/sn750/recover-57853956948449/sealed/s-t01222595-5785
2021-12-25T12:02:38.745 INFO filecoin_proofs::api::seal > seal_pre_commit_phase2:finish
2021-12-25T12:02:38.745 INFO filcrypto::proofs::api > seal_pre_commit_phase2: finish
INFO[2021-12-25T12:02:38+03:00] Complete PreCommit2, sector ({1222595 5785})
ERRO[2021-12-25T12:02:38+03:00] Sector (5785) , running PreCommit2 error: sealed cid mismatching!!! (sealedCID: bagboea4b5abcb42beroeboobypwj2xvsg2ryc5xttathfol3tre3cioidvpjyhb5, newSealedCID: bagboea4b5abcbtgdq542zczhlmvif7ca5brg374gg5oo5hxfbc6xyjgiuazrpuao)
INFO[2021-12-25T12:05:20+03:00] Complete sector (5785)

What causes it and how to fix it?
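To collect every affected sector from a long recovery log, the mismatch lines can be pulled out with a regular expression. The following is a minimal sketch, not part of the recovery tool: the pattern is derived only from the single ERRO line quoted in this report, and the names `MISMATCH_RE`, `find_mismatches`, and `SAMPLE` are hypothetical.

```python
import re

# Pattern for the ERRO line quoted in this issue. The format is taken from
# that one sample and may differ in other versions of the tool (assumption).
MISMATCH_RE = re.compile(
    r"Sector \((?P<sector>\d+)\)\s*, running PreCommit2 error: "
    r"sealed cid mismatching!!! "
    r"\(sealedCID: (?P<expected>\w+), newSealedCID: (?P<actual>\w+)\)"
)


def find_mismatches(log_text):
    """Return one dict per mismatch line: sector number, on-record CID, recomputed CID."""
    return [
        {
            "sector": int(m.group("sector")),
            "expected": m.group("expected"),
            "actual": m.group("actual"),
        }
        for m in MISMATCH_RE.finditer(log_text)
    ]


# Two lines copied from the log in this report.
SAMPLE = (
    "INFO[2021-12-25T12:02:38+03:00] Complete PreCommit2, sector ({1222595 5785})\n"
    "ERRO[2021-12-25T12:02:38+03:00] Sector (5785) , running PreCommit2 error: "
    "sealed cid mismatching!!! "
    "(sealedCID: bagboea4b5abcb42beroeboobypwj2xvsg2ryc5xttathfol3tre3cioidvpjyhb5, "
    "newSealedCID: bagboea4b5abcbtgdq542zczhlmvif7ca5brg374gg5oo5hxfbc6xyjgiuazrpuao)\n"
)

if __name__ == "__main__":
    for hit in find_mismatches(SAMPLE):
        print(hit["sector"], hit["expected"], hit["actual"])
```

Because the pattern is applied with `finditer` over the whole text, it also works when the log lines have been pasted together on a single line.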

dayou5168 commented 2 years ago

Interesting. I have never seen this before. @FroghubMan, can you help here?

FroghubMan commented 2 years ago

We have seen the same problem: about 1% of sectors cannot be recovered correctly. We have not identified the root cause, but we suspect it occurs when the first seal itself produced a wrong result (a low-probability event). The feedback in #5 and #8 is the same.

Shekelme commented 2 years ago

I am trying to re-run the recovery for such sectors, but it does not help in all cases.

dayou5168 commented 2 years ago

@Shekelme Can you provide your miner ID and sector number? Maybe we can try a recovery test.

Shekelme commented 2 years ago

The numbers are all in the OP ) Miner 1222595, sector 5785. But also 7370 and 13197, for example.

dayou5168 commented 2 years ago

Looks great.

FroghubMan commented 2 years ago

If a small number of sectors still cannot be recovered after repeated attempts, we recommend terminating those sectors as soon as possible.

FroghubMan commented 2 years ago

I am very curious: what kind of ZFS failure caused the sector data loss?

FroghubMan commented 2 years ago

> The numbers are all in the OP ) Miner 1222595, sector 5785. But also 7370 and 13197, for example.

My worker machines have been very busy in recent days, so I may not be able to help you.

Shekelme commented 2 years ago

And a fresh one: 6028.

For ZFS failure: link

s1mple1122 commented 2 years ago

I found that the probability of a sector being unrecoverable is much greater than 1%. I have tested many sectors, and almost none of them can be recovered. I still can't find the reason. I even modified the code myself, removed the nodeapi call, and passed the ticket in manually. In either case, the recovered CID is incorrect.

dayou5168 commented 2 years ago

@s1mple1122 You should check your chain data source if a large portion of your sectors can't be recovered. Maybe try using a full node.
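One way to tell the roughly 1% case reported earlier in this thread from a systematic problem such as a bad chain data source is to compute the mismatch rate over all attempted sectors. The sketch below is only an illustration: the helper names, the factor of 5, and the sector counts in the usage lines are made up and do not come from the project.

```python
def mismatch_rate(failed_sectors, attempted_sectors):
    """Fraction of attempted sectors whose recovered sealed CID did not match."""
    attempted = set(attempted_sectors)
    if not attempted:
        raise ValueError("no sectors attempted")
    # Ignore failures for sectors that are not in the attempted set.
    failed = set(failed_sectors) & attempted
    return len(failed) / len(attempted)


def looks_systematic(rate, baseline=0.01, factor=5.0):
    """True when the rate is far above the ~1% baseline mentioned in this thread.

    The factor of 5 is an arbitrary illustration, not a value from the project.
    """
    return rate > baseline * factor


if __name__ == "__main__":
    # Hypothetical numbers: 100 attempted sectors, first 1 and then 60 failures.
    print(looks_systematic(mismatch_rate([3], range(100))))
    print(looks_systematic(mismatch_rate(range(60), range(100))))
```

A rate near the baseline is consistent with the occasional bad first seal suspected above, while a much higher rate points toward the inputs to the recovery (for example the ticket obtained from the chain) rather than the individual sectors.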