Open rwxr-xr-x opened 1 year ago
Hi @mrworkdev
I think this will solve the issue: https://lotus.filecoin.io/kb/sector-removal/#step-3-create-dummy-sector-data
That didn't help (as expected). The sector data was not lost and the miner can delete it... But even with a dummy sector file in place, the sector doesn't move. The sector log didn't change. Maybe it is waiting for something to trigger it.
It still looks like this (after a few days of tests, removals, terminations, and miner restarts):
SectorID: 239790
Status: PreCommitWait
CIDcommD: baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq
CIDcommR: bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth
Ticket: 9f21014387d729b00018e70e1a2f75cf3e7001a97a0bc188a4e0ccb2e7f6fc24
TicketH: 2465726
Seed:
SeedH: 0
Precommit: bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6
Commit: <nil>
Deals: [0]
Retries: 0
--------
Event Log:
0. 2022-12-29 09:29:42 +0000 UTC: [event;sealing.SectorStartCC] {"User":{"ID":239790,"SectorType":8}}
1. 2022-12-29 09:30:59 +0000 UTC: [event;sealing.SectorPacked] {"User":{"FillerPieces":[{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}]}}
2. 2022-12-29 09:30:59 +0000 UTC: [event;sealing.SectorTicket] {"User":{"TicketValue":"nyEBQ4fXKbAAGOcOGi91zz5wAal6C8GIpODMsuf2/CQ=","TicketEpoch":2465726}}
3. 2022-12-29 14:32:54 +0000 UTC: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJfbG90dXNfU2VhbFJhbmRvbW5lc3MiOiJueUVCUTRmWEtiQUFHT2NPR2k5MXp6NXdBYWw2QzhHSXBPRE1zdWYyL0NRPSIsImNvbW1fZCI6WzcsMTI2LDk1LDIyMiw1MywxOTcsMTAsMTQ3LDMsMTY1LDgwLDksMjI3LDczLDEzOCw3OCwxOTAsMjIzLDI0MywxNTYsNjYsMTgzLDE2LDE4Myw0OCwyMTYsMjM2LDEyMiwxOTksMTc1LDE2Niw2Ml0sImNvbmZpZyI6eyJpZCI6InRyZWUtZCIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjIxNDc0ODM2NDd9LCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsiX2giOm51bGwsImxhYmVscyI6W3siaWQiOiJsYXllci0xIiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTIiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItMyIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci00IiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTUiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItNiIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci03IiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTgiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItOSIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci0xMCIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3d
zX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci0xMSIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9XX19LCJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjFfMSJ9"}}
4. 2022-12-29 14:44:53 +0000 UTC: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth"},"Unsealed":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}}
5. 2022-12-29 14:44:53 +0000 UTC: [event;sealing.SectorPreCommitted] {"User":{"Message":{"/":"bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6"},"PreCommitDeposit":"95774649064676276","PreCommitInfo":{"SealProof":8,"SectorNumber":239790,"SealedCID":{"/":"bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth"},"SealRandEpoch":2465726,"DealIDs":[],"Expiration":4015680,"UnsealedCid":null}}}
6. 2022-12-29 17:07:23 +0000 UTC: [event;sealing.SectorChainPreCommitFailed] {"User":{}}
found message with equal nonce as the one we are looking for that is NOT a valid replacement message (F:bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6 n 481750, TS: bafy2bzacecdqzayifi2txhxpzhamvljbbzlx253jierq4uhqlqpvipkq6ecjq n481750)
7. 2022-12-29 17:08:23 +0000 UTC: [event;sealing.SectorRetryPreCommitWait] {"User":{}}
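A side note for reading the status output above: the `Ticket` field and the `TicketValue` in the `SectorTicket` event appear to be the same 32 bytes, printed once as hex and once as base64, and `PreCommit1Out` looks like base64-encoded JSON. A minimal Python sketch to check this, using only values copied from the log (the variable names are mine, not anything from Lotus):

```python
import base64

# Values copied verbatim from the sector status output above.
ticket_hex = "9f21014387d729b00018e70e1a2f75cf3e7001a97a0bc188a4e0ccb2e7f6fc24"
ticket_value_b64 = "nyEBQ4fXKbAAGOcOGi91zz5wAal6C8GIpODMsuf2/CQ="

# Decode the base64 TicketValue and compare it with the hex Ticket field.
ticket_bytes = base64.b64decode(ticket_value_b64)
print(len(ticket_bytes))                # 32
print(ticket_bytes.hex() == ticket_hex)  # True

# The first 36 characters of PreCommit1Out decode to the start of a JSON object.
p1_prefix_b64 = "eyJfbG90dXNfU2VhbFJhbmRvbW5lc3MiOiJu"
p1_prefix = base64.b64decode(p1_prefix_b64)
print(p1_prefix)  # b'{"_lotus_SealRandomness":"n'
```

So the sealing metadata recorded for the sector looks internally consistent, and the problem is more likely on the message side than in the sector data itself.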
I think it is due to a disk failure during sealing (on the disk where the chain was located). The daemon went down with all its data and was re-created from scratch ASAP. But it looks like some messages were not sent to the chain, or some metadata was lost.
That's a correct assumption.
Once a transaction is included in a block, it cannot be reversed. If the local mpool gets corrupted, your node will forget about any transactions that were in the mpool at the time, but the rest of the network will still have a record of those transactions.
This message "hijacked" the nonce intended for the PreCommit message, which left the sectors stuck because the PreCommit tried to use the same nonce as the other message. With the mpool cleared, I don't know how to reconstruct the message.
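To illustrate why the PreCommit can never land once another message has taken its nonce, here is a toy Python model. It is a simplified sketch, not Lotus code: `ToyChain`, `Message`, and the sender name `"worker"` are hypothetical, and the only rule it models is that a chain accepts a message only when its nonce equals the sender's next expected nonce. The nonce 481750 is the one from the error in the event log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender: str   # hypothetical sender label, not a real address
    nonce: int
    payload: str


@dataclass
class ToyChain:
    # Next expected nonce per sender (toy stand-in for on-chain actor state).
    next_nonce: dict = field(default_factory=dict)
    included: list = field(default_factory=list)

    def include(self, msg: Message) -> bool:
        """Accept msg only if its nonce is the sender's next expected nonce."""
        expected = self.next_nonce.get(msg.sender, 0)
        if msg.nonce != expected:
            return False
        self.included.append(msg)
        self.next_nonce[msg.sender] = expected + 1
        return True


chain = ToyChain(next_nonce={"worker": 481750})

# After the mpool was lost, some other message went out with nonce 481750.
other = Message("worker", 481750, "some other message")
# The sealing pipeline still remembers a PreCommit with that same nonce.
precommit = Message("worker", 481750, "PreCommit for sector 239790")

other_ok = chain.include(other)
precommit_ok = chain.include(precommit)
print(other_ok, precommit_ok)  # True False
```

In this model the slot for nonce 481750 is consumed for good by the first message, which matches the symptom here: the miner keeps waiting for a PreCommit message that the chain will not accept.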
So I think we need to remove the sectors, somehow.
In the past I have had some success by creating a new CC sector. During AP, I quickly copied the file and renamed it to match the stuck sector, then moved the sector to another (idle) worker and restarted that worker, which threw the sector into a failed state.
Try that, and in the meantime I will try to reproduce the issue here and look for a solution. Thank you!
My AP takes about 40 seconds. Can you explain in more detail what to copy and how? And how do I move sectors between workers?
Anyway, for this to work, something has to trigger the scheduler to analyze the sectors and change their state. But currently these sectors do not react to any actions, so the miner cannot check for a failed sector on disk :(
After 10+ days and several restarts of the whole pipeline, these sectors are still in the same state.
$ lotus-miner sectors status --log 239790
SectorID: 239790
Status: PreCommitWait
CIDcommD: baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq
CIDcommR: bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth
Ticket: 9f21014387d729b00018e70e1a2f75cf3e7001a97a0bc188a4e0ccb2e7f6fc24
TicketH: 2465726
Seed:
SeedH: 0
Precommit: bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6
Commit: <nil>
Deals: [0]
Retries: 0
--------
Event Log:
0. 2022-12-29 09:29:42 +0000 UTC: [event;sealing.SectorStartCC] {"User":{"ID":239790,"SectorType":8}}
1. 2022-12-29 09:30:59 +0000 UTC: [event;sealing.SectorPacked] {"User":{"FillerPieces":[{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}]}}
2. 2022-12-29 09:30:59 +0000 UTC: [event;sealing.SectorTicket] {"User":{"TicketValue":"nyEBQ4fXKbAAGOcOGi91zz5wAal6C8GIpODMsuf2/CQ=","TicketEpoch":2465726}}
3. 2022-12-29 14:32:54 +0000 UTC: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJfbG90dXNfU2VhbFJhbmRvbW5lc3MiOiJueUVCUTRmWEtiQUFHT2NPR2k5MXp6NXdBYWw2QzhHSXBPRE1zdWYyL0NRPSIsImNvbW1fZCI6WzcsMTI2LDk1LDIyMiw1MywxOTcsMTAsMTQ3LDMsMTY1LDgwLDksMjI3LDczLDEzOCw3OCwxOTAsMjIzLDI0MywxNTYsNjYsMTgzLDE2LDE4Myw0OCwyMTYsMjM2LDEyMiwxOTksMTc1LDE2Niw2Ml0sImNvbmZpZyI6eyJpZCI6InRyZWUtZCIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjIxNDc0ODM2NDd9LCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsiX2giOm51bGwsImxhYmVscyI6W3siaWQiOiJsYXllci0xIiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTIiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItMyIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci00IiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTUiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItNiIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci03IiwicGF0aCI6Ii9GSUxFQ09JTi9TRUFML05VTUEwL2NhY2hlL3MtdDAxNjgxODA4LTIzOTc5MCIsInJvd3NfdG9fZGlzY2FyZCI6Nywic2l6ZSI6MTA3Mzc0MTgyNH0seyJpZCI6ImxheWVyLTgiLCJwYXRoIjoiL0ZJTEVDT0lOL1NFQUwvTlVNQTAvY2FjaGUvcy10MDE2ODE4MDgtMjM5NzkwIiwicm93c190b19kaXNjYXJkIjo3LCJzaXplIjoxMDczNzQxODI0fSx7ImlkIjoibGF5ZXItOSIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci0xMCIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3d
zX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9LHsiaWQiOiJsYXllci0xMSIsInBhdGgiOiIvRklMRUNPSU4vU0VBTC9OVU1BMC9jYWNoZS9zLXQwMTY4MTgwOC0yMzk3OTAiLCJyb3dzX3RvX2Rpc2NhcmQiOjcsInNpemUiOjEwNzM3NDE4MjR9XX19LCJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjFfMSJ9"}}
4. 2022-12-29 14:44:53 +0000 UTC: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth"},"Unsealed":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}}
5. 2022-12-29 14:44:53 +0000 UTC: [event;sealing.SectorPreCommitted] {"User":{"Message":{"/":"bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6"},"PreCommitDeposit":"95774649064676276","PreCommitInfo":{"SealProof":8,"SectorNumber":239790,"SealedCID":{"/":"bagboea4b5abca3bszluafr5p4wke3kudt33wyakalkekoe6cyrua6q4ak5kicbth"},"SealRandEpoch":2465726,"DealIDs":[],"Expiration":4015680,"UnsealedCid":null}}}
6. 2022-12-29 17:07:23 +0000 UTC: [event;sealing.SectorChainPreCommitFailed] {"User":{}}
found message with equal nonce as the one we are looking for that is NOT a valid replacement message (F:bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6 n 481750, TS: bafy2bzacecdqzayifi2txhxpzhamvljbbzlx253jierq4uhqlqpvipkq6ecjq n481750)
7. 2022-12-29 17:08:23 +0000 UTC: [event;sealing.SectorRetryPreCommitWait] {"User":{}}
Any ideas how to manually remove them from the miner?
Is there a low-level way to manually mark them as Removed and remove them from the miner's DB?
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
Sector stuck in the PreCommitWait state for a long time with the error "found message with equal nonce as the one we are looking for that is NOT a valid replacement message".
I think it is due to a disk failure during sealing (on the disk where the chain was located). The daemon went down with all its data and was re-created from scratch ASAP. But it looks like some messages were not sent to the chain, or some metadata was lost.
As a result, some sectors are stuck and won't change their state.
Nothing helps and the sectors are still in PreCommitWait. There is no way to trigger any action on them.
Is this expected behavior? How can I clean them up?
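For anyone comparing their own logs, a small Python sketch that pulls the two message CIDs and nonces out of the error text. The regular expression is my own guess at the layout based on this single log line, and the thread does not spell out what the `F` and `TS` labels stand for, so the fields are named neutrally:

```python
import re

# Error text and PreCommit CID copied from the sector status output.
error_line = (
    "found message with equal nonce as the one we are looking for that is NOT "
    "a valid replacement message "
    "(F:bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6 n 481750, "
    "TS: bafy2bzacecdqzayifi2txhxpzhamvljbbzlx253jierq4uhqlqpvipkq6ecjq n481750)"
)
precommit_cid = "bafy2bzacecrgb4cl2ostu37qqe7c6xyfmipohoxil765v7hgnwg5w7da6w6o6"

# Assumed layout: "(F:<cid> n <nonce>, TS: <cid> n<nonce>)".
pattern = re.compile(r"\(F:\s*(\S+)\s+n\s*(\d+),\s*TS:\s*(\S+)\s+n\s*(\d+)\)")
match = pattern.search(error_line)

f_cid, f_nonce = match.group(1), int(match.group(2))
ts_cid, ts_nonce = match.group(3), int(match.group(4))

print(f_cid == precommit_cid)  # True: the first CID is the sector's Precommit message
print(f_nonce == ts_nonce)     # True: both messages carry nonce 481750
print(f_cid != ts_cid)         # True: they are two different messages
```

In this case the first CID is the `Precommit` message recorded for sector 239790, and the second is a different message with the same nonce.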
Logging Information
Repo Steps
$ lotus-miner --color info --hide-sectors-info
$ lotus mpool pending --local
$ lotus-miner sectors status --log 239790