rwxr-xr-x opened this issue 1 year ago.
Same question here.
During FIN (Finalize), this is the worker log:
I ran a lot of tests with the miner and only ONE worker. Finalize always gets stuck. Before the 1.19 update, everything worked fine.
How can I fix it?
This is the miner log:
There are no error logs at all.
Direct finalizing from lotus-worker to long-term storage fails with a 500 status code. It worked when I was onboarding sectors by snapping (SnapDeals), but it does not work when I onboard them by sealing.
2023-02-19 00:49:50 +0900 KST: [event;sealing.SectorFinalizeFailed] {"User":{}} finalize sector: moving sector to storage: storage call error 0: failed to acquire sector {xxxx xxxx} from remote (tried [{c3501e46-6ade-414a-81a3-3b0c569f803f [http://xxxx:1111/remote/cache/s-t0xxxx-xxxx] [http://xxxx:1111/remote] 10 true false false [] []}]): 1 error occurred:
[name: xxxxx]: failed to acquire sector {xxxx xxxx} from remote (tried [{c3501e46-6ade-414a-81a3-3b0c569f803f [http://xxxx:1111/remote/cache/s-t0xxxx-xxxx] [http://xxxx:1111/remote] 10 true false false [] []}]): 1 error occurred:
@magik6k Do you have any idea what could have broken RemoteFinalize since v1.17.0?
I restarted lotus-miner after changing DisallowRemoteFinalize to false, and I can see sectors going directly from the workers to long-term storage. I also checked that the sectors reach the 'Proving' state after being fully finalized, so it is definitely working.
My guess is that the meaning changed in v1.17.0 from 'Disallow remote finalize from lotus-miner' to 'Disallow remote finalize from lotus-worker'. Setting it to false then means 'Allow finalize on lotus-worker', so it makes sense.
Why did you conclude that they are going directly from the workers?
I checked the traffic on the long-term storage server using iftop; data was flowing directly from the workers.
Checklist
Running the latest release, or the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or having an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
Sectors get stuck in the FinalizeSector state with DisallowRemoteFinalize=true, even when the miner and all workers have local access to the Seal and Store storage paths.
I know there have been some similar issues (they appear to be closed, but not fully fixed):
Before 1.17, everything worked flawlessly with remote finalize. We first started experiencing this issue on 1.17+ versions, and we are still facing it on 1.19.0. We have even allocated a separate setup for more detailed tests, because this bottleneck is critical for our setups.
I know about the DisallowRemoteFinalize warning that workers must have access to both the sealing and long-term storage paths. Before 1.17, we shared long-term storage only with the PC2 and C2 workers, and they finalized sectors as expected. But now, even when long-term storage is shared with the miner and ALL workers, including WDP, WNP, and all sealing workers, sectors get stuck in the FinalizeSector state.
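Because `lotus-miner storage find` reports storage paths by the UUID stored in each path's `sectorstore.json`, one way to double-check that the miner and every worker really see the *same* Seal and Store paths is to print those IDs on each host and compare them. This is a minimal sketch, assuming each path was initialized by Lotus and contains a `sectorstore.json` with an `"ID"` field (like the storage ID in the error log above); `storage_id` and `list_storage_ids` are hypothetical helper names, not part of Lotus.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the storage ID that Lotus recorded in a
# storage path's sectorstore.json, so IDs can be compared across hosts.

# storage_id PATH -> prints the "ID" value from PATH/sectorstore.json.
# Returns non-zero if the file is missing or has no ID.
storage_id() {
  local meta="$1/sectorstore.json"
  local id
  if [ ! -r "$meta" ]; then
    echo "missing: $meta" >&2
    return 1
  fi
  # Pull out the "ID" string value with sed, so jq is not required.
  id=$(sed -n 's/.*"ID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$meta" | head -n 1)
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}

# list_storage_ids PATH... -> prints "<path> <id>" per path, or
# "<path> MISSING" when no ID can be read; diff this output between hosts.
list_storage_ids() {
  local p id
  for p in "$@"; do
    if id=$(storage_id "$p"); then
      printf '%s %s\n' "$p" "$id"
    else
      printf '%s MISSING\n' "$p"
    fi
  done
}
```

Run `list_storage_ids <seal path> <store path>` on the miner and on each worker, then diff the outputs. A `MISSING` entry or a differing ID suggests that host is not attached to the same storage.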
To fix these sectors, we have to edit the miner's config, set DisallowRemoteFinalize=false, and restart the miner. Right after that, all sectors are moved to long-term storage successfully, but they are moved by the miner (a single server over a single link).
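The config change described above can be scripted. This is a minimal sketch under several assumptions: GNU `sed` (for `-i` without a suffix), the key already appears (possibly commented out) under the `[Storage]` section of the miner's `config.toml`, and the config lives in `$LOTUS_MINER_PATH` (default `~/.lotusminer`). `set_disallow_remote_finalize` is a hypothetical helper name. Restarting `lotus-miner` afterwards is left out because it depends on how the daemon is managed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set DisallowRemoteFinalize in a lotus-miner config.toml.
# Assumes GNU sed and that the key is already present, possibly commented out.
set_disallow_remote_finalize() {
  local value="$1"
  local cfg="${2:-${LOTUS_MINER_PATH:-$HOME/.lotusminer}/config.toml}"
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 2 ;;
  esac
  # Only touch an actual assignment line, not the doc comment above it.
  if ! grep -qE '^[[:space:]]*#?[[:space:]]*DisallowRemoteFinalize[[:space:]]*=' "$cfg"; then
    echo "DisallowRemoteFinalize not found in $cfg" >&2
    return 1
  fi
  # Keep a backup, then uncomment the key if needed and set the new value,
  # preserving the original indentation.
  cp "$cfg" "$cfg.bak"
  sed -E -i "s/^([[:space:]]*)#?[[:space:]]*DisallowRemoteFinalize[[:space:]]*=.*/\1DisallowRemoteFinalize = $value/" "$cfg"
}
```

Example: `set_disallow_remote_finalize false`, then restart `lotus-miner` the way it is normally managed on that host.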
Miner config (relevant part):
Logging Information
Repo Steps
I removed all workers and left ONLY one worker of each type. All of them have Local access to the Seal and Store storage paths.
Below is what it looks like during the test (at the moment the sector gets stuck):
$ lotus-miner sectors status --log 239447
$ lotus-miner sectors list --fast --seal-time --events
$ lotus-miner sealing sched-diag
$ lotus-miner storage find 239447
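The four commands above can be wrapped so that the output for a stuck sector is captured in one step for a bug report. This is a minimal sketch: the commands and flags are exactly the ones used in this report, while the function name `collect_finalize_diag` and the `LOTUS_MINER_BIN` override are hypothetical. The override only exists so the script can be dry-run with another command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture the diagnostics used in this report for one
# stuck sector. LOTUS_MINER_BIN is an illustrative override (default
# "lotus-miner") that only exists so the script can be dry-run.
collect_finalize_diag() {
  local sector="$1"
  local bin="${LOTUS_MINER_BIN:-lotus-miner}"
  local out
  # Sector numbers are plain integers; reject anything else.
  case "$sector" in
    ''|*[!0-9]*) echo "usage: collect_finalize_diag SECTOR_NUMBER" >&2; return 2 ;;
  esac
  out="finalize-diag-${sector}-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$out" || return 1
  # Same commands and flags as in the report; stderr is kept with stdout.
  "$bin" sectors status --log "$sector" > "$out/sectors-status.txt" 2>&1
  "$bin" sectors list --fast --seal-time --events > "$out/sectors-list.txt" 2>&1
  "$bin" sealing sched-diag > "$out/sched-diag.txt" 2>&1
  "$bin" storage find "$sector" > "$out/storage-find.txt" 2>&1
  # Print the directory name so callers can archive it.
  printf '%s\n' "$out"
}
```

Example: run `collect_finalize_diag 239447` on the miner host while the sector is stuck, then attach the resulting directory to the issue.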
AP Worker Info:
PC1 Worker Info:
PC2 Worker Info:
C2 Worker Info:
WDP Worker Info:
WNP Worker Info:
All workers have Local access to the same locations:
Also tried: