filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Merge the task quantities for the worker #11362

Open strahe opened 1 year ago

strahe commented 1 year ago

Lotus Version

lotus-miner version 1.24.0-rc2+calibnet+git.f5fe522e1

Repro Steps

No response

Describe the Bug

In my test environment, when enabling and disabling SyntheticPoRep, the counts for both types of tasks cannot be merged:

image

Even though the current worker has already executed 24 PC1 tasks (the maximum I've set), the scheduling system continues to dispatch tasks to the worker, resulting in unexpected errors.

Logging Information

no log

rjan90 commented 1 year ago

> In my test environment, when enabling and disabling SyntheticPoRep, the counts for both types of tasks cannot be merged

Hey @strahe! Can you elaborate a bit more on the repro steps here? Are you enabling/disabling the config while the lotus-miner process is running, or shutting it down in between? Are the steps:

  1. export PC1_32G_MAX_CONCURRENT=24
  2. Run lotus-miner with SyntheticPoRep=false
  3. Pledge sectors
  4. Change SyntheticPoRep config to SyntheticPoRep=true (with or without stopping the lotus-miner?)
  5. Pledge sectors

strahe commented 1 year ago

Hi, thanks for your reply. In my case, I was enabling/disabling the config while lotus-miner was running (randomly enabling some and disabling others), but I realized that the real issue doesn't lie there: even with the sequence you mentioned (pledge -> enable/disable -> restart -> pledge), the problem can still be reproduced.

https://github.com/filecoin-project/lotus/blob/48a3076876e0694e718c1072aa5145a63133e1cc/storage/sealer/stats.go#L47

https://github.com/filecoin-project/lotus/blob/48a3076876e0694e718c1072aa5145a63133e1cc/storage/sealer/sealtasks/task.go#L163-L165

When enabling/disabling SyntheticPoRep, the RegisteredSealProof also changes, and this is unrelated to whether lotus-miner is restarted.
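
To illustrate why the counts end up split, here is a minimal sketch that assumes running tasks are tallied under a key combining the task type and the seal proof. `sealTaskKey`, `countByKey`, and the proof labels are hypothetical stand-ins for illustration, not the actual Lotus types:

```go
package main

import "fmt"

// sealTaskKey is a hypothetical counter key that includes the seal proof
// type alongside the task type. It is NOT the real Lotus type.
type sealTaskKey struct {
	TaskType  string
	SealProof string
}

// countByKey tallies running tasks per (task type, seal proof) pair.
func countByKey(running []sealTaskKey) map[sealTaskKey]int {
	counts := make(map[sealTaskKey]int)
	for _, k := range running {
		counts[k]++
	}
	return counts
}

func main() {
	// Illustrative proof labels only: the same PC1 task type shows up
	// under two different keys once the seal proof differs.
	running := []sealTaskKey{
		{TaskType: "PC1", SealProof: "V1_1"},
		{TaskType: "PC1", SealProof: "V1_1"},
		{TaskType: "PC1", SealProof: "V1_1-Synthetic"},
	}
	for k, n := range countByKey(running) {
		fmt.Printf("%s / %s: %d\n", k.TaskType, k.SealProof, n)
	}
}
```

Under that assumption, PC1 tasks for sectors with and without SyntheticPoRep land in separate buckets, which would match the two separate counts in the screenshot.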

rjan90 commented 1 year ago

So the resource restrictions are applied in resources.go, and the RegisteredSealProof for SyntheticPoReps is set to the same as V1 (as V1_1 is also set). But why those limits are not applied correctly when enabling/disabling, with both SynthPoRep and V1_1 PoReps sealing at the same time, is not yet clear to me. I will have to dig a bit further into that.

https://github.com/filecoin-project/lotus/blob/60f78c792207fd1054628b1b87aed86a5324e687/storage/sealer/storiface/resources.go#L580-L603

strahe commented 1 year ago

Another test environment (64G sector): [image]

strahe commented 10 months ago

> So the resource-restrictions are applied in resources.go, and the RegisteredSealProof for SyntheticPoReps are set to the same as V1 (As V1_1 is also set). But why those limitations are not applied correctly when enabling/disabling and having both SynthPoRep and V1_1 PoReps sealing at the same time is not yet clear to me. I would have to dig a bit further into that.

A resource limit only constrains tasks of the same type, and tasks with different RegisteredSealProof types are counted as different types. For instance, if I set the maximum number of parallel PC1 tasks to 10, then each RegisteredSealProof type (SynthPoRep enabled/disabled) can run 10 tasks and each still complies with the resource restriction. This can result in a total of 20 PC1 tasks running in parallel.

Perhaps we can calculate the sum of the task quantities for both types here:

https://github.com/filecoin-project/lotus/blob/924af42947df4b3d0980e3e51aa715485ef67846/storage/sealer/sched_resources.go#L151-L152
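
A rough sketch of that idea, written against hypothetical types rather than the real scheduler code: `proofTaskKey`, `mergedCount`, and `canSchedule` are names made up for illustration, and treating a limit of zero or less as "no limit" is an assumption of this sketch.

```go
package main

import "fmt"

// proofTaskKey is a hypothetical counter key (task type + seal proof);
// it only stands in for whatever the scheduler really uses.
type proofTaskKey struct {
	TaskType  string
	SealProof string
}

// mergedCount sums the running tasks of one task type across all seal
// proof types, so SynthPoRep and non-SynthPoRep sectors share one total.
func mergedCount(counts map[proofTaskKey]int, taskType string) int {
	total := 0
	for k, n := range counts {
		if k.TaskType == taskType {
			total += n
		}
	}
	return total
}

// canSchedule reports whether one more task of taskType fits under limit.
// Assumption: limit <= 0 means "no limit configured".
func canSchedule(counts map[proofTaskKey]int, taskType string, limit int) bool {
	if limit <= 0 {
		return true
	}
	return mergedCount(counts, taskType) < limit
}

func main() {
	counts := map[proofTaskKey]int{
		proofTaskKey{TaskType: "PC1", SealProof: "V1_1"}:           10,
		proofTaskKey{TaskType: "PC1", SealProof: "V1_1-Synthetic"}: 10,
	}
	fmt.Println(mergedCount(counts, "PC1"))     // 20
	fmt.Println(canSchedule(counts, "PC1", 24)) // true
	fmt.Println(canSchedule(counts, "PC1", 10)) // false
}
```

With a merged count, a limit of 10 would reject further PC1 tasks as soon as the two proof types together reach 10, instead of allowing 10 each.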