Open benjaminh83 opened 3 years ago
The scheduler also schedules one AddPiece (AP) task per available CPU core on a worker, a left-over behavior from when AP was single-threaded. For example, it schedules 16 multi-core AP tasks on a 16-core worker, and on a 64-core worker it will schedule up to 64 AP tasks in parallel.
Since running APs in parallel appears to decrease overall throughput, I think we need either performance improvements to AP so that it scales linearly, or a way to limit the maximum number of parallel AP tasks per worker.
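To make the second option concrete, here is a minimal sketch of a per-worker cap. This is not Lotus scheduler code: `effectiveAPSlots`, `runAPTasks`, and the `maxParallelAP` setting are hypothetical names used only for illustration. It assumes the current behavior can be modeled as "slots = CPU cores" and that a cap would simply lower that number.

```go
package main

import "sync"

// effectiveAPSlots returns how many AP tasks may run at once on a worker.
// cpuCores models the one-task-per-core behavior described in this issue;
// maxParallelAP is a hypothetical per-worker cap (0 or less means "no cap").
func effectiveAPSlots(cpuCores, maxParallelAP int) int {
	if maxParallelAP > 0 && maxParallelAP < cpuCores {
		return maxParallelAP
	}
	return cpuCores
}

// runAPTasks runs every task, never more than slots at a time, and returns
// the highest number of tasks that were observed running concurrently.
func runAPTasks(tasks []func(), slots int) int {
	if slots < 1 {
		slots = 1
	}
	sem := make(chan struct{}, slots) // buffered channel used as a counting semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while all slots are busy
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot once the task has finished

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			t()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}
```

With these assumptions, `effectiveAPSlots(64, 4)` yields 4 slots on a 64-core worker instead of 64, and `runAPTasks(tasks, 4)` never has more than 4 tasks in flight. A real implementation would have to fit into the scheduler's existing resource accounting rather than a standalone semaphore.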
Checklist
Ideas
Lotus component
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
We are now seeing good progress on the market node and the distribution of deals to AP workers. This highlights the next bottleneck in the deal handling flow, which is the AP process. On my AP/PC1 worker, specced with an Epyc 7F72, 8x NVMe in RAID 0, and 880GiB of memory, I observe the following behavior while staging full 32GiB deals (running the AP):
CPU usage is 2-10% across all cores, even when running 4 APs at the same time. I think the multi-core AP is kind of broken.
Describe the solution you'd like
Describe alternatives you've considered
No response
Additional context
I think all storage providers will need to be able to run multiple APs in parallel: deals are preferably published in batches, so AP jobs will also start up in batches.