filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

AP (Add Piece) multi core performance issues / not always desirable #7178

Open: benjaminh83 opened this issue 3 years ago

benjaminh83 commented 3 years ago

What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

We are now seeing good progress on the market node and on the distribution of deals to AP workers. This exposes the next bottleneck in the deal-handling flow: the AP process itself. On my AP/PC1 worker, specced with an AMD EPYC 7F72, 8x NVMe in RAID 0, and 880 GiB of memory, I observe the following behavior while staging full 32 GiB deals (running AP):

CPU usage stays at 2-10% across all cores, even when running 4 APs at the same time. I think multi-core AP is somewhat broken.

Describe the solution you'd like

  1. Maybe we should bring back that variable so we can switch back to single-core AP. That at least gave us a very predictable AP time of 20 minutes, and we could run 2, 4, 8, or 12 AP processes in parallel without much difference.
  2. An alternative could be a variable that controls how many cores multi-core AP is allowed to use per job.
  3. Another alternative could be to fix the performance of multi-core AP, so that it does not degrade this drastically when running multiple jobs.

Describe alternatives you've considered

No response

Additional context

I think all storage providers will need to be able to run multiple APs concurrently: deals are preferably published in batches, so AP jobs will also be started in batches.

neondragon commented 3 years ago

The scheduler also schedules one AP task per available CPU core on a worker, leftover behavior from when AP was single-threaded. For example, it schedules 16 multi-core AP tasks on a 16-CPU worker, and on a 64-core worker it will schedule up to 64 AP tasks in parallel.
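A toy model of the accounting problem described above: the scheduler admits tasks by their declared thread count, so an AP task that declares one thread but actually uses many cores gets admitted once per core. This is not Lotus's scheduler code; `worker`, `tryAdmit`, and `admitAll` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// worker is a hypothetical, simplified model of CPU accounting on one worker.
type worker struct {
	cpus     int // CPU cores the worker advertises
	usedCPUs int // cores already claimed by admitted tasks
}

// tryAdmit claims `declared` cores if they are free. In this model the
// scheduler only sees what a task declares, not how many cores it really uses.
func (w *worker) tryAdmit(declared int) bool {
	if declared <= 0 || w.usedCPUs+declared > w.cpus {
		return false
	}
	w.usedCPUs += declared
	return true
}

// admitAll counts how many identical tasks a fresh worker accepts.
func admitAll(cpus, declared int) int {
	w := &worker{cpus: cpus}
	n := 0
	for w.tryAdmit(declared) {
		n++
	}
	return n
}

func main() {
	for _, cpus := range []int{16, 64} {
		// AP still declares one thread, left over from single-threaded AP.
		n := admitAll(cpus, 1)
		// If every admitted AP actually tries to use all cores, the worker runs
		// n*cpus threads' worth of work on cpus cores: n times oversubscribed.
		fmt.Printf("%d-core worker: %d parallel AP tasks, %dx oversubscribed\n", cpus, n, n)
	}
	// A task that declared its real thread count would be capped instead.
	fmt.Printf("64-core worker, 8 threads declared: %d parallel AP tasks\n", admitAll(64, 8))
}
```

Running it prints 16 tasks for the 16-core worker and 64 for the 64-core worker, matching the counts above, and 8 tasks once each task declares 8 threads.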

Since running APs in parallel appears to decrease overall throughput, I think we need either performance improvements to AP so that it scales linearly, or a way to limit the maximum number of parallel AP tasks per worker.