feat: increase default and max FVM concurrency

Stebalien commented 3 months ago

We used to allocate 1024 instances per "concurrency" but we now only allocate 1024 plus 20 per "concurrency". So, I'm increasing the default and maximum concurrency factors to take advantage of this. Importantly:

Max concurrency 128 -> 4096. 1024 + 4096*20 < 128 * 1024.
Default concurrency 4 -> 150. 1024 + 150*20 < 4 * 1024.

Motivation: I'm hoping that the default (150) is sufficient (more than the previous max...) so we can just get rid of this knob.

fixes https://github.com/filecoin-project/lotus/issues/11817

Alternatively, we could just remove the knob now? The default is more than the previous max... On the other hand, the ability to decrease the max may be nice...

rvagg commented 3 months ago

OK, so I think this is related to this: https://github.com/filecoin-project/ref-fvm/blob/464bba53ad01391452db605002c9b5cca189fe39/fvm/src/engine/mod.rs#L63

Which comes from: https://github.com/filecoin-project/ref-fvm/commit/b77d152699a566f0ecd434a197885866f0d8840f which looks like it's been in fvm since v3.5.0. So does that mean we've actually been running with 1024 + 4*20 from that point till now? But this hasn't shown up as a problem?

I don't fully understand the impact of this to be honest, why does the pool size relate to the call stack depth? Is "call stack" here defined as calls from one actor to another? So we use a separate VM each time we execute an actor's method?

Stebalien commented 3 months ago

Some background:

We use this concurrency option for two things:

To limit the maximum number of FVM messages we can execute in parallel.
More importantly, to determine how many wasmtime "vm instances" we need to allocate up front.

Worst case scenario, processing a single message will require 1024 wasmtime instances, because the stack might reach 1024 calls deep (we need an instance per actor in the call stack). That's why, prior to 3.5.0, we allocated 1024*concurrency wasmtime instances (1024 * 4, by default). This was rather wasteful, but was very simple to implement.

In 3.5.0 we changed our logic to oversubscribe (underallocate) instances. However, this was trickier to implement because we needed to avoid a deadlock where multiple concurrent FVM invocations need an instance to progress but we have no more available. We solved this by ensuring that one of the concurrent FVM invocations always has the full 1024 instances to work with, ensuring that one of these invocations is guaranteed to be able to make progress at any time.

Answer:

Since 3.5.0, we've been allocating significantly fewer instances. This patch is taking advantage of this additional headroom by enabling for additional concurrency.

Stebalien commented 3 months ago

Hm. So... unfortunately, we have a thread-pool limit in the FVM itself. We probably need to make this thread-pool flexible.

Stebalien commented 3 months ago

Replacing this PR with https://github.com/filecoin-project/filecoin-ffi/pull/449.

filecoin-project / filecoin-ffi

feat: increase default and max FVM concurrency #448