filecoin-project / ref-fvm

Reference implementation of the Filecoin Virtual Machine
https://fvm.filecoin.io/
Other
382 stars 137 forks source link

Generics and transmutation make replacing Kernel very difficult #1779

Open anorth opened 1 year ago

anorth commented 1 year ago

@alexytsu and I are improving tooling for benchmarking and profiling native actors on the FVM (see https://github.com/anorth/fvm-workbench and https://github.com/filecoin-project/builtin-actors/issues/1236). We've run into great difficulty attempting to fake out expensive VM intrinsic operations like proof verification, the CryptoOps trait of Kernel.

We can wrap the DefaultKernel and delegate most methods...

pub struct BenchKernel<B: Blockstore + 'static> {
    kernel: DefaultKernel<DefaultCallManager<DefaultMachine<B, FakeExterns>>>,
    // ...
}
let executor = DefaultExecutor::<BenchKernel<B>>::new(EnginePool::new_default(engine_conf)?, machine)?;

... but this is only effective for the top-level call. During a sub-call, the wrapped DefaultKernel specifies a static Self as a type parameter to CallManger::send. No instance reference is passed (and indeed the kernel later consumes itself).

    let result = self.call_manager.with_transaction(|cm| {
        cm.send::<Self>(
            from, *recipient, method, params, value, gas_limit, read_only,
        )
    })?;

The Kernel trait specifies a new() constructor, and the call manager later uses that to construct a new kernel for the subcall, but it always constructs DefaultKernel.

With reference to some code in the conformance testing package, we tried also wrapping the CallManger so as to specify our own BenchKernel as Self instead, also replacing with_transaction to do a funky transmutation. This also isn't doing the job. When the DefaultCallManager instantiates a Kernel inside send_resolved it passes itself in. Therefore, when the first Kernel is created, we get a BenchKernel<DefaultCallManager> rather than a BenchKernel<BenchCallManager> as needed. It looks like for the BenchCallManager to intercept this it would need to re-implement that entire call stack CallManager::send -> DefaultCallManager::send_unchecked -> DefaultCallManager::send_resolved.

The conformance testing code also uses a TestMachine, we're not yet sure if this is necessary. But big picture, this static type-parameter based structure makes it very difficult to replace the kernel. There's probably some amount of reimplementation of these components that will get us there, but that would introduce huge and fragile coupling to the internals of the FVM.

A much simpler way would be for the Kernel type parameter to CallManager be instead a runtime parameter of a Kernel constructor/factory function (removing new() from the Kernel trait). It would then be unnecessary to wrap CallManager or anything else. This would involve following a pointer during an internal send, but much less code and complexity (less monomorphised code to build too).

anorth commented 1 year ago

@alexytsu and I will try this out to confirm it works.

anorth commented 1 year ago

I can see that adding anything dynamic here is going to be a tough slog against the current norms.

I'm having a go adding an additional method type parameter to Kernel::send in #1780

anorth commented 1 year ago

The TestMachine does seem like a necessary part of the conformance testing set-up. It's tightly coupled to the TestKernel in that it holds the kernel configuration data necessary to pass through when a kernel destroys itself and then is later re-created.

alexytsu commented 1 year ago

Though the kernel can now be successfully replaced https://github.com/filecoin-project/ref-fvm/pull/1780 without the need for replacing the DefaultExecutor or DefaultCallManager, it is still hard to provide an API for dynamic mock behaviour of the Kernel.

The kernel is not long-lived and it's construction is handled by the CallManager so we usually won't have access to inject it with fake data or instruct it to intercept/passthrough certain calls dynamically.

Stebalien commented 1 year ago

So, when we built this, we wanted to pass a constructor through. However, we ran into some recursive type issues. Kernel contains a CallManager which contains a Machine, so the CallManager can't be constructed with knowledge of the kernel. I.e., We can't have CallManager<Kernel<CallManager=???>>>.

An alternative would be dynamic types. Unfortunately, due to some wasmtime restrictions, this would need to be Box<dyn ...> (the kernel needs a static lifetime).

However dynamic types (objects) can't have generic methods. Which... may be fine, actually. The performance will be slightly worse, but none of this should be performance critical...

Stebalien commented 1 year ago

Hm. I dug into making this dynamic and... we probably need to make the blockstore dynamic first. We don't have to, but the types get kind of annoying if we don't.

anorth commented 1 year ago

Hm. I dug into making this dynamic and... we probably need to make the blockstore dynamic first. We don't have to, but the types get kind of annoying if we don't.

See also https://github.com/filecoin-project/builtin-actors/issues/133#issuecomment-1340101111