[Tracking] FVM version compatibility issues.

ChainSafe / forest

🌲 Rust Filecoin Node Implementation

https://forest.chainsafe.io

Apache License 2.0

617 stars 145 forks source link

[Tracking] FVM version compatibility issues. #3158

Open ruseinov opened 12 months ago

ruseinov commented 12 months ago

Issue summary

We have an issue with the fact that the code taking care of the two different FVM versions is basically a copy-paste, due to the fact that those don't have a compatibility layer. What we need is to figure out a solution to this problem that's not too involved and somewhat easy to support. Unfortunately, there isn't a clear vision for this yet.

Possible solutions:

Fork the FVM, build a compatibility layer. Why fork? Because the proposal to have a built-in compatibility layer in FVM didn't work with FVM dev team. a. Upsides: total control over the types and conversions, the possibility to use generic code as opposed to copy-pasting it back and forth; potentially influence the way FVM is developed, perhaps the changes eventually get accepted upstream. b. Downsides: maintaining our own fork, having to sync upstream changes, the code will diverge, which could potentially lead to regressions that are hard to track. c. Conclusion: while this seems like extra burden - it could potentially be a great way to cease the initiative and build something robust that'll save us a lot of time and pain going forward.
Writing an elaborate compatibility layer on the Forest side that converts fvmv2 <-> fvmv3. a. Upsides: Will allow for some code deduplication. b. Downsides: A lot of extra code without being able to use idiomatic Rust conversions; potentially making the implementation even more convoluted than it already is. c. Conclusion: This option has been explored before and it didn't look to amazing.
Leave the code as is. a. Upsides: No immediate action required; no need to maintain yet another library; potentially less convoluted than 2. b. Downsides: This code smells, it's pretty much 90% duplicated and it seems there's nothing we can do, unless we start touching the libraries it uses. c. Conclusion: We wouldn't be discussing this if it were a viable option. But it's good to have it on the table.

Other information and links

This should be fleshed out further once we decide which path needs to be taken.

lemmih commented 12 months ago

Context and open questions:

Before the FVM, messages were executed manually. Lotus would implement the transactions in Go, Forest would implement them in Rust, Fuhon (now dead) would implement them in C++. For network version 15, Lotus used both the manual implementation and the FVM at the same time. They did this to make sure the FVM worked as expected.

Forest switched entirely to the FVM at network version 15; we do not support any earlier versions. Open question: Can we use the FVM to execute transactions for earlier network versions?

What is the difference between FVMv2 and FVMv3? FVMv2 is used exclusively for network versions 15, 16, and 17. Why can't we use FVMv3 for these network versions? Can FVMv3 be generalized to support all network versions?

aatifsyed commented 12 months ago

Can you link to some examples to help me understand the problem this causes in our code?

ruseinov commented 12 months ago

Can you link to some examples to help me understand the problem this causes in our code?

Here's some context: v2 vs v3.

aatifsyed commented 12 months ago

I'm not convinced, or I don't understand. These look to me like failures of composition.

Take worker_key_at_lookback, for example - it's pretty trivial to deduplicate. https://github.com/ChainSafe/forest/commit/ddb50de7850af739795a9d9a122f89dd3313d455

Also, I worry that our abstraction boundaries are wrong around this, not just in this module, but in general.

My instinct is to get rid of the majority of the shim module, have traits that abstract over fvm2 and fvm3, and just write a bunch of stuff that's generic over those traits.

As an example:

pub trait AsV3Address { // instead of a shim struct
    fn as_v3_address(self) -> fvm3::Address;
}

impl AsV3Address for fvm2::Address {}
impl AsV3Address for fvm3::Address {}

fn some_common_stuff(a: impl AsV3Address) {}

aatifsyed commented 12 months ago

Writing an elaborate compatibility layer on the Forest side that converts fvmv2 <-> fvmv3. a. Upsides: Will allow for some code deduplication. b. Downsides: A lot of extra code without being able to use idiomatic Rust conversions; potentially making the implementation even more convoluted than it already is. c. Conclusion: This option has been explored before and it didn't look to amazing.

Could I see the design for this somewhere?

ruseinov commented 12 months ago

Writing an elaborate compatibility layer on the Forest side that converts fvmv2 <-> fvmv3. a. Upsides: Will allow for some code deduplication. b. Downsides: A lot of extra code without being able to use idiomatic Rust conversions; potentially making the implementation even more convoluted than it already is. c. Conclusion: This option has been explored before and it didn't look to amazing.

Could I see the design for this somewhere?

A full example does not exist atm, I should have saved it, but it didn't occur to me at the time. For example, in order to convert, for example, an Address - you'd need to serialize and then deserialize it.

I'm not convinced, or I don't understand.

I wasn't either, but if you look closely - you will notice that types that seem the same are actually coming from two different libraries. Writing mappings between those is definitely verbose, some also require ser/de, which is not ideal. The important thing here is whatever one must do here is suboptimal, because a good design of a compatibility layer between fvm v2 and v3, if one needs to maintain v3 at all, should most likely adopt a technique, where v3 would use the v2 types whenever they are fully backwards-compatible. And implement only the new/modified features.

A better solution would be to avoid using fvm v2, making sure that v3 can handle all the data in a way that's compatible with v2. I'm not, however, currently equipped to say where or not this is easily possible, but I think we can find out by implementing a bunch of tests using old-versioned messages.

IMHO it would be good to push this to the right direction as opposed to trying to work around this. Having to maintain two VM versions is definitely suboptimal.

lemmih commented 12 months ago

I'm not convinced, or I don't understand. These look to me like failures of composition.

Take worker_key_at_lookback, for example - it's pretty trivial to deduplicate. ddb50de

Also, I worry that our abstraction boundaries are wrong around this, not just in this module, but in general.

My instinct is to get rid of the majority of the shim module, have traits that abstract over fvm2 and fvm3, and just write a bunch of stuff that's generic over those traits.

As an example:
pub trait AsV3Address { // instead of a shim struct
    fn as_v3_address(self) -> fvm3::Address;
}

impl AsV3Address for fvm2::Address {}
impl AsV3Address for fvm3::Address {}

fn some_common_stuff(a: impl AsV3Address) {}

Filecoin has several concepts (like address, signature, token-amount, message, and receipt) that seemingly align with types in the FVM (Address, Signature, TokenAmount, etc). But the FVM types are not guaranteed to align with the Filecoin concepts, though. For example, the Signature type in the FVM is not the signature from the Filecoin specification. It will fail if you parse incoming signatures from the p2p network using the FVM signature type.

We thought it was a bug/deficiency in the Signature type and talked to the FVM people about it. They told us that the FVM types were not meant for public consumption and that we should not rely on them. When push comes to shove, we need to model Filecoin concepts separately from the FVM.

ruseinov commented 12 months ago

We thought it was a bug/deficiency in the Signature type and talked to the FVM people about it. They told us that the FVM types were not meant for public consumption and that we should not rely on them. When push comes to shove, we need to model Filecoin concepts separately from the FVM.

Perhaps it's not such a bad thing, the fact that we don't rely on FVM internals for exposed APIs and types. It is, of course, an extra thing that we need to maintain, but that allows for FVM types flexibility, which hopefully ends up being useful.