ChainSafe / forest

🌲 Rust Filecoin Node Implementation
https://forest.chainsafe.io
Apache License 2.0
616 stars 145 forks source link

Fix `Filecoin.StateReplay` test (or implementation) #4432

Open LesnyRumcajs opened 1 week ago

LesnyRumcajs commented 1 week ago

Issue summary

The method was disabled in https://github.com/ChainSafe/forest/pull/4429. Once fixed, it should be removed from the filter lists.

Other information and links

hanabi1224 commented 1 week ago

Payload to reproduce the failure:

{"jsonrpc":"2.0","id":0,"method":"Filecoin.StateReplay","params":
[[{"/":"bafy2bzacebjderqonlhtcp42rxldzndqm7tkfhuq62qauqseig2kur4tq6qz4"},{"/":"bafy2bzaceaaro3br2ytfoxhjp5fh26b6u6w4uzy6hvx5g3vzkfb6v3twjjtmk"},{"/":"bafy2bzacebqp4axkcpphgt36hhufcjqni6q62ggjfrn2iafjzsjdjdbwyzdby"}],{"/":"bafy2bzaceaekui2v37hfn54wgijzquegmwfwthug4cp3fvvgidpzbdvepthje"}]
}

the only sensible diff:

2024-06-20T07:33:51.7767290Z                  "MsgRct": {
2024-06-20T07:33:51.7767298Z 
2024-06-20T07:33:51.7767551Z -                  "ExitCode": 0, \\ forest
2024-06-20T07:33:51.7767559Z 
2024-06-20T07:33:51.7767717Z +                  "ExitCode": 10, \\ lotus
2024-06-20T07:33:51.7767725Z 
2024-06-20T07:33:51.7767899Z                    "Return": null,
2024-06-20T07:33:51.7767907Z 
2024-06-20T07:33:51.7768067Z                    "ReturnCodec": 0
2024-06-20T07:33:51.7768075Z 
2024-06-20T07:33:51.7768208Z                  },
hanabi1224 commented 1 week ago

The only sensible diff(part of FVM execution trace) comes from FVM. The root cause might be that forest is using fvm@4.1 while lotus-v1.17 is using fvm@4.2 via filecoin-ffi The issue might be automatically resolved when both forest and lotus switch to fvm@4.3

@LesnyRumcajs, do you think it would be a proper mitigation if we just ignore the ExitCode field for now? Since there's very little we can do about the only diff in responses in this failure.

Note on exit code 10:

/// An internal VM assertion failed.
pub const SYS_ASSERTION_FAILED: ExitCode = ExitCode::new(10);

Note2: the exit code is 0 (success) for forest on both fvm@4.1 and fvm@4.3. It might be a regression in fvm@4.2 but it takes non-trival effort to verify this in forest with fvm@4.2

LesnyRumcajs commented 1 week ago

@hanabi1224 I am not convinced this is a proper fix. The goal of those tests is to ensure response parity between Forest and Lotus at specific versions, and not partial parity. I'd rather the test either fails or not, be it from a single byte diff or the entire payload. If there is no parity between Forest and Lotus at one point (like now), we must know it.

hanabi1224 commented 1 week ago

@LesnyRumcajs For this particular RPC method, the response is so large(168kb) and complicated that I have to attach the files, everything matches except this exit_code field. We could also wait for a lotus release with fvm@4.3 and check again

forest.json lotus.json

LesnyRumcajs commented 1 week ago

I'd rather we wait for a Lotus FVM bump and if it still doesn't match, we investigate it further.

LesnyRumcajs commented 1 week ago

On a side note, did you try it with a Lotus from ~1 month ago, before they bumped the FVM?

hanabi1224 commented 1 week ago

On a side note, did you try it with a Lotus from ~1 month ago, before they bumped the FVM?

@LesnyRumcajs good point, I will do that tomorrow

hanabi1224 commented 1 week ago

Interestingly, I tried with a local build of Lotus 1.27.0(fvm@4.1.2), and the exit code is also 10, which differs from the forest.

LesnyRumcajs commented 1 week ago

Some feature flags, FVM options?