Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

Synchronize with version 3.6.0 or newer of Moddable SDK #6759

Closed mhofman closed 1 year ago

mhofman commented 1 year ago

What is the Problem Being Solved?

Since August 2022 we have not been able to update the Moddable SDK, because the last attempt to sync produced divergences between validators in the integration test. Since then, after fixing other divergences on mainnet (#6588), we have identified commits in the Moddable SDK tree which were the likely cause of the divergences experienced when first trying to update. There are 2 remaining commits which are still causing an unexpected difference in execution when replaying transcripts.

Details of the upgrade step

The following 2 commits have been identified as causing issues, and have already been fixed:

The following 2 commits have also been identified as causing unexpected differences in execution, but no fix or explanation has been provided yet:

Test Plan

Once a fix has been published for the remaining unexplained execution differences, create a test branch from release-pismo with the updated Moddable SDK, and use the enhanced replay tool (#6723) to replay a transcript of all mainnet vats, prioritizing vat18 and vat6, which have proven susceptible to divergences. Do not merge or publish this test branch on pismo.

After that, a normal branch and PR to update master can be created.

mhofman commented 1 year ago

I just started a new follower node from agd upgrade using branch mhofman/6759-update-xs-compat-test, and it is syncing so far!

mhofman commented 1 year ago

Update: the first sync attempt failed. The replay from transcript also failed for v24, v25, v26, v27. I believe those are the PSM vats.

I have bisected it down to an issue that I thought had previously been fixed: "XS: fuzzilli 38", which was only partially fixed by "XS: initialize slot next when caching arrays".

I updated the test branch with further reverts (including 2 reverts related to the switch from VirtualModuleRecord to ModuleSource, which caused slightly different allocations), and the transcript replay was successful. The sync from agd upgrade is still in progress, but went past the previous failure block height.

mhofman commented 1 year ago

The sync from agd upgrade is still in progress, but went past the previous failure block height.

I have hit a wrong AppHash at block 7709224. I will compare slogs and transcripts to figure out the cause. It's possible we're simply seeing a metering difference since the transcripts replayed without issues.
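For readers following along, the slog comparison mentioned above could be sketched roughly as follows. This is only an illustration, not the tooling actually used: it assumes slogs are JSON-lines files, and the entry types `cosmic-swingset-begin-block` and `crank-start` and the `blockHeight` field are assumptions about the slog format. The helper names `cranks_per_block` and `first_divergent_block` are hypothetical.

```python
import json
from collections import Counter


def cranks_per_block(slog_lines):
    """Count crank-start entries per block height in a JSON-lines slog.

    The entry types and field names used here are assumptions about the
    slog format, not a documented schema.
    """
    counts = Counter()
    height = None
    for line in slog_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") == "cosmic-swingset-begin-block":
            height = entry.get("blockHeight")
            if height is not None:
                counts[height] += 0  # record the block even if it has no cranks
        elif entry.get("type") == "crank-start" and height is not None:
            counts[height] += 1
    return counts


def first_divergent_block(slog_a, slog_b):
    """Return (height, count_a, count_b) for the first block whose crank
    counts differ, or None if the two slogs agree on every block."""
    a = cranks_per_block(slog_a)
    b = cranks_per_block(slog_b)
    for height in sorted(set(a) | set(b)):
        if a.get(height, 0) != b.get(height, 0):
            return height, a.get(height, 0), b.get(height, 0)
    return None
```

Each argument can be any iterable of lines, for example an open file handle for the slog of the reference node and one for the node running the updated XS. A differing crank count only locates the block; the entries inside that block still need to be compared by hand.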

mhofman commented 1 year ago

I have confirmed that the wrong AppHash is due to slight allocation and metering differences, ultimately resulting in an extra crank in that block, and thus a different activityHash.
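To illustrate why a single extra crank is enough to change the resulting hash, here is a deliberately simplified model of a running hash that folds in each crank in order. It is a sketch under stated assumptions, not the actual SwingSet computation: the function names are hypothetical, and the real activityHash covers more than the placeholder byte strings used here.

```python
import hashlib


def next_activity_hash(previous_hash, crank_data):
    """Fold one crank's data into the running hash (simplified model)."""
    h = hashlib.sha256()
    h.update(previous_hash)
    h.update(crank_data)
    return h.digest()


def activity_hash(cranks, initial=b""):
    """Chain the hash over every crank executed, in order."""
    running = initial
    for crank_data in cranks:
        running = next_activity_hash(running, crank_data)
    return running.hex()


# Two nodes process the same work, but one of them runs an additional crank.
expected = activity_hash([b"crank-1", b"crank-2"])
diverged = activity_hash([b"crank-1", b"crank-2", b"extra-crank"])
print(expected != diverged)  # compares the two final hashes
```

Because every step depends on the previous hash, a node that executes one more crank than its peers ends the block with a different value even if all visible results are otherwise identical.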

extra-crank-updated-xs.diff.gz