Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

XS divergence between amd64 and aarch64 (Mac M1)? #7841

Open warner opened 1 year ago

warner commented 1 year ago

Describe the bug

@arirubinstein reports observing a CPU-dependent failure when testing out some upgrades, in which the x86 (aka amd64) architecture worked, but the M1 (aka aarch64) arch did not.

We don't expect XS to behave any differently on these two architectures. However, we've recently been surprised by compiler differences like #7836, and a Mac is definitely using clang, while most amd64 boxes are going to be running Linux, which will be using gcc.

His reproduction steps are below (but they reference a zipfile that was published to our internal Slack, #team-engineering for Agoric folks):

Repro steps:

- Download this zip and extract it into the repo at https://github.com/agoric/docker-variety, so that ./state/22.04_amd64, etc. exist relative to the repo root.
- x86: works
  make starting_branch=mainnet1B-rc1 22_x86
- arm: apphash on upgrade block
  make starting_branch=mainnet1B-rc1 22

mhofman commented 1 year ago

Using both branches, mhofman/6784-hide-organic-gc-updated-moddable-3-9-7 and mhofman/6784-hide-organic-gc-updated-moddable-3-9-7-no-key-collect, I was unable to reproduce a divergence when replaying pismo chain transcripts on arm64-apple-darwin21.6.0 (clang13 / M1Pro) compared to the same replay on Debian Bullseye x86_64 (gcc 10.2.1), both using release builds.

@raphdev had mentioned he was able to reproduce an immediate divergence in the latter branch on his M1. We'll need to isolate the source of the divergence.
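
One generic way to start isolating a divergence is to record a sequence of per-step hashes from each replay and find the first position where they differ. The sketch below is only an illustration of that idea: it assumes each replay can be reduced to a list of strings (for example one hash per delivery), and the names `first_divergence`, `amd64_run`, and `aarch64_run` are hypothetical, not part of the agoric-sdk replay tooling.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def first_divergence(
    a: Sequence[str], b: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, a_value, b_value) for the first differing entry.

    Returns None when both sequences are identical. If one sequence is
    shorter, the missing side is reported as None.
    """
    for index, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return index, x, y
    return None


# Hypothetical per-delivery hashes from two replays (placeholder values).
amd64_run = ["h0", "h1", "h2", "h3"]
aarch64_run = ["h0", "h1", "hX", "hY"]

divergence = first_divergence(amd64_run, aarch64_run)
if divergence is None:
    print("replays agree")
else:
    index, left, right = divergence
    print(f"first divergence at step {index}: {left!r} vs {right!r}")
```

Knowing the first diverging step narrows the search to a single delivery, which can then be examined on both machines.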

mhofman commented 10 months ago

To clarify the above, our experience is that Darwin arm64 and Debian x64 agree, but that Debian aarch64 (VM on a Mac) diverges when replaying the pismo transcripts.
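
Since this thread uses several names for the same CPU families (amd64 / x86_64 / x64, and aarch64 / arm64), it may help to tag each replay result with a normalized OS/architecture label, so that "Darwin arm64" and "Debian aarch64" are not conflated. The following is a minimal sketch using Python's standard `platform` module; the alias table and the `platform_label` helper are assumptions made for illustration, and the label does not capture the compiler (clang vs gcc), which would need to be recorded separately.

```python
import platform
from typing import Optional

# Alias table is an assumption for illustration: it folds the names used
# in this thread into two canonical labels.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "x64": "amd64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def normalize_arch(machine: str) -> str:
    """Map a machine string to a canonical architecture name."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)


def platform_label(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Build an 'os/arch' label, defaulting to the current host."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return f"{system.lower()}/{normalize_arch(machine)}"


# Label for the machine running this script.
print(platform_label())
```

With such labels, the three configurations discussed here would be distinguished as `darwin/aarch64`, `linux/amd64`, and `linux/aarch64`.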