Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

deployment-test is failing when using newer ssh-node images #9609

Open mhofman opened 5 months ago

mhofman commented 5 months ago

Describe the bug

After tagging agoric-upgrade-16-rc0 / sdk@44, new docker images got built and published.

The deployment-test image currently uses the ssh-node image to run the 2 validator nodes of that test. For some reason it started failing with the following error when starting the agd service inside the validator containers:

Error#1: The module '/usr/src/agoric-sdk/node_modules/better-sqlite3/build/Release/better_sqlite3.node'
was compiled against a different Node.js version using
NODE_MODULE_VERSION 108. This version of Node.js requires
NODE_MODULE_VERSION 109. Please try re-compiling or re-installing
the module (for instance, using `npm rebuild` or `npm install`).

  at makeSwingStore (packages/swing-store/src/swingStore.js:192:12)
  at openSwingStore (packages/swing-store/src/swingStore.js:647:10)
  at launch (packages/cosmic-swingset/src/launch-chain.js:351:42)
  at launchAndInitializeSwingSet (packages/cosmic-swingset/src/chain-main.js:466:21)
  at async toSwingSet (packages/cosmic-swingset/src/chain-main.js:676:24)

This makes very little sense because 109 is the module version for Electron, not Node.js, according to https://github.com/nodejs/node/blob/main/doc/abi_version_registry.json
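To make that lookup concrete, here is a minimal sketch of how a registry of that kind could be queried for a given module version. The helper name `runtimesForAbi` is hypothetical, and the object shape (a `NODE_MODULE_VERSION` array of entries with `modules` and `runtime` fields) is an assumption about the registry file. The two sample entries only restate what this report already says (108 is Node.js, 109 is Electron) and are not a copy of the real file.

```javascript
// Hypothetical helper: list the runtimes that claim a given ABI number.
// Assumed registry shape: { NODE_MODULE_VERSION: [{ modules, runtime, ... }] }
function runtimesForAbi(registry, abi) {
  return registry.NODE_MODULE_VERSION
    .filter(entry => entry.modules === Number(abi))
    .map(entry => entry.runtime);
}

// Illustrative sample only, restating the two facts from this report.
const sample = {
  NODE_MODULE_VERSION: [
    { modules: 109, runtime: 'electron' },
    { modules: 108, runtime: 'node' },
  ],
};

console.log(runtimesForAbi(sample, 109)); // [ 'electron' ]
```

Passing the parsed contents of the real registry file instead of `sample` should work the same way if the shape assumption holds.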

According to the ansible config, the version of node inside the container is installed from node source. I verified locally and the installed version is 18.20.3 and has the expected 108 module version.

node -e "console.log(process.versions.modules)"
108
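Since the interactive shell and the agd service may not resolve the same Node.js binary, a slightly fuller check could help when run from the same environment that starts the service. This is only a sketch: the helper names `describeRuntime` and `abiMatches` are hypothetical, and it relies solely on the standard `process.execPath`, `process.version`, and `process.versions.modules` fields. It does not establish the cause of the failure.

```javascript
// Hypothetical helper: report which Node.js binary is executing and its ABI.
function describeRuntime() {
  return {
    execPath: process.execPath,              // absolute path of the running binary
    nodeVersion: process.version,            // e.g. the installed 18.x version
    moduleVersion: process.versions.modules, // NODE_MODULE_VERSION, as a string
  };
}

// Compare the ABI a native addon was built for (108 in the error above)
// with the ABI of the runtime that is actually executing this script.
function abiMatches(builtFor, runtime = describeRuntime()) {
  return String(builtFor) === String(runtime.moduleVersion);
}

console.log(describeRuntime());
console.log('matches 108:', abiMatches(108));
```

If the path or module version printed from the service's environment differs from what the shell reports, that would point to a second Node.js installation or embedded runtime being picked up.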

To Reproduce

Run the docker deployment test using ssh-node:44

Expected behavior

No errors

Platform Environment

CI deployment-test

Additional context

Restoring the previous image, ssh-node:43, "solves" the issue:

skopeo copy --all docker://ghcr.io/agoric/ssh-node:43  docker://ghcr.io/agoric/ssh-node:latest

mhofman commented 4 months ago

The above command sometimes seems to have auth issues. The following is known to work:

skopeo copy --all docker://ghcr.io/agoric/ssh-node@sha256:107ce5c17744e54fe2ea76ee4c52ace0e5bef4dd2aa6a67172219dfec001ae68  docker://ghcr.io/agoric/ssh-node:latest

using docker image quay.io/skopeo/stable@sha256:30ad61b97acf899e4eb7d592b8a0a3431282d2c291a02de38437bf7178048605

rabi-siddique commented 4 months ago

@mhofman, thank you for the detailed bug report. I'm not immediately familiar with the process for running the docker deployment test using ssh-node:44. Could you provide a bit more detail on that or point me towards the relevant documentation? I want to understand this better and ensure everything is set up correctly.

mhofman commented 4 months ago

The deployment test is normally run by CI, but I've been using https://github.com/Agoric/agoric-sdk/blob/master/scripts/run-deployment-integration.sh to replicate it locally. I think if you pull ghcr.io/agoric/ssh-node:44 (or the newer 47 from the final u16 release) and tag it as latest, you should be able to reproduce.

mhofman commented 4 months ago

Btw, that image is built from https://github.com/Agoric/agoric-sdk/blob/master/packages/deployment/Dockerfile.ssh-node