Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

Extended automatic agd deps install #9769

Open mhofman opened 3 months ago

mhofman commented 3 months ago

What is the Problem Being Solved?

Manage dependencies automatically for validators

Use cases:

  1. Validator using cosmovisor with DAEMON_ALLOW_DOWNLOAD_BINARIES=true to automatically download and start on upgrade
  2. Validator using cosmovisor without DAEMON_ALLOW_DOWNLOAD_BINARIES=true and building in advance
  3. Validator using custom service (systemd most likely), following our release notes build instructions
  4. Docker image published from agoric-sdk

The user and environment that build in advance may differ from those that run the node, with a different PATH and other environment variables.

Cosmovisor defines a DAEMON_HOME with the following structure:

$DAEMON_HOME/cosmovisor
├── current -> genesis or upgrades/<name>
├── genesis
│   └── bin
│       └── $DAEMON_NAME
└── upgrades
    └── <name>
        └── bin
            └── $DAEMON_NAME

With auto-download, it automatically extracts agoric-sdk in $DAEMON_HOME/cosmovisor/upgrades/<name> and invokes current/bin/agd. That should automatically install required dependencies.

Without auto-download, validators using cosmovisor should extract the sdk themselves under $DAEMON_HOME/cosmovisor/upgrades/<name>, then build according to release instructions. Note that some community instructions check out agoric-sdk in $HOME/<name>, install & build (using old instructions), and then symlink $DAEMON_HOME/cosmovisor/upgrades/<name>/bin/agd to $HOME/<name>/bin/agd. This may be a problem for the current agd script because it has to resolve nested symlinks.
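
As an illustration of the nested symlink concern, here is a minimal sketch of how a script could follow a chain of links back to the real agd location. The helper name `resolve_link` and the hop limit are hypothetical; this is not the logic currently in bin/agd.

```sh
# Hypothetical helper: follow a chain of symlinks until reaching a real file.
# Relative link targets are resolved against the directory holding the link.
resolve_link() (
  target=$1
  hops=0
  while [ -L "$target" ]; do
    hops=$((hops + 1))
    # Guard against symlink loops (the limit of 40 is an arbitrary assumption).
    [ "$hops" -le 40 ] || { echo "too many symlinks: $1" >&2; exit 1; }
    dir=$(cd -P "$(dirname "$target")" && pwd -P) || exit 1
    link=$(readlink "$target") || exit 1
    case $link in
      /*) target=$link ;;
      *) target=$dir/$link ;;
    esac
  done
  # Canonicalize the directory part so the result is an absolute physical path.
  dir=$(cd -P "$(dirname "$target")" && pwd -P) || exit 1
  echo "$dir/$(basename "$target")"
)
```

Called on $DAEMON_HOME/cosmovisor/current/bin/agd, this would print the path inside the $HOME/<name> checkout, from which the script could locate the rest of agoric-sdk.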

Our release notes should provide simple and robust build instructions, and make it clear that they are opting into automatically downloading dependencies. We should support a mode to avoid this auto download.

The Docker image and cosmovisor without auto-download should not attempt to rebuild or download anything.
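
A rough sketch of how the script could tell these situations apart. It assumes that the process launched by cosmovisor inherits DAEMON_HOME and DAEMON_ALLOW_DOWNLOAD_BINARIES from cosmovisor's environment, and that the Docker image opts out by setting NO_BUILD. The function names and mode labels are hypothetical.

```sh
# Hypothetical helper: classify how agd is being run.
# Prints one of: standalone, cosmovisor-auto, cosmovisor-manual.
# Assumption: cosmovisor's DAEMON_* variables are visible to the child process.
agd_run_mode() {
  if [ -z "${DAEMON_HOME:-}" ]; then
    echo standalone
  elif [ "${DAEMON_ALLOW_DOWNLOAD_BINARIES:-false}" = "true" ]; then
    echo cosmovisor-auto
  else
    echo cosmovisor-manual
  fi
}

# Succeeds only when an implicit build or download is acceptable:
# never under cosmovisor without auto-download, and never when NO_BUILD is set.
may_build() {
  [ -z "${NO_BUILD:-}" ] && [ "$(agd_run_mode)" != "cosmovisor-manual" ]
}
```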

Description of the Design

  1. INSTALL_DEPS=true agd build or agd build --install-deps // to be bikeshed (deps, or build tools, ?)
    • Automatically downloads nvm and gvm in a directory relative to agoric-sdk
    • Uses nvm and gvm to install the required version (unique) of node and golang (repoconfig.sh)
    • Remember this was done (saves a file in agoric-sdk?)
  2. Behavior of agd build without the install deps option (including the implicit build by agd start) depends on the cosmovisor auto-download environment
    • with auto-download: run the install deps logic, but using DAEMON_HOME instead of the agoric-sdk directory
    • without auto-download: verifies that ambient version of node and golang matches allowed versions (looser than required version of install deps). Remember at least NODE_MODULE_VERSION alongside stamps. Remember the full node version that we observed, including path, for diagnostics.
  3. agd start
    • If sdk wasn't built or is outdated, implicitly invoke agd build (without install deps), unless we're in cosmovisor mode without auto-download, or using NO_BUILD
    • Automatically uses the installed deps if available / found (by 1.)
    • Check that node is available and matches allowed versions. If not, error and report any diagnostic info saved by the previous agd build
    • Check that NODE_MODULE_VERSION hasn't changed, otherwise warn and rebuild with agd build --force (which should do yarn install --force, but doesn't currently)
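
The NODE_MODULE_VERSION check could look roughly like the sketch below. The stamp file format, its location, and the function names are assumptions made for illustration. The ABI number itself can be read from Node.js with `node -p process.versions.modules`.

```sh
# Hypothetical stamp helpers; file name, location and format are assumptions.
# $1: stamp file, $2: ABI version (NODE_MODULE_VERSION),
# $3: full node version and path, kept only for diagnostics.
record_node_abi() {
  printf '%s\n%s\n' "$2" "$3" > "$1"
}

# $1: stamp file, $2: ABI version of the node found at start time.
# Returns 0 when it matches the stamp, 1 when it changed, 2 when no stamp exists.
check_node_abi() {
  [ -f "$1" ] || return 2
  _recorded=$(head -n 1 "$1")
  [ "$_recorded" = "$2" ]
}

# Possible use in agd start (not executed here):
#   abi=$(node -p process.versions.modules)
#   check_node_abi "$stamp" "$abi" || {
#     echo "WARNING: node ABI changed; rebuilding" >&2
#     agd build --force
#   }
```

Keeping the full node version and path on the second line of the stamp gives agd start something concrete to print when node is missing or mismatched.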

Work with validator community to update cosmovisor instructions

Update release instructions script

Security Considerations

Scaling Considerations

Test Plan

Cosmovisor automated test

Automated test checking out new agoric-sdk without git clean on top of previous release, and running agd start and/or agd build.

Upgrade Considerations

Deployed as part of a new chain software upgrade (affects the bin/agd script)

mhofman commented 3 months ago

Note: we should avoid modifying the profile when automatically installing NVM, by setting PROFILE=/dev/null. See https://github.com/nvm-sh/nvm?tab=readme-ov-file#additional-notes
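
A minimal sketch of such an install, assuming the nvm install script has already been downloaded to a local path. The helper name and the target directory are hypothetical. The nvm installer refuses a custom NVM_DIR that does not exist yet, so the directory is created first.

```sh
# Hypothetical helper: run an already-downloaded nvm install script so that
# nvm lands in a private directory and no shell profile is modified.
# $1: path to nvm's install.sh, $2: directory to install nvm into.
install_nvm_private() (
  installer=$1
  nvm_dir=$2
  # A non-default NVM_DIR must exist before the installer runs.
  mkdir -p "$nvm_dir" || exit 1
  # PROFILE=/dev/null keeps the installer from editing ~/.bashrc and similar.
  # NVM_DIR selects the install location instead of the default ~/.nvm.
  PROFILE=/dev/null NVM_DIR=$nvm_dir bash "$installer"
)
```

For example, `install_nvm_private ./install.sh "$sdk_dir/.nvm"` would keep nvm under the sdk checkout, where `$sdk_dir` stands for wherever agoric-sdk was extracted.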

mhofman commented 2 months ago

The script also needs to handle corepack if enabled, see #10009

Testing likely involves integration tests across a matrix of environments, with different deps installed or not

mhofman commented 3 weeks ago

During upgrade-17, it's possible a validator got confused by the message "At least golang/cosmos/x/swingset/module.go is newer than golang/cosmos/build/agd", and thought it was an error. Their node didn't seem to start. They may not have left time for the build to complete, or the build may actually have failed. We should provide a clearer message (e.g. say that a build is about to start), and make sure a missing go binary or other go errors are raised correctly.
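
One possible shape for the clearer messaging, as a sketch only: the function names and message wording are hypothetical, and the real staleness check in bin/agd may compare more than file timestamps.

```sh
# Hypothetical helper: report why a rebuild is needed in a way that does not
# read as an error. $1: built binary, remaining args: sources to compare.
# Returns 0 when a rebuild is needed, 1 when the binary is up to date.
needs_rebuild() {
  _bin=$1
  shift
  if [ ! -e "$_bin" ]; then
    echo "NOTE: $_bin has not been built yet. Starting a build now; this can take several minutes." >&2
    return 0
  fi
  for _src in "$@"; do
    if [ "$_src" -nt "$_bin" ]; then
      echo "NOTE (not an error): $_src is newer than $_bin. Starting a rebuild now; this can take several minutes." >&2
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: fail loudly when a build tool such as go is missing.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "ERROR: required tool '$1' was not found in PATH; cannot build." >&2
    return 1
  }
}
```

Calling `require_tool go` before starting the build would surface a missing go toolchain as an explicit error rather than a node that silently fails to start.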