Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

agd fails with "Cannot find dependency ..." in systemd due to lack of file descriptors #7817

Closed: dckc closed this issue 1 year ago

dckc commented 1 year ago

Describe the bug

When started as a systemd service, agd fails with Error#1: config.bundles.coreProposal2_5: Cannot find dependency picomatch ...

To Reproduce

NodesGuru reports:

# Build software
CI_GIT_NAME=Agoric
CI_GIT_FOLDER=agoric-sdk
CI_BIN_VER=ea8c1c64911b4c58fb43635b25e17e3d50d0cf2a
CI_BIN_NAME=agd

cd $HOME
git clone https://github.com/${CI_GIT_NAME}/${CI_GIT_FOLDER}.git
cd $HOME/${CI_GIT_FOLDER}
git fetch --all
git checkout ${CI_BIN_VER}

git submodule update
find . -name node_modules |xargs rm -rf
sudo apt update
curl https://deb.nodesource.com/setup_16.x | sudo bash
sudo apt install -y nodejs gcc g++ make < "/dev/null"
curl -sL https://dl.yarnpkg.com/debian/pubkey.gpg | gpg --dearmor | sudo tee /usr/share/keyrings/yarnkey.gpg >/dev/null
echo "deb [signed-by=/usr/share/keyrings/yarnkey.gpg] https://dl.yarnpkg.com/debian stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn
yarn install --force
yarn build
(cd $HOME/${CI_GIT_FOLDER}/packages/cosmic-swingset && make)

sudo cp $HOME/go/bin/${CI_BIN_NAME} /usr/local/bin/${CI_BIN_NAME}

${CI_BIN_NAME} version

Then in a systemd unit:

$ cat /etc/systemd/system/agd.service
[Unit]
Description=Agoric Node
After=network-online.target
[Service]
User=ubuntu
ExecStart=/home/ubuntu/agoric-sdk/bin/agd start --address tcp://0.0.0.0:55658 --grpc-web.address 0.0.0.0:12091 --grpc.address 0.0.0.0:12090 --p2p.laddr tcp://0.0.0.0:53956 --rpc.laddr tcp://127.0.0.1:56657 --home /home/ubuntu/.agoric
Environment=PATH="/home/ubuntu/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin:/usr/local/go/bin:/home/ubuntu/go/bin"
Restart=always
RestartSec=3
LimitNOFILE=4096
[Install]
WantedBy=multi-user.target

Expected behavior

agd works as a systemd service

Platform Environment

ea8c1c64911b4c58fb43635b25e17e3d50d0cf2a

Additional context

agoricdev-18

See the Discord #devnet thread.

Screenshots

stack trace from 0xAN | Nodes.Guru:

portHandler threw (Error#1)
Error#1: config.bundles.coreProposal2_5: Cannot find dependency picomatch for file:///home/ubuntu/agoric-sdk/node_modules/ava/
  at packages/SwingSet/src/controller/initializeSwingset.js:538:15
  at async Promise.all (index 18)
  at async processGroup (packages/SwingSet/src/controller/initializeSwingset.js:541:27)
  at async initializeSwingset (packages/SwingSet/src/controller/initializeSwingset.js:570:43)
  at async ensureSwingsetInitialized (packages/cosmic-swingset/src/launch-chain.js:160:5)
  at async buildSwingset (packages/cosmic-swingset/src/launch-chain.js:165:3)
  at async launch (packages/cosmic-swingset/src/launch-chain.js:307:52)
  at async launchAndInitializeSwingSet (packages/cosmic-swingset/src/chain-main.js:453:15)
  at async toSwingSet (packages/cosmic-swingset/src/chain-main.js:670:20)
Cannot initialize Controller Error: config.bundles.coreProposal2_5: Cannot find dependency picomatch for file:///home/ubuntu/agoric-sdk/node_modules/ava/
agd.service: Main process exited, code=exited, status=1/FAILURE

logs from Syd | FR Staking Community:

May 20 1359 ubuntu-8gb-hel1-1 systemd[1]: Started Agoric Cosmos daemon.
May 20 1300 ubuntu-8gb-hel1-1 agd[2190252]: 2023/05/20 1300 Running SwingSet until bootstrap is ready
May 20 1300 ubuntu-8gb-hel1-1 agd[2190252]: Loading slog sender modules: @agoric/telemetry/src/flight-recorder.js
May 20 1300 ubuntu-8gb-hel1-1 agd[2190252]: 2023-05-20T1300.491Z launch-chain: Launching SwingSet kernel
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]: portHandler threw (Error#1)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]: Error#1: config.bundles.coreProposal2_5: Cannot find dependency picomatch for file:///home/ubuntu/agoric-sdk/node_modules/ava/
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at packages/SwingSet/src/controller/initializeSwingset.js:538:15
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async Promise.all (index 18)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async processGroup (packages/SwingSet/src/controller/initializeSwingset.js:541:27)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async initializeSwingset (packages/SwingSet/src/controller/initializeSwingset.js:570:43)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async ensureSwingsetInitialized (packages/cosmic-swingset/src/launch-chain.js:160:5)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async buildSwingset (packages/cosmic-swingset/src/launch-chain.js:165:3)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async launch (packages/cosmic-swingset/src/launch-chain.js:307:52)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async launchAndInitializeSwingSet (packages/cosmic-swingset/src/chain-main.js:453:15)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]:   at async toSwingSet (packages/cosmic-swingset/src/chain-main.js:670:20)
May 20 1317 ubuntu-8gb-hel1-1 agd[2190252]: Cannot initialize Controller Error: config.bundles.coreProposal2_5: Cannot find dependency picomatch for file:///home/ubuntu/agoric-sdk/node_modules/ava/
May 20 1317 ubuntu-8gb-hel1-1 systemd[1]: agd.service: Main process exited, code=exited, status=1/FAILURE
May 20 1317 ubuntu-8gb-hel1-1 systemd[1]: agd.service: Failed with result 'exit-code'.
May 20 1320 ubuntu-8gb-hel1-1 systemd[1]: agd.service: Scheduled restart job, restart counter is at 34452.
May 20 1320 ubuntu-8gb-hel1-1 systemd[1]: Stopped Agoric Cosmos daemon.
dckc commented 1 year ago

Reported work-around: run agd outside systemd.

dckc commented 1 year ago

related:

dckc commented 1 year ago

Diagnosis: bundling ran into the file descriptor limit

At start-up, agd bundles a large number of JavaScript modules, which requires many files to be open at the same time. Outside of systemd, Ubuntu has a limit of around 1 million. In the reported configuration, however, we see:

LimitNOFILE=4096
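To confirm which limit agd actually runs under (as opposed to the limit of an interactive shell), one option is to read the kernel's per-process view in `/proc/PID/limits`. A minimal bash sketch, assuming Linux; `max_open_files` is a hypothetical helper, not part of agd or the SDK:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the soft "Max open files" limit of a running
# process, as reported by the Linux kernel in /proc/PID/limits.
# The relevant row looks like "Max open files  4096  4096  files",
# so field 4 is the soft limit and field 5 is the hard limit.
max_open_files() {
  local pid=$1
  awk '/^Max open files/ { print $4 }' "/proc/$pid/limits"
}
```

Under the reported unit, `max_open_files "$(systemctl show -p MainPID --value agd)"` should print 4096, since a single `LimitNOFILE=` value sets both the soft and the hard limit. Comparing that with `ulimit -n` in a login shell shows why agd can behave differently when started by hand.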

We should reduce the required number of simultaneous file descriptors in due course, but in the meantime:

Work-around: increase the file descriptor limit

Raising the limit to 64K file descriptors seems to relieve the symptoms:

LimitNOFILE=65536
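One way to apply this work-around without editing the unit file itself is a systemd drop-in: files in `/etc/systemd/system/agd.service.d/*.conf` are read after the main unit, so a `LimitNOFILE=` there replaces the 4096 from the unit. A sketch in bash, where `write_nofile_dropin` and the file name `nofile.conf` are hypothetical choices; `sudo systemctl edit agd` creates an equivalent `override.conf` interactively.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that raises the open-files
# limit for one unit, leaving the original unit file untouched.
# Usage: write_nofile_dropin UNIT_DIR UNIT LIMIT
write_nofile_dropin() {
  local unit_dir=$1 unit=$2 limit=$3
  local dropin_dir="$unit_dir/$unit.d"
  mkdir -p "$dropin_dir"
  # Drop-ins are parsed after the unit file, so this single-valued
  # LimitNOFILE= setting overrides the one in the unit.
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/nofile.conf"
}
```

Run as root with `write_nofile_dropin /etc/systemd/system agd.service 65536`, then `systemctl daemon-reload` and `systemctl restart agd`; `systemctl show -p LimitNOFILE agd` should then report `LimitNOFILE=65536`.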
warner commented 1 year ago

https://github.com/endojs/endo/issues/1593 tracks the long-term fix: either limit bundle-source parallelism, and/or react to EMFILE by deferring the open() until some other FD has been closed.
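The first option, limiting parallelism, can be illustrated outside Endo with a small bash analogy. This is not Endo's fix (that lives in JavaScript inside bundle-source), and the EMFILE-deferral alternative is not shown. `run_bounded` is a hypothetical helper: if each task holds its own files open while it runs, capping the number of running tasks at MAX also caps the descriptors in use.

```bash
#!/usr/bin/env bash
# Sketch only: bounded parallelism in bash (needs bash >= 4.3 for `wait -n`).
# run_bounded MAX CMD ITEM...: run "CMD ITEM" for each ITEM in the background,
# keeping at most MAX (>= 1) of them running at any moment.
run_bounded() {
  local max=$1 cmd=$2
  shift 2
  local running=0 item
  for item in "$@"; do
    if (( running >= max )); then
      wait -n                  # block until any one background task exits
      running=$((running - 1))
    fi
    "$cmd" "$item" &
    running=$((running + 1))
  done
  wait                         # let the remaining tasks finish
}
```

A counter plus `wait -n` keeps the sketch dependency-free; `printf '%s\0' ITEM... | xargs -0 -n1 -P MAX CMD` is a standard alternative when CMD is an external program rather than a shell function.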

kriskowal commented 1 year ago

I’ve landed a fix for https://github.com/endojs/endo/issues/1593 in Endo and this should be a thing of the past when we next sync Endo releases with Agoric SDK.

dckc commented 1 year ago

@kriskowal is this now a thing of the past? It's in the upgrade-11 release notes.

kriskowal commented 1 year ago

No, I have not yet successfully synced Endo with Agoric SDK. This is more likely to land in upgrade-12.

kriskowal commented 1 year ago

I believe this is now a thing of the past. Please reöpen if symptoms persist.