Agoric / agoric-sdk

Monorepo for the Agoric JavaScript smart contract platform
Apache License 2.0

Flake: Getting Started (registry/yarn) #9325

Closed: Chris-Hibbert closed this issue 1 month ago

Chris-Hibbert commented 4 months ago

Describe the bug

The Getting Started (registry/yarn) test is flaky.

To Reproduce

The test failed, as shown in https://github.com/Agoric/agoric-sdk/actions/runs/8973265730/job/24643072843?pr=9283

Error: rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 60, got 59: incorrect account sequence [agoric-labs/cosmos-sdk@v0.46.16-alpha.agoric.2.1/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '38588' : unknown request

Hitting "Rerun failed tests" made it pass.
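The error above comes from the cosmos-sdk signature-verification ante handler, which rejects a transaction whose account sequence number does not match the one the chain expects. Here the chain expected 60 but the transaction carried 59, which typically means another transaction from the same account landed first. When deciding whether a failure is safe to retry, it can help to recognize this case explicitly. The sketch below is a hypothetical helper (`parseSequenceMismatch` and `isStaleSequence` are illustrative names, not part of agoric-sdk) that matches only the message text shown in this log.

```javascript
// Hypothetical helper: recognize the cosmos-sdk "account sequence mismatch"
// message and extract the expected/got sequence numbers from it.
function parseSequenceMismatch(message) {
  const match = /account sequence mismatch, expected (\d+), got (\d+)/.exec(message);
  if (!match) return null;
  return { expected: Number(match[1]), got: Number(match[2]) };
}

// Assumption: if the submitted sequence lags the chain's, an earlier
// transaction from the same account was committed first, so re-querying the
// account and re-signing is the likely remedy rather than a real test failure.
function isStaleSequence(message) {
  const parsed = parseSequenceMismatch(message);
  return parsed !== null && parsed.got < parsed.expected;
}

const sample =
  'rpc error: code = Unknown desc = account sequence mismatch, expected 60, got 59: incorrect account sequence';
console.log(parseSequenceMismatch(sample)); // { expected: 60, got: 59 }
console.log(isStaleSequence(sample)); // true
```

Matching on message text is brittle. If the client library in use exposes the ABCI error code, checking that code instead would be more robust.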

Other instances

Expected behavior

Tests should not be flaky.

Platform Environment

Running in CI.

turadg commented 4 months ago

I've been running into this lately too, most recently with registry/npm:

lerna ERR! E503 one of the uplinks is down, refuse to publish
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
turadg commented 1 month ago

#9550 set up a retry, but it doesn't always work.

CI log

  workflow

  yarn start:contract works

  Difference (- actual, + expected):

  - 2
  + 0

  › gettingStartedWorkflowTest (packages/agoric-cli/tools/getting-started.js:124:7)

  ─

  1 test failed
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Warning: Attempt 1 failed. Reason: Child_process exited with error code 1
yarn run v1.22.22
$ yarn run create-agoric-cli /home/runner/bin/agoric
$ node ./scripts/create-agoric-cli.cjs /home/runner/bin/agoric
Script directory /home/runner/bin does not appear in $PATH
(You may want to `export PATH=$PATH:/home/runner/bin' to add it to your PATH environment variable)
ensuring /home/runner/bin exists
creating /home/runner/bin/agoric
Error: /home/runner/bin/agoric must not already exist; you should use a fresh path.
    at Object. (/home/runner/work/agoric-sdk/agoric-sdk/scripts/create-agoric-cli.cjs:45:11)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Warning: Attempt 2 failed. Reason: Child_process exited with error code 1
yarn run v1.22.22
$ yarn run create-agoric-cli /home/runner/bin/agoric
$ node ./scripts/create-agoric-cli.cjs /home/runner/bin/agoric
Script directory /home/runner/bin does not appear in $PATH
(You may want to `export PATH=$PATH:/home/runner/bin' to add it to your PATH environment variable)
ensuring /home/runner/bin exists
creating /home/runner/bin/agoric
Error: /home/runner/bin/agoric must not already exist; you should use a fresh path.
    at Object. (/home/runner/work/agoric-sdk/agoric-sdk/scripts/create-agoric-cli.cjs:45:11)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47
error Command failed with exit code 1.

turadg commented 1 month ago

@frazarshad to get this working, I suggest making a PR in which each getting-started test is run twice. The errors we're seeing seem to come from the jobs not being repeatable: a retried attempt fails because an earlier attempt left state behind.

Once we solve that, the "retry upon failure" mechanism should solve the flakiness.
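The retry log above shows why retrying alone is not enough. Attempt 1 creates `/home/runner/bin/agoric`, and every later attempt then fails with "must not already exist". A retry only helps if each attempt starts from the same state. The sketch below illustrates that idea under stated assumptions: `retryIdempotent` and `createCli` are hypothetical names, `createCli` is a simplified stand-in for `scripts/create-agoric-cli.cjs` that only mimics its refusal to overwrite, and none of this is the actual fix adopted in this thread, which isn't reproduced here.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: retry a step, but run `cleanup` before every attempt
// so leftovers from a failed attempt cannot break the next one.
function retryIdempotent(step, { attempts, cleanup }) {
  let lastError;
  for (let i = 1; i <= attempts; i += 1) {
    cleanup();
    try {
      return step(i);
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${i} failed: ${err.message}`);
    }
  }
  throw lastError;
}

// Simplified stand-in for create-agoric-cli: like the real script, it refuses
// to write to a path that already exists.
function createCli(target) {
  if (fs.existsSync(target)) {
    throw Error(`${target} must not already exist; you should use a fresh path.`);
  }
  fs.writeFileSync(target, '#!/bin/sh\n');
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'agoric-'));
const target = path.join(dir, 'agoric');

// Simulate attempt 1 creating the file and then failing in a later step.
// Without cleanup, attempt 2 would hit the "must not already exist" error.
const result = retryIdempotent(
  (attempt) => {
    createCli(target);
    if (attempt === 1) throw Error('later workflow step failed');
    return attempt;
  },
  { attempts: 3, cleanup: () => fs.rmSync(target, { force: true }) },
);
console.log(`succeeded on attempt ${result}`); // succeeded on attempt 2
```

Removing the artifact up front, instead of cleaning up in a `finally` after each failure, also covers the case where the previous process was killed before it could clean up.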

frazarshad commented 1 month ago

@turadg I worked on a solution for this, but apparently a similar fix was made recently.

turadg commented 1 month ago

@michaelfig's fix from 2 days ago (after your PR) does fix the leftover-file problem. I'd approve changing it to https://github.com/Agoric/agoric-sdk/pull/9740 because I think that version is clearer and more maintainable.

I'm not certain that's the only problem with retries. I'd still like to see a run inducing (unnecessary) repetition to confirm. But I'm okay with closing this and reopening it if the flake is encountered again.

michaelfig commented 1 month ago

I'd approve changing it to #9740

So would I. My fix was expedient, but I'd be happy to see it structured better (and more idiomatically) as #9740 does.

frazarshad commented 1 month ago

@turadg I've marked #9740 ready for review.

turadg commented 1 month ago

Still flaking: https://github.com/Agoric/agoric-sdk/actions/runs/10066687315/job/27828607230?pr=9751

https://github.com/Agoric/agoric-sdk/actions/runs/10118245976/job/27984759653?pr=9755