Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

multichain-e2e: minimize ci flakes #9815

Open 0xpatrickdev opened 1 month ago

0xpatrickdev commented 1 month ago

What is the Problem Being Solved?

When performing integration testing with networked applications, making all the sync points explicit and reliable is particularly challenging. Nonetheless, strategies can be employed to limit flakiness. In its current form this issue is intended to be a catch-all for flakes related to multichain-testing and the Multichain E2E CI Job.

The main retry mechanism that's current employed is a rudimentary retryUntilCondition helper that attempts maxRetries on retryIntervalMs.

Description of the Design

Security Considerations

Scaling Considerations

Test Plan

Upgrade Considerations

0xpatrickdev commented 1 month ago

A recent motivating failure can be found here: https://github.com/Agoric/agoric-sdk/actions/runs/10154140014/job/28078772380?pr=9735#step:12:1574.

The condition we are waiting for is staking rewards to become available after delegating. This failure only happens in one of the scenarios (chains) which suggests we need to increase the retry attempts/interval logic for this particular condition.

0xpatrickdev commented 2 weeks ago

More recently, we are seeing failures for the makeAccount step (the first part of most flows):

auto-stake-it › auto-stake-it on osmosis
  Rejected promise returned by test. Reason:

  Error {
    message: 'osmosis-makeAccountsInvitation-1724785332378 continuing invitation is in vstorage condition failed after 6 retries.',
  }

  Error: osmosis-makeAccountsInvitation-1724785332378 continuing invitation is in vstorage condition failed after 6 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async exec (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/auto-stake-it.test.ts:112:37)

  basic-flows › Create account on osmosis
  Rejected promise returned by test. Reason:

  Error {
    message: 'osmosis-makeAccount-1724785470505 continuing invitation is in vstorage condition failed after 6 retries.',
  }

  Error: osmosis-makeAccount-1724785470505 continuing invitation is in vstorage condition failed after 6 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async exec (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/basic-flows.test.ts:50:37)

  stake-ica › send wallet offers to stakeOsmo contract
  Rejected promise returned by test. Reason:

  Error {
    message: 'rewards available on osmosis condition failed after 8 retries.',
  }

  Error: rewards available on osmosis condition failed after 8 retries.
    at retryUntilCondition (file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/tools/sleep.ts:28:11)
    at async file:///home/runner/work/agoric-sdk/agoric-sdk/agoric-sdk/multichain-testing/test/stake-ica.test.ts:99:23

Tying these to swingset logs, I see:

2024-08-27T19:13:55.3712126Z 2024-08-27T19:02:13.188Z SwingSet: vat: v32: walletFactory.fromBridge: { blockHeight: 323, blockTime: 1724785332, owner: 'agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86', spendAction: '{"body":"#{\\"method\\":\\"executeOffer\\",\\"offer\\":{\\"id\\":\\"osmosis-makeAccountsInvitation-1724785332378\\",\\"invitationSpec\\":{\\"callPipe\\":[[\\"makeAccountsInvitation\\"]],\\"instancePath\\":[\\"autoAutoStakeIt\\"],\\"source\\":\\"agoricContract\\"},\\"offerArgs\\":{\\"chainName\\":\\"osmosis\\",\\"localDenom\\":\\"ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518\\",\\"validator\\":{\\"chainId\\":\\"osmosislocal\\",\\"encoding\\":\\"bech32\\",\\"value\\":\\"osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43\\"}},\\"proposal\\":{}}}","slots":[]}', type: 'WALLET_SPEND_ACTION' }
2024-08-27T19:13:55.3715489Z 2024-08-27T19:02:13.189Z SwingSet: vat: v32: walletFactory: { wallet: Object [Alleged: SmartWallet self] {}, actionCapData: { body: '#{"method":"executeOffer","offer":{"id":"osmosis-makeAccountsInvitation-1724785332378","invitationSpec":{"callPipe":[["makeAccountsInvitation"]],"instancePath":["autoAutoStakeIt"],"source":"agoricContract"},"offerArgs":{"chainName":"osmosis","localDenom":"ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518","validator":{"chainId":"osmosislocal","encoding":"bech32","value":"osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43"}},"proposal":{}}}', slots: [] } }
2024-08-27T19:13:55.3716541Z 2024-08-27T19:02:13.201Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 starting executeOffer osmosis-makeAccountsInvitation-1724785332378

2024-08-27T19:13:55.3729232Z 2024-08-27T19:02:13.987Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 osmosis-makeAccountsInvitation-1724785332378 seated
2024-08-27T19:13:55.3732838Z 2024-08-27T19:02:14.599Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 offerStatus { id: 'osmosis-makeAccountsInvitation-1724785332378', invitationSpec: { callPipe: [ [ 'makeAccountsInvitation' ] ], instancePath: [ 'autoAutoStakeIt' ], source: 'agoricContract' }, offerArgs: { chainName: 'osmosis', localDenom: 'ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518', validator: { chainId: 'osmosislocal', encoding: 'bech32', value: 'osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43' } }, proposal: {}, numWantsSatisfied: 1 }
2024-08-27T19:13:55.3736799Z 2024-08-27T19:02:14.772Z SwingSet: vat: v32: wallet agoric1zdzzfta2t2w5vvsugwgcrpdh4jskxqy8ny7s86 offerStatus { id: 'osmosis-makeAccountsInvitation-1724785332378', invitationSpec: { callPipe: [ [ 'makeAccountsInvitation' ] ], instancePath: [ 'autoAutoStakeIt' ], source: 'agoricContract' }, numWantsSatisfied: 1, offerArgs: { chainName: 'osmosis', localDenom: 'ibc/ED07A3391A112B175915CD8FAF43A2DA8E4790EDE12566649D0C2F97716B8518', validator: { chainId: 'osmosislocal', encoding: 'bech32', value: 'osmovaloper1qjtcxl86z0zua2egcsz4ncff2gzlcndzs93m43' } }, proposal: {}, payouts: {} }

2024-08-27T19:13:55.3859781Z 2024-08-27T19:02:21.404Z SwingSet: vat: v16: IBC fromBridge { blockHeight: 332, blockTime: 1724785340, channelID: 'channel-3', connectionHops: [ 'connection-0' ], counterparty: { channel_id: 'channel-2', port_id: 'icahost' }, counterpartyVersion: '{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}', event: 'channelOpenAck', portID: 'icacontroller-1', type: 'IBC_EVENT' }
2024-08-27T19:13:55.3863242Z 2024-08-27T19:02:21.576Z SwingSet: vat: v19: ----- IcaAccountKit.2  2 ICA Channel Opened for /ibc-port/icacontroller-1/ordered/{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}/ibc-channel/channel-3 at /ibc-hop/connection-0/ibc-port/icahost/ordered/{"version":"ics27-1","controller_connection_id":"connection-0","host_connection_id":"connection-1","address":"osmo18n0szrn4nwfc87w2dqt7uaf2sz3ml65q9rawcsljld5dftqrfycsdgax6p","encoding":"proto3","tx_type":"sdk_multi_msg"}/ibc-channel/channel-2

This seems to indicate we should have an offer result in ~9 seconds. The default timeout for retryWithCondition is 21 seconds (6x 3500ms), so this is surprising 🤔

Additional relevant context - with #9927, the first call to osmosisChain.makeAccount() will result in a network request to establish an ICQ channel. This will add some additional latency to the overall flow.