ava-labs / avalanchego

Go implementation of an Avalanche node.
https://avax.network
BSD 3-Clause "New" or "Revised" License
2.12k stars 670 forks source link

e2e flake: P-Chain workflow test times out waiting for status of delegator addition #2070

Open marun opened 1 year ago

marun commented 1 year ago

Symptom

The P-Chain workflow test attempted to add a delegator for a new node, and the transaction was issued, but the test timed out after 1h waiting for the transaction status (due to not having a timed context, which https://github.com/ava-labs/avalanchego/pull/2069 fixes). It appears that the transaction was dropped by a node the test was not talking to (NodeID-CabYGC8s6XzAXCNdrpzcmMb4x6XcZXTAu - due to processing taking long enough that the delegator start time was no longer within synchrony bound), but for some reason that transaction status was never propagated to the node that the test was talking to (NodeID-LP4G9QEVsYYTtLuqtJ3Q6AzKJ6fNLDhME).

P.log on the node the test was talking to (NodeID-LP4G9QEVsYYTtLuqtJ3Q6AzKJ6fNLDhME) indicating that the transaction (tx id sRZXqcwU1FBt8GF48614VXQKA8BBk9uSwJPTjPAL8BTEdq8oU) was dropped:

[09-22|06:59:23.909] DEBUG <P Chain> builder/network.go:165 gossiping tx {"txID": "sRZXqcwU1FBt8GF48614VXQKA8BBk9uSwJPTjPAL8BTEdq8oU"}
[09-22|06:59:23.918] DEBUG <P Chain> builder/builder.go:336 dropping message to consensus engine
[09-22|06:59:23.920] DEBUG <P Chain> handler/handler.go:795 forwarding async message to consensus {"nodeID": "NodeID-CabYGC8s6XzAXCNdrpzcmMb4x6XcZXTAu", "messageOp": "app_gossip"}
[09-22|06:59:23.920] DEBUG <P Chain> builder/network.go:97 called AppGossip message handler {"nodeID": "NodeID-CabYGC8s6XzAXCNdrpzcmMb4x6XcZXTAu", "messageLen": 545}
[09-22|06:59:23.920] DEBUG <P Chain> platformvm/service.go:2185 API called {"service": "platform", "method": "getTxStatus"}
[09-22|06:59:23.922] DEBUG <P Chain> handler/handler.go:812 finished handling async message {"messageOp": "app_gossip"}
[09-22|06:59:24.000] DEBUG <P Chain> handler/handler.go:895 forwarding chan message to consensus {"messageOp": "notify"}
[09-22|06:59:24.000] DEBUG <P Chain> builder/builder.go:171 starting to attempt to build a block
[09-22|06:59:24.048] INFO <P Chain> proposervm/block.go:285 built block {"blkID": "aEtVBGJybwWwjhqqMaM6WZBtxVzZEXn2SS4XuRbLKs9BYngtP", "innerBlkID": "2Ax3zZD16XFfXAz5k4R5jfC6JDQHVrPmaMLiySdbhtdb5Guf1", "height": 3, "parentTimestamp": "[09-22|06:59:23.000]", "blockTimestamp": "[09-22|06:59:24.000]"}
[09-22|06:59:24.074] DEBUG <P Chain> handler/handler.go:459 forwarding sync message to consensus {"nodeID": "NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt", "messageOp": "pull_query"}
[09-22|06:59:24.075] DEBUG <P Chain> proposervm/block.go:159 verified post-fork block {"blkID": "aEtVBGJybwWwjhqqMaM6WZBtxVzZEXn2SS4XuRbLKs9BYngtP", "parentTimestamp": "[09-22|06:59:23.000]", "minDelay": "0s", "blockTimestamp": "[09-22|06:59:24.000]"}
[09-22|06:59:24.076] DEBUG <P Chain> snowman/transitive.go:884 block verification failed {"error": "failed to get shared memory: not found"}
[09-22|06:59:24.076] DEBUG <P Chain> handler/handler.go:913 finished handling chan message {"messageOp": "notify"}
[09-22|06:59:24.076] DEBUG <P Chain> builder/builder.go:375 no pending txs to issue into a block

P.log on the other initial validator in the network (NodeID-CabYGC8s6XzAXCNdrpzcmMb4x6XcZXTAu) indicating that it took long enough to attempt to build a block that the delegator transaction was no longer within synchrony bound and the transaction was dropped:

[09-22|06:59:23.918] DEBUG <P Chain> builder/network.go:165 gossiping tx {"txID": "sRZXqcwU1FBt8GF48614VXQKA8BBk9uSwJPTjPAL8BTEdq8oU"}
[09-22|06:59:23.921] DEBUG <P Chain> handler/handler.go:812 finished handling async message {"messageOp": "app_gossip"}
[09-22|06:59:24.069] DEBUG <P Chain> handler/handler.go:459 forwarding sync message to consensus {"nodeID": "NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt", "messageOp": "pull_query"}
[09-22|06:59:24.069] DEBUG <P Chain> handler/handler.go:480 finished handling sync message {"messageOp": "pull_query"}
<snip>
[09-22|06:59:31.736] DEBUG <P Chain> handler/handler.go:459 forwarding sync message to consensus {"nodeID": "NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt", "messageOp": "put"}
[09-22|06:59:31.737] DEBUG <P Chain> handler/handler.go:480 finished handling sync message {"messageOp": "put"}
[09-22|06:59:33.000] DEBUG <P Chain> handler/handler.go:895 forwarding chan message to consensus {"messageOp": "notify"}
[09-22|06:59:33.000] DEBUG <P Chain> builder/builder.go:171 starting to attempt to build a block
[09-22|06:59:33.001] DEBUG <P Chain> builder/builder.go:270 dropping tx {"txID": "sRZXqcwU1FBt8GF48614VXQKA8BBk9uSwJPTjPAL8BTEdq8oU", "error": "synchrony bound (2023-09-22 06:59:43.00106805 +0000 UTC m=+219.202400158) is later than staker start time (2023-09-22 06:59:43 +0000 UTC)"}
[09-22|06:59:33.001] DEBUG <P Chain> builder/builder.go:375 no pending txs to issue into a block
[09-22|06:59:33.001] DEBUG <P Chain> snowman/transitive.go:478 failed building block {"error": "no pending blocks"}

Sample logs

• [TIMEDOUT] [3457.547 seconds]
[X-Chain] [Interchain Workflow]
/home/runner/work/avalanchego/avalanchego/tests/e2e/describe.go:13
  [It] should ensure that funds can be transferred from the P-Chain to the X-Chain and the C-Chain
  /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:42

  Begin Captured GinkgoWriter Output >>
    STEP: checking that the network has a compatible minimum stake duration 09/22/23 06:59:16.902
    STEP: creating wallet with a funded key to send from and recipient key to deliver to 09/22/23 06:59:16.902
     allocated funded key(s): [PrivateKey-2NryhTbd2Foz6uHJ1MRGca2LEBBbaMjRfUWQmTAwHYBtKavMVs]
     targeting node NodeID-LP4G9QEVsYYTtLuqtJ3Q6AzKJ6fNLDhME with URI: http://127.0.0.1:3[378](https://github.com/ava-labs/avalanchego/actions/runs/6271079928/job/17030035461#step:5:379)3
     initializing a new wallet for node NodeID-LP4G9QEVsYYTtLuqtJ3Q6AzKJ6fNLDhME with URI: http://127.0.0.1:33783 
    STEP: defining common configuration 09/22/23 06:59:16.923
    STEP: adding new node and waiting for it to report healthy 09/22/23 06:59:16.923
    Started ephemeral node "NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt"
    STEP: retrieving new node's id and pop 09/22/23 06:59:23.686
    STEP: adding the new node as a validator 09/22/23 06:59:23.687
     issued transaction with ID: 2JLH4ok2VeR1HZoaUPJuZvK3dnq7iWwVrwo9unhx3Nb7y9L2qd
    STEP: adding a delegator to the new node 09/22/23 06:59:23.901
     issued transaction with ID: sRZXqcwU1FBt8GF48614VXQKA8BBk9uSwJPTjPAL8BTEdq8oU
    Shutting down ephemeral node NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt
    ephemeral node "NodeID-LTtkyyvQjvmrCw95XhcRQ6Gj3zkgJwDYt" exited
  << End Captured GinkgoWriter Output

  Timedout
  In [It] at: /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:42

  This is the Progress Report generated when the timeout occurred:
    [X-Chain] [Interchain Workflow] should ensure that funds can be transferred from the P-Chain to the X-Chain and the C-Chain (Spec Runtime: 57m37.496s)
      /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:42
      In [It] (Node Runtime: 57m37.496s)
        /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:42
        At [By Step] adding a delegator to the new node (Step Runtime: 57m30.493s)
          /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:139

        Spec Goroutine
        goroutine 97 [select]
          github.com/ava-labs/avalanchego/vms/platformvm.(*client).AwaitTxDecided(0xc0025e8e20?, {0x1bf49c0, 0xc000128000}, {0x72, 0x7e, 0x1d, 0xd4, 0x42, 0x34, 0xf6, ...}, ...)
            /home/runner/work/avalanchego/avalanchego/vms/platformvm/client.go:760
          github.com/ava-labs/avalanchego/wallet/chain/p.(*wallet).IssueTx(0xc000dde8c0, 0xc000f77020, {0xc00011b838, 0x1, 0xc002[402](https://github.com/ava-labs/avalanchego/actions/runs/6271079928/job/17030035461#step:5:403)b40?})
            /home/runner/work/avalanchego/avalanchego/wallet/chain/p/wallet.go:501
          github.com/ava-labs/avalanchego/wallet/chain/p.(*wallet).IssueUnsignedTx(0xc000dde8c0, {0x1bfb970, 0xc002402b40}, {0xc00011b838?, 0x1, 0x1})
            /home/runner/work/avalanchego/avalanchego/wallet/chain/p/wallet.go:479
          github.com/ava-labs/avalanchego/wallet/chain/p.(*wallet).IssueAddPermissionlessDelegatorTx(0xc000dde8c0, 0x7f7a76547108?, {0xb1, 0xa1, 0xf4, 0x58, 0xa7, 0xdd, 0xe3, 0xcc, ...}, ...)
            /home/runner/work/avalanchego/avalanchego/wallet/chain/p/wallet.go:465
          github.com/ava-labs/avalanchego/wallet/chain/p.(*walletWithOptions).IssueAddPermissionlessDelegatorTx(0xc0011e2300, 0x0?, {0xb1, 0xa1, 0xf4, 0x58, 0xa7, 0xdd, 0xe3, 0xcc, ...}, ...)
            /home/runner/work/avalanchego/avalanchego/wallet/chain/p/wallet_with_options.go:213
        > github.com/ava-labs/avalanchego/tests/e2e/p.glob..func1.1.3()
            /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:149
              | require.NoError(err)
              | 
              > _, err = pWallet.IssueAddPermissionlessDelegatorTx(
              |     &txs.SubnetValidator{
              |         Validator: txs.Validator{
          github.com/onsi/ginkgo/v2.By({0x18e9b40, 0x22}, {0xc00058bc38, 0x1, 0x0?})
            /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.4.0/core_dsl.go:535
        > github.com/ava-labs/avalanchego/tests/e2e/p.glob..func1.1()
            /home/runner/work/avalanchego/avalanchego/tests/e2e/p/interchain_workflow.go:139
              | })
              | 
              > ginkgo.By("adding a delegator to the new node", func() {
              |     startTime := time.Now().Add(e2e.DefaultValidatorStartTimeDiff)
              |     // Delegation duration doesn't actually matter to this
          github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0xc0001b2180, 0xc000045940})
            /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.4.0/internal/node.go:449
          github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func2()
            /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.4.0/internal/suite.go:750
          github.com/onsi/ginkgo/v2/internal.(*Suite).runNode
            /home/runner/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.4.0/internal/suite.go:738

Failing jobs

Linked jobs include a testnet-data artifact containing the network data for the run.

marun commented 1 year ago

This flake may represent an opportunity to improve the UX of transaction issuance: ideally, the node that issues a transaction should be capable of reporting its failure.

github-actions[bot] commented 11 months ago

This issue has become stale because it has been open 60 days with no activity. Adding the lifecycle/frozen label will cause this issue to ignore lifecycle events.