ExchangeUnion / xud

Exchange Union Daemon 🔁 ⚡️
https://exchangeunion.com
GNU Affero General Public License v3.0
115 stars 44 forks source link

Tests flakiness #1771

Open sangaman opened 4 years ago

sangaman commented 4 years ago

This issue is for tracking intermittent test failures that cause our tests to be flaky and require manual restarts even when there is no underlying issue in the xud code itself. Since there seem to be quite a few, I figured it would be easier to track them all in one issue. These primarily effect simulation tests but also occasionally our mocha/jest tests.

sangaman commented 4 years ago

Failure at the start of running simulation tests, a new one I haven't seen before.

 === RUN   TestIntegration
2020/07/30 13:23:37 Connext node reachability failure: invalid response at http://172.17.0.1:8080/config: [123 34 115 116 97 116 117 115 67 111 100 101 34 58 52 48 52 44 34 109 101 115 115 97 103 101 34 58 34 67 97 110 110 111 116 32 71 69 84 32 47 99 111 110 102 105 103 34 44 34 101 114 114 111 114 34 58 34 78 111 116 32 70 111 117 110 100 34 125]
exit status 1
FAIL    github.com/ExchangeUnion/xud-simulation 0.083s
~/work/xud/xud/test/simulation/temp/indra ~/work/xud/xud/test/simulation
sangaman commented 4 years ago

EDIT: Should be fixed by #1790.

This happens occasionally with the Mocha tests, I've seen it several times both locally and on GitHub Actions.

   1) P2P Pool Tests
       should update a node on new handshake:

      AssertionError: expected '123.123.123.123' to equal '86.75.30.9'
      + expected - actual

      -123.123.123.123
      +86.75.30.9

      at /home/runner/work/xud/xud/test/integration/Pool.spec.ts:103:49
      at Generator.next (<anonymous>)
      at fulfilled (test/integration/Pool.spec.ts:24:58)

  2) P2P Pool Tests
       reconnect logic
         "before each" hook for "should not reconnect upon shutdown inbound":
     node 022a49046f252821dfcb3a3f7c8e203af97fb2b8223d8b9204380ccdce6910755d at 321.321.321.321:1337 already connected
sangaman commented 4 years ago

EDIT: This should be fixed by #1791.

This one I've seen for a while, it happens intermittently and more often locally. I've tried looking into this a bit before but was stumped.

  1) Parser
       test TCP segmentation/concatenation support
         should parse encrypted Ping SessionInit Ping concatenated and split on byte 15 from each packet beginning:

      AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"73b236f0-aa26-4391-8b44-57df28796f0b\\",\\"nodePubKey\\":\\"354bf50b-d846-443f-bc7d-8bd708fe74f2\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"95734182-9ed5-46e5-b993-22d280f210ec\\",\\"lndPubKeys\\":{\\"BTC\\":\\"775106c5-79e5-48bd-984b-52d53a3b604d\\",\\"LTC\\":\\"197262b9-bd75-4082-8e85-46ad3b7ecc5f\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"78f62c27-c4e1-41eb-8ceb-5400c1fa932c\\"],\\"raidenAddress\\":\\"fe024285-1e1d-44ec-bb7c-e5ccf4df0389\\",\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"25ee301d-a407-4619-b68c-e8dc2b63f340\\",\\"sign\\":\\"f6eeed07-00d0-43ad-b6a5-d48d686718c4\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"5efe7f02-d548-11ea-befd-ef7b6a3706c2\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"5efe7f01-d548-11ea-befd-ef7b6a3706c2\\"}}"'
      + expected - actual

      -"{\"body\":{\"ephemeralPubKey\":\"73b236f0-aa26-4391-8b44-57df28796f0b\",\"nodePubKey\":\"354bf50b-d846-443f-bc7d-8bd708fe74f2\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"95734182-9ed5-46e5-b993-22d280f210ec\",\"lndPubKeys\":{\"BTC\":\"775106c5-79e5-48bd-984b-52d53a3b604d\",\"LTC\":\"197262b9-bd75-4082-8e85-46ad3b7ecc5f\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"78f62c27-c4e1-41eb-8ceb-5400c1fa932c\"],\"raidenAddress\":\"fe024285-1e1d-44ec-bb7c-e5ccf4df0389\",\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"25ee301d-a407-4619-b68c-e8dc2b63f340\",\"sign\":\"f6eeed07-00d0-43ad-b6a5-d48d686718c4\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"5efe7f02-d548-11ea-befd-ef7b6a3706c2\"}}"
      +"{\"header\":{\"id\":\"5efe7f01-d548-11ea-befd-ef7b6a3706c2\"}}"

      at /home/daniel/Documents/GitHub/xud/test/unit/Parser.spec.ts:52:46
sangaman commented 4 years ago

Another simulation test error in GitHub actions that went away on retry:

 === RUN   TestSecurityUnsettledChannels
2020/08/03 11:52:51 xud: creating network
2020/08/03 11:52:51 ltcd: launching node...
2020/08/03 11:52:52 ltcd: 200 blocks generated
2020/08/03 11:52:52 lnd-ltc: launching network...
2020/08/03 11:53:05 btcd: launching node...
2020/08/03 11:53:05 btcd: 200 blocks generated
2020/08/03 11:53:05 lnd-btc: launching network...
2020/08/03 11:53:16 connext: launching network...
2020/08/03 11:53:20 xud: launching network...
2020/08/03 11:53:42 cannot start xud network: timeout waiting for xud to be ready
exit status 1
FAIL    github.com/ExchangeUnion/xud-simulation 248.021s
~/work/xud/xud/test/simulation/temp/indra ~/work/xud/xud/test/simulation

Note that this same error happens not just on the security simulation tests, but also the integration and instability simulation tests as well.

sangaman commented 4 years ago

EDIT: This was a recently introduced edge case bug and is fixed by #1781.

This jest error I believe I've seen only locally:

 FAIL  test/jest/Pool.spec.ts
  ● P2P Pool › it connects exactly once if two peers attempt connections to each other simultaneously

    read ECONNRESET
kilrau commented 4 years ago

Two different failures observed in https://github.com/ExchangeUnion/xud/pull/1769/checks?check_run_id=927870407 :

ONE

 docker network rm indra indra_cf_tester indra_node_tester indra_test_store 2> /dev/null || true
indra
docker secret rm indra_database_dev 2> /dev/null || true
indra_database_dev
docker volume rm indra_chain_1337 indra_chain_1338 indra_database_dev  2> /dev/null || true
indra_database_dev
docker volume rm `docker volume ls -q -f name=indra_database_test_*` 2> /dev/null || true
rm -rf .chaindata/*
rm -rf .flags/deployed-contracts
~/work/xud/xud/test/simulation
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! xud@1.0.0-beta.5 test:sim:run:integration: `(cd test/simulation && ./docker-run.sh TestIntegration)`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the xud@1.0.0-beta.5 test:sim:run:integration script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/runner/.npm/_logs/2020-07-30T14_17_09_849Z-debug.log
##[error]Process completed with exit code 1.

TWO

The indra stack has been deployed, waiting for the proxy to start responding..
Timed out waiting for proxy to respond..
sangaman commented 4 years ago
    --- FAIL: TestIntegration/p2p_already_connected (0.00s)
        actions.go:132: 
                Error Trace:    actions.go:132
                                            tests-integration.go:160
                                            harnessTest.go:95
                                            xud_test.go:86
                Error:          Received unexpected error:
                                rpc error: code = AlreadyExists desc = node 037c5ed36a34091b37cf2603e7f59bb1ebc670d328ec292d08ecda8dfb0d4ecc53 at 127.0.0.1:40821 already connected
                Test:           TestIntegration/p2p_already_connected
sangaman commented 4 years ago

EDIT: Should be fixed by #1790

Mocha tests:

P2P Pool Tests
    ✓ should open a connection with a peer
    ✓ should close a peer
  actual: {
    ✓ should throw error when connecting to tor node with tor disabled
    message: 'node 0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 at 321.321.321.321:1337 already connected',
    1) should update a node on new handshake
    code: '2.1',
    reconnect logic
    stack: undefined
2020-07-31 02:37:41.4141 [P2P] warn: could not open connection to outbound peer (0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 (undefined)): node 0390015423b2466f1d07b09cd0023241a3190de3ff928028e5df50fa3687ed53d4 at 321.321.321.321:1337 already connected
      2) "before each" hook for "should not reconnect upon shutdown inbound"
  },

  expected: undefined
  API Service
}
sangaman commented 4 years ago
 --- FAIL: TestSecurityUnsettledChannels (610.89s)
    --- PASS: TestSecurityUnsettledChannels/network_initialization (0.75s)
    --- PASS: TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc (165.99s)
    --- FAIL: TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc (395.75s)
        tests-security.go:72: 
                Error Trace:    tests-security.go:72
                                            tests-security.go:495
                                            harnessTest.go:95
                                            xud_test.go:270
                Error:          Not equal: 
                                expected: 1
                                actual  : 0
                Test:           TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc
FAIL
exit status 1
kilrau commented 4 years ago

Improved quite a bunch, moving to the next project.

sangaman commented 4 years ago

I saw the parser error again, but it seems to be far less common. I'll go back and check and at some point if there's anything I missed from the first time I addressed it.

1) Parser
     test TCP segmentation/concatenation support
       should parse encrypted Ping SessionInit Ping concatenated and split on byte 16 from each packet beginning:

    AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"3e798f0f-728a-4fdd-b434-fd5199a7de87\\",\\"nodePubKey\\":\\"e757d34c-27cc-4a13-8298-c830f4712d81\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"d1f1ca10-2c54-4224-8bf8-1f14f930a65e\\",\\"lndPubKeys\\":{\\"BTC\\":\\"73ca86e7-b234-476a-a496-d67871ef7ebd\\",\\"LTC\\":\\"edc6c883-35dc-4f6b-b1d2-2a6aa36dfe47\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"898c3cdb-b43a-404f-90c4-233b642e76d7\\"],\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"638f05fb-317b-4b9d-8673-fab1317535f1\\",\\"sign\\":\\"e1660403-9324-4f09-b0c9-9106376f1d99\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"ceb98102-e231-11ea-9cbe-0de5bfbc72ec\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"ceb98101-e231-11ea-9cbe-0de5bfbc72ec\\"}}"'
    + expected - actual

    -"{\"body\":{\"ephemeralPubKey\":\"3e798f0f-728a-4fdd-b434-fd5199a7de87\",\"nodePubKey\":\"e757d34c-27cc-4a13-8298-c830f4712d81\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"d1f1ca10-2c54-4224-8bf8-1f14f930a65e\",\"lndPubKeys\":{\"BTC\":\"73ca86e7-b234-476a-a496-d67871ef7ebd\",\"LTC\":\"edc6c883-35dc-4f6b-b1d2-2a6aa36dfe47\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"898c3cdb-b43a-404f-90c4-233b642e76d7\"],\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"638f05fb-317b-4b9d-8673-fab1317535f1\",\"sign\":\"e1660403-9324-4f09-b0c9-9106376f1d99\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"ceb98102-e231-11ea-9cbe-0de5bfbc72ec\"}}"
    +"{\"header\":{\"id\":\"ceb98101-e231-11ea-9cbe-0de5bfbc72ec\"}}"
sangaman commented 4 years ago

Still seeing this one, I'll have to revisit it:

  1) Parser
       test TCP segmentation/concatenation support
         should parse encrypted Ping SessionInit Ping concatenated and split on byte 17 from each packet beginning:

      AssertionError: expected '"{\\"body\\":{\\"ephemeralPubKey\\":\\"a2d3794f-428f-4211-90e6-28f5131d714e\\",\\"nodePubKey\\":\\"69e9716a-38b4-4558-8f50-312540e4bbfa\\",\\"nodeState\\":{\\"addresses\\":[{\\"host\\":\\"1.1.1.1\\",\\"port\\":8885},{\\"host\\":\\"2.2.2.2\\",\\"port\\":8885}],\\"connextIdentifier\\":\\"3992aa06-aae6-4771-9f0c-20963f17a30b\\",\\"lndPubKeys\\":{\\"BTC\\":\\"2748ce07-9d2f-4e00-97ed-9758257251bb\\",\\"LTC\\":\\"30114f06-2e60-4c58-a46d-52a999af3bf0\\"},\\"lndUris\\":{\\"BTC\\":[\\"\\"],\\"LTC\\":[\\"\\"]},\\"pairs\\":[\\"d989343d-5272-421a-b07b-4bca07fe923c\\"],\\"tokenIdentifiers\\":{\\"BTC\\":\\"bitcoin-testnet\\",\\"LTC\\":\\"litecoin-testnet\\"}},\\"peerPubKey\\":\\"671d3ba0-7112-4d39-af78-9ae7adda121f\\",\\"sign\\":\\"5d866ac5-f840-4753-a255-a45e27f3cc08\\",\\"version\\":\\"1.0.0\\"},\\"header\\":{\\"id\\":\\"c1cb3ed9-e631-11ea-8329-1d1f40df5a05\\"}}"' to equal '"{\\"header\\":{\\"id\\":\\"c1cb3ed8-e631-11ea-8329-1d1f40df5a05\\"}}"'
      + expected - actual

      -"{\"body\":{\"ephemeralPubKey\":\"a2d3794f-428f-4211-90e6-28f5131d714e\",\"nodePubKey\":\"69e9716a-38b4-4558-8f50-312540e4bbfa\",\"nodeState\":{\"addresses\":[{\"host\":\"1.1.1.1\",\"port\":8885},{\"host\":\"2.2.2.2\",\"port\":8885}],\"connextIdentifier\":\"3992aa06-aae6-4771-9f0c-20963f17a30b\",\"lndPubKeys\":{\"BTC\":\"2748ce07-9d2f-4e00-97ed-9758257251bb\",\"LTC\":\"30114f06-2e60-4c58-a46d-52a999af3bf0\"},\"lndUris\":{\"BTC\":[\"\"],\"LTC\":[\"\"]},\"pairs\":[\"d989343d-5272-421a-b07b-4bca07fe923c\"],\"tokenIdentifiers\":{\"BTC\":\"bitcoin-testnet\",\"LTC\":\"litecoin-testnet\"}},\"peerPubKey\":\"671d3ba0-7112-4d39-af78-9ae7adda121f\",\"sign\":\"5d866ac5-f840-4753-a255-a45e27f3cc08\",\"version\":\"1.0.0\"},\"header\":{\"id\":\"c1cb3ed9-e631-11ea-8329-1d1f40df5a05\"}}"
      +"{\"header\":{\"id\":\"c1cb3ed8-e631-11ea-8329-1d1f40df5a05\"}}"
sangaman commented 4 years ago

Ran into this twice while trying to run npm version (which runs through tests).

--- PASS: TestSecurity (168.59s)
    --- PASS: TestSecurity/network_initialization (0.52s)
    --- PASS: TestSecurity/taker_stalling_on_swapAccepted (15.62s)
    --- PASS: TestSecurity/maker_stalling_after_1st_htlc (100.68s)
    --- PASS: TestSecurity/taker_stalling_after_swap_succeeded (6.53s)
=== RUN   TestSecurityUnsettledChannels
2020/08/24 18:15:45 xud: creating network
2020/08/24 18:15:45 ltcd: launching node...
2020/08/24 18:15:45 ltcd: 200 blocks generated
2020/08/24 18:15:45 lnd-ltc: launching network...
2020/08/24 18:15:54 btcd: launching node...
2020/08/24 18:15:54 btcd: 200 blocks generated
2020/08/24 18:15:54 lnd-btc: launching network...
2020/08/24 18:16:01 connext: launching network...
2020/08/24 18:16:03 xud: launching network...
2020/08/24 18:16:08 Running 4 unsettled-channels security tests
=== RUN   TestSecurityUnsettledChannels/network_initialization
=== RUN   TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc
=== RUN   TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc
=== RUN   TestSecurityUnsettledChannels/taker_shutdown_after_2nd_htlc
2020/08/24 18:23:47 xud process (33-Alice) did not shutdown gracefully. process (13584) killed

Not sure if this is something you might have an idea about @LePremierHomme, why the taker_shutdown_after_2nd_htlc test doesn't shutdown gracefully.

sangaman commented 4 years ago

I'm seeing this cause failures at various points in the simulation tests, it may be related to the 2048 bit TLS cert change in #1692 although I've also seen it happen without that commit.

Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 4
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 3
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 2
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 1
Failed to read TLS certificate: credentials: failed to append certificates, remaining tries: 0

Edit: This appears to be fixed by the reversion in #1841.

sangaman commented 3 years ago

Here's a new one I've seen a couple of times only recently, both locally and in GitHub Actions:

  1) P2P Pool Tests
       "after all" hook for "should update a node on new handshake":
     Error: Timeout of 10000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/daniel/Documents/GitHub/xud/test/integration/Pool.spec.ts)
      at listOnTimeout (internal/timers.js:549:17)
      at processTimers (internal/timers.js:492:7)

Edit: Still seeing this after the fix to the stall timer.

sangaman commented 3 years ago

Looks like there may be some jest tests that don't always terminate properly, at least in GitHub Actions. https://github.com/ExchangeUnion/xud/pull/1938#discussion_r534505531

ghost commented 3 years ago

This error is becoming increasingly annoying:

   1) P2P Pool Tests
       "after all" hook for "should update a node on new handshake":
     Error: Timeout of 10000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/runner/work/xud/xud/test/integration/Pool.spec.ts)
      at listOnTimeout (internal/timers.js:554:17)
      at processTimers (internal/timers.js:497:7)