get10101 / itchysats

CFDs on Bitcoin.
https://itchysats.network
MIT License
61 stars 18 forks source link

Rollover tests are flaky #2825

Closed klochowicz closed 1 year ago

klochowicz commented 2 years ago

Build failed:

Originally posted by @bors[bot] in https://github.com/itchysats/itchysats/issues/2819#issuecomment-1232642331

This test requires frequent retries on CI - we should investigate whether our code is wrong or the test should be improved

luckysori commented 2 years ago

It's interesting that it fails during the setup of the test (during a rollover). Intuitively it seems like this could happen for other tests in which we roll over.

da-kami commented 2 years ago

If we assume that it is not related to this specific test then this ticket is s duplicate of #2815

klochowicz commented 2 years ago

If we assume that it is not related to this specific test then this ticket is s duplicate of #2815

It's unrelated to that one - issue posted by @holzeis shows inability to install global subscriber (likely otel-tests related). I haven't seen that one on the CI yet

klochowicz commented 2 years ago

It's interesting that it fails during the setup of the test (during a rollover). Intuitively it seems like this could happen for other tests in which we roll over.

I can confirm that I was able to see this error in a different rollover test locally! changed the description to "rollover tests".

failures:
    liquidation::given_rollover_when_oracle_attests_long_liquidation_price_can_liquidate
    rollover::given_rollover_completed_when_taker_fails_rollover_can_retry
klochowicz commented 2 years ago

I have managed to get more of relevant logs:

TL;DR: Msg1 one for the rollover does not seem to arrive to the other party, and the rollover fails. Curiously, I can only see this behaviour under load (when running the whole test suite) - not when the test runs on its own.

2022-09-01T09:45:28.703969Z  INFO xtra_libp2p_rollover::current::taker: Rollover proposal got accepted order_id=71275183-8880-4515-9bee-bac4d0d99b9f
2022-09-01T09:47:07.964784Z ERROR tokio_extras::time: Future timed out err=deadline has elapsed
2022-09-01T09:47:08.007924Z  WARN fallible task handle_error{err=Expected Msg1 within 60 seconds}: model::cfd: order_id=71275183-8880-4515-9bee-bac4d0d99b9f peer_id=Some(12D3KooWPFiN2wPNUakjsGwxz16yRWXewwEmD43rMWN85ptA9P1c) Issuing event: RolloverFailed with: Expected Msg1 within 60 seconds: deadline has elapsed
2022-09-01T09:47:08.072740Z  WARN fallible task handle_error{err=Empty stream instead of Msg2}: model::cfd: order_id=71275183-8880-4515-9bee-bac4d0d99b9f peer_id=Some(12D3KooWSqZL6TmAx4pHFB2fS1id1yPXeVRF6y386QjrAP5JEhoy) Issuing event: RolloverFailed with: Empty stream instead of Msg2
[daemon-tests/src/rollover.rs:64] &maker_cfd.state = Open
[daemon-tests/src/rollover.rs:64] &taker_cfd.state = Open
thread 'liquidation::given_rollover_when_oracle_attests_long_liquidation_price_can_liquidate' panicked at 'assertion failed: `(left != right)`
  left: `01e6742ffa6c2e7714b8b787ab7dfb2e55439729484a019399f2320efa751a9b`,
 right: `01e6742ffa6c2e7714b8b787ab7dfb2e55439729484a019399f2320efa751a9b`: The commit_txid of the taker should have changed after the rollover', /Users/mariusz/coblox/source/itchysats/daemon-tests/src/rollover.rs:66:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'liquidation::given_rollover_when_oracle_attests_long_liquidation_price_can_liquidate' panicked at 'Any { .. }', daemon-tests/tests/main/liquidation.rs:52:1
Restioson commented 2 years ago

Msg1 one for the rollover does not seem to arrive to the other party, and the rollover fails

How is the Msg1 sent in the tests?

klochowicz commented 2 years ago

Msg1 one for the rollover does not seem to arrive to the other party, and the rollover fails

How is the Msg1 sent in the tests?

I'm afraid I don't understand the question - it's part of the rollover protocol, and we trigger it from all actor tests

Restioson commented 2 years ago

I'm mostly wondering the mechanism by which it is sent. It is my understanding that in the app itself, libp2p is used to send Msg1 over the network. Is this still the case for the tests, or is it mocked somehow?

klochowicz commented 2 years ago

Yes, it's proper libp2p - we choose ports at random at do it expertly same as in prod. We mock wallet, Oracle, quotes etc

da-kami commented 2 years ago

I recorded https://github.com/itchysats/itchysats/issues/3027 as separate issue which is related to this ticket, but it is a distinct problem so I opened a separate ticket for it :)

bonomat commented 1 year ago

This project is unmaintained