fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
536 stars 209 forks source link

peg_outs_must_wait_for_available_utxos flake `Unexpected state WaitingForConfirmations` #5126

Closed dpc closed 2 weeks ago

dpc commented 3 weeks ago
2024-04-26T21:02:11.020873Z  INFO fedimint_wallet_tests: Peg-in transaction mined height=11 tx=Transaction { version: 0, lock_time: PackedLockTime(0), input: [], output: [TxOut { value: 6000, script_pubkey: Script(OP_0 OP_PUSHBYTES_32 fd4e0c4c7ea6e4185036bea5f45313c6b2f485fac75111adb33e0e4d0e47866f) }] }
2024-04-26T21:02:11.022853Z DEBUG task{name="state machine executor"}:state_machine_transition{operation_id=89769494695e2d60bbf8247117896c6ba2260b6d0741c6877722eb9aa1e9b356}: timing: Operation timing name=autocommit - commit_tx duration_ms=0
thread 'peg_outs_must_wait_for_available_utxos' panicked at /build/source/modules/fedimint-wallet-client/src/lib.rs:502:29:
Unexpected state WaitingForConfirmations(WaitingForConfirmationsDepositState { tweak_key: KeyPair(#78eaae26423b7463), btc_transaction: Transaction { version: 0, lock_time: PackedLockTime(0), input: [], output: [TxOut { value: 6000, script_pubkey: Script(OP_0 OP_PUSHBYTES_32 fd4e0c4c7ea6e4185036bea5f45313c6b2f485fac75111adb33e0e4d0e47866f) }] }, out_idx: 0 })
stack backtrace:
   0: rust_begin_unwind
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library/core/src/panicking.rs:72:14
   2: {async_block#0}
             at /build/source/modules/fedimint-wallet-client/src/lib.rs:502:29
   3: <async_stream::async_stream::AsyncStream<fedimint_wallet_client::DepositState, <fedimint_wallet_client::WalletClientModule>::subscribe_deposit_updates::{closure#0}::{closure#1}::{closure#0}> as futures_core::stream::Stream>::poll_next
             at /nix/store/k52qq1wv145wzmim6x8hjbhn5iqg3rb5-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/async-stream-0.3.5/src/async_stream.rs:56:13
   4: poll_next<alloc::boxed::Box<async_stream::async_stream::AsyncStream<fedimint_wallet_client::DepositState, fedimint_wallet_client::{impl#4}::subscribe_deposit_updates::{async_fn#0}::{closure#1}::{async_block_env#0}>, alloc::alloc::Global>>
             at /nix/store/k52qq1wv145wzmim6x8hjbhn5iqg3rb5-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/futures-core-0.3.30/src/stream.rs:120:9
   5: poll_next_unpin<core::pin::Pin<alloc::boxed::Box<async_stream::async_stream::AsyncStream<fedimint_wallet_client::DepositState, fedimint_wallet_client::{impl#4}::subscribe_deposit_updates::{async_fn#0}::{closure#1}::{async_block_env#0}>, alloc::alloc::Global>>>
             at /nix/store/k52qq1wv145wzmim6x8hjbhn5iqg3rb5-vendor-cargo-deps/c19b7c6f923b580ac259164a89f2577984ad5ab09ee9d583b888f934adbbe8d0/futures-util-0.3.30/src/stream/stream/mod.rs:1638:9
   6: 

https://github.com/fedimint/fedimint/actions/runs/8854215897/job/24316806541

dpc commented 3 weeks ago

@elsirion

image

If I'm reading it right, the only way 502 could assert this way, would be if WaitingForConfirmations was emitted twice?

bradleystachurski commented 2 weeks ago

Another flake, this time with fedimint-wallet-tests::fedimint_wallet_tests peg_out_fail_refund

fedimint-workspace-lcov-ci> Unexpected state WaitingForConfirmations(WaitingForConfirmationsDepositState { tweak_key: KeyPair(#5fae9b66639b1d28), btc_transaction: Transaction { version: 0, lock_time: PackedLockTime(0), input: [], output: [TxOut { value: 6000, script_pubkey: Script(OP_0 OP_PUSHBYTES_32 594eeb6bf5ed5da6204db6b83d5157c6b91f9ea39f73e9243618e1b8e86b3821) }] }, out_idx: 0 })

https://github.com/fedimint/fedimint/actions/runs/8872810356/job/24357681665#step:8:310

dpc commented 2 weeks ago

https://github.com/fedimint/fedimint/actions/runs/8875449579/job/24364910907?pr=5149

dpc commented 2 weeks ago
        // It's important to start the subscription first and then query the database to
        // not lose any transitions in the meantime.
        let new_transitions =
            self.subscribe_all_operations()
                .await
                .filter_map(move |state: S| async move {
                    if state.operation_id() == operation_id {
                        trace!(%operation_id, ?state, "Received state transition notification");
                        Some(state)
                    } else {
                        None
                    }
                });

Ah right. We can return states from the DB and then the subscribed ones, and there can overlap.

The question is - what do we do about it? The whole thing is kind of a clumsy construction, IMO.

We could de-duplicate, but in theory overlap can be arbitrarily long.

We could make client worry about it, but that's error prone and crappy UX.

We could use db as the ultimate source and use notifications only as a wakup, which seems most reliable, but will be a larger refactor, possibly part of #5006 .

@elsirion Thoughts?

elsirion commented 2 weeks ago

Making the client worry about it only reduces DX, not UX since modules implement the translation between these low-level state machine notifications and high-level operation states. These functions should probably ignore duplicate states :/

Edit: and yes, once we re-work the reactor we should use the DB as source of truth.

dpc commented 2 weeks ago

OK, I think I've found a proper way to dedup all instances there: #5173. It does come from an assamption that the same exact state can't happen twice.

@elsirion

dpc commented 2 weeks ago

Most probably should not happen after #5173 landed.