fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
536 stars 209 forks source link

Flaky load test(?) #4917

Open dpc opened 1 month ago

dpc commented 1 month ago

https://github.com/fedimint/fedimint/actions/runs/8636342617/job/23676033584?pr=4916

 00:00:29 1 gateway_LND_pay_invoice_success: avg 2.414535004s, median 2.414535004s, max 2.414535004s, min 2.414535004s
00:00:29 1 gateway_pay_invoice_success: avg 2.414535004s, median 2.414535004s, max 2.414535004s, min 2.414535004s
00:00:29 2 reissue_notes: avg 1.149007941s, median 1.189280977s, max 1.189280977s, min 1.108734905s
00:00:33 Error: command: fedimint-load-test-tool --archive-dir /tmp/nix-shell.XK2jQI/load_test_tool_test-VWib/devimint-122145-233/load-test-temp --users 1 load-test --notes-per-user 1 --generate-invoice-with ln-cli
00:00:33 
00:00:33 Caused by:
00:00:33     exit status: 1
00:00:33     stdout:
00:00:33     
00:00:33     stderr:
00:00:33     2024-04-10T19:02:58.126424Z  INFO fedimint_load_test_tool: No invite code provided and failed to get one with 'command: fedimint-cli --data-dir /tmp/nix-shell.XK2jQI/load_test_tool_test-VWib/devimint-122145-233/clients/default-0 invite-code' error, will try to proceed without one...
00:00:33     2024-04-10T19:02:58.286426Z  INFO fedimint_load_test_tool: Current balance 0 msat on coordinator not enough, trying to get 1024 msat more through fedimint-cli
00:00:33     2024-04-10T19:02:59.150205Z  INFO fedimint_load_test_tool: Reminting 1 notes of denomination 1024 msat for 1 users, 1 notes per user (this may take a while if the number of users/notes is high)
00:00:33     2024-04-10T19:02:59.758072Z  INFO fedimint_load_test_tool: Note summary:
00:00:33     2024-04-10T19:02:59.758406Z  INFO fedimint_load_test_tool: 512 msat: 1
00:00:33     Error: Insufficient balance: requested 1024 msat but only 512 msat available
00:00:33     
00:00:33     Stack backtrace:
00:00:33        0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
00:00:33                  at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.81/src/error.rs:565:25
00:00:33        1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
00:00:33                  at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/result.rs:1959:27
00:00:33        2: {async_block#0}<fedimint_mint_client::SpendableNote, futures_util::stream::stream::map::Map<core::pin::Pin<alloc::boxed::Box<(dyn futures_core::stream::Stream<Item=(fedimint_mint_client::client_db::NoteKey, fedimint_mint_client::SpendableNote)> + core::marker::Send), alloc::alloc::Global>>, fedimint_mint_client::{impl#10}::select_notes::{async_fn#0}::{closure_env#0}<fedimint_mint_client::SelectNotesWithAtleastAmount>>>
00:00:33                  at ./modules/fedimint-mint-client/src/lib.rs:1476:12
00:00:33        3: poll<alloc::boxed::Box<(dyn core::future::future::Future<Output=core::result::Result<fedimint_core::tiered_multi::TieredMulti<fedimint_mint_client::SpendableNote>, anyhow::Error>> + core::marker::Send), alloc::alloc::Global>>
00:00:33                  at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/future/future.rs:124:9
00:00:33        4: {async_fn#0}<fedimint_mint_client::SelectNotesWithAtleastAmount>
00:00:33                  at ./modules/fedimint-mint-client/src/lib.rs:1009:14
00:00:33        5: {async_fn#0}<fedimint_mint_client::SelectNotesWithAtleastAmount>
00:00:33                  at ./modules/fedimint-mint-client/src/lib.rs:940:93
00:00:33        6: {async_block#0}<(), fedimint_mint_client::SelectNotesWithAtleastAmount>
00:00:33                  at ./modules/fedimint-mint-client/src/lib.rs:1271:30
00:00:33        7: poll<alloc::boxed::Box<(dyn core::future::future::Future<Output=core::result::Result<(fedimint_core::core::OperationId, fedimint_mint_client::OOBNotes), anyhow::Error>> + core::marker::Send), alloc::alloc::Global>>
dpc commented 1 month ago

Can't look into it RN, might be immediately obvious to @douglaz .

dpc commented 1 month ago

https://github.com/fedimint/fedimint/actions/runs/8637623379/job/23680228799

douglaz commented 1 month ago

Something very weird is happening here. Created https://github.com/fedimint/fedimint/pull/4925 to help.

Looks like more a bug somewhere than a flaky test.

dpc commented 1 month ago

Even better - a test catching a bug. :rocket: :)

dpc commented 1 month ago

Test: load_test_tool_test FM: current CLI: current GW: v0.2.1 CI failures: https://github.com/fedimint/fedimint/actions/runs/8727469044/job/23945038328

00:00:38     Error: Insufficient balance: requested 1024 msat but only 512 msat available

Copied from https://github.com/fedimint/fedimint/issues/5000

douglaz commented 3 weeks ago

It seems that a transaction/reissue using 1024 msats became 512 msats. So we lost half of our sats. This really looks like a bug:

00:00:38     2024-04-17T19:25:15.352525Z  INFO fedimint_load_test_tool: Got 1024 msat more notes, reissuing them
00:00:38     2024-04-17T19:25:16.117666Z  INFO fedimint_load_test_tool: Note summary:
00:00:38     2024-04-17T19:25:16.120373Z  INFO fedimint_load_test_tool: 1 msat: 2
00:00:38     2024-04-17T19:25:16.120382Z  INFO fedimint_load_test_tool: 2 msat: 3
00:00:38     2024-04-17T19:25:16.120384Z  INFO fedimint_load_test_tool: 4 msat: 2
00:00:38     2024-04-17T19:25:16.120386Z  INFO fedimint_load_test_tool: 8 msat: 2
00:00:38     2024-04-17T19:25:16.120387Z  INFO fedimint_load_test_tool: 16 msat: 2
00:00:38     2024-04-17T19:25:16.120389Z  INFO fedimint_load_test_tool: 32 msat: 2
00:00:38     2024-04-17T19:25:16.120390Z  INFO fedimint_load_test_tool: 64 msat: 2
00:00:38     2024-04-17T19:25:16.120392Z  INFO fedimint_load_test_tool: 128 msat: 2
00:00:38     2024-04-17T19:25:16.120393Z  INFO fedimint_load_test_tool: 256 msat: 2
00:00:38     2024-04-17T19:25:16.120403Z  INFO fedimint_load_test_tool: Reminting 1 notes of denomination 1024 msat for 1 users, 1 notes per user (this may take a while if the number of users/notes is high)
00:00:38     2024-04-17T19:25:16.660180Z  INFO fedimint_load_test_tool: Note summary:
00:00:38     2024-04-17T19:25:16.660558Z  INFO fedimint_load_test_tool: 512 msat: 1

@elsirion any ideas here?

dpc commented 3 weeks ago

I added an assert in #5080 and I'm trying to repro in devimint-env with:

retry-success.sh ./target-nix/debug/fedimint-load-test-tool load-test --generate-invoice-with ln-cli --invoices-per-user=1 --notes-per-user=1 --note-denomination=1
024

I hope I'm not completely confused.

dpc commented 3 weeks ago

Oh. I think this is in internal reminting of per-user internal client. Let's check it ...

dpc commented 3 weeks ago

Theory. There's a race and when:

        mint_client
            .await_output_finalized(operation_id, out_point)
            .await?;

it doesn't mean that the notes are available in the database yet.

Edit: In the code it looks like notification is clearly sent after the state transition committed. Well, next thing to check is if something in parallel could be using the same internal client...

I also altered the command to:

retry-success.sh ./target-nix/debug/fedimint-load-test-tool --users 1 load-test --generate-invoice-with ln-cli --invoices-per-user=1 --notes-per-user=1 --note-deno
mination=1024

which seems to match what was happening in the failing case, but so far I was not able to trigger the issue, despite having an assertion there.

dpc commented 2 weeks ago

I'm quite sure this is the exact same underlying issue as in #5195 , which seems to be easier to repro.