fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License

refactor: improve client executor loop #4230

Closed · dpc closed this pull request 3 months ago

dpc commented 3 months ago

On top of #4227

Make client side state machines go brrrr:

  • no busy looping/polling anything (good for phone batteries)

  • parallel state transitions

dpc commented 3 months ago

I think it increases the flakiness of one or two tests.

Edit: sends_ecash_out_of_band_cancel

codecov[bot] commented 3 months ago

Codecov Report

Attention: 36 lines in your changes are missing coverage. Please review.

Comparison is base (52f0c96) 58.06% compared to head (fc0e4d7) 58.17%. Report is 62 commits behind head on master.

:exclamation: Current head fc0e4d7 differs from pull request most recent head 7d777f8. Consider uploading reports for the commit 7d777f8 to get more accurate results

| Files | Patch % | Lines |
| --- | --- | --- |
| fedimint-client/src/sm/executor.rs | 88.47% | 28 Missing :warning: |
| fedimint-client/src/lib.rs | 81.81% | 2 Missing :warning: |
| modules/fedimint-mint-client/src/input.rs | 75.00% | 2 Missing :warning: |
| modules/fedimint-ln-client/src/receive.rs | 83.33% | 1 Missing :warning: |
| modules/fedimint-ln-common/src/lib.rs | 0.00% | 1 Missing :warning: |
| modules/fedimint-mint-client/src/output.rs | 91.66% | 1 Missing :warning: |
| modules/fedimint-wallet-client/src/deposit.rs | 83.33% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #4230      +/-   ##
==========================================
+ Coverage   58.06%   58.17%    +0.11%
==========================================
  Files         197      197
  Lines       43755    43684       -71
==========================================
+ Hits        25405    25415       +10
+ Misses      18350    18269       -81
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

dpc commented 3 months ago

@elsirion fixed, rebased, squashed

dpc commented 3 months ago

@elsirion I'm afraid that until https://github.com/fedimint/fedimint/pull/4250 is figured out, this makes CI too flaky to land.

douglaz commented 3 months ago

Make client side state machines go brrrr:

  • no busy looping/polling anything (good for phone batteries)

  • parallel state transitions

The parallel state transitions may lead to duplicated processing, so they may or may not improve phone battery life.

Perhaps we should keep them serial for now? I'm not sure we understand all the possible side effects and new issues that may happen.

dpc commented 3 months ago

The parallel state transitions may lead to duplicated processing, so they may or may not improve phone battery life.

Not polling on an empty executor is the main benefit. The second one is not canceling all pending futures. The key to saving battery is not waking up CPUs unnecessarily, so they can sit in low power mode for as long as possible.
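Here's a minimal sketch of the wake-on-notify idea being described (hypothetical code, not the actual fedimint executor; the `Executor` and `StateMachine` types below are made up for illustration): the loop drains whatever work is runnable and then parks on a `tokio::sync::Notify` instead of polling on a timer, so an idle client never wakes the CPU.

```rust
use std::sync::Arc;

use tokio::sync::Notify;

// Placeholder for a client-side state machine; stands in for whatever
// a real executor would load from the database.
struct StateMachine;

struct Executor {
    new_work: Arc<Notify>,
}

impl Executor {
    async fn run(&self) {
        loop {
            // Drain everything that is currently runnable.
            while let Some(sm) = self.next_runnable_state_machine() {
                self.drive(sm).await;
            }
            // Nothing left to do: park until `wake()` is called. `Notify`
            // stores a permit if the notification races with us, so no
            // wakeup is lost, and there is no timer keeping the CPU busy.
            self.new_work.notified().await;
        }
    }

    // Called whenever new state machines are added.
    fn wake(&self) {
        self.new_work.notify_one();
    }

    fn next_runnable_state_machine(&self) -> Option<StateMachine> {
        // In a real implementation this would read active state machines
        // from the database; here it is just a stub.
        None
    }

    async fn drive(&self, _sm: StateMachine) {
        // Run a single state transition to completion.
    }
}
```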

Perhaps we should keep them serial for now? I'm not sure we understand all the possible side effects and new issues that may happen.

Processing state changes in parallel or not is a minor thing IMO, one way or the other. My bet is that conflicts here won't happen often, and even if they do, a couple of retries with a backoff and everything should complete anyway in a burst of activity, after which the CPUs can go back to sleep.

I don't want to pessimize right away. If it looks like db conflicts are common, I'll start by introducing a map of operation_id -> mutex or module_id -> mutex to serialize things touching the same parts, as sketched below.

And I think some conflicts from time to time will be beneficial, as they will exercise the idempotency of the state transition functions.
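For reference, a rough sketch of the fallback mentioned above (the `OperationLocks` type and `lock_for` helper are hypothetical, not fedimint API): a map of operation_id -> mutex, so that transitions touching the same operation are serialized while unrelated ones still run in parallel.

```rust
use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::Mutex;

// Stand-in for fedimint's operation id type.
type OperationId = [u8; 32];

#[derive(Default)]
struct OperationLocks {
    // Outer std mutex only guards the map itself; the inner async mutexes
    // guard the per-operation work.
    locks: std::sync::Mutex<HashMap<OperationId, Arc<Mutex<()>>>>,
}

impl OperationLocks {
    /// Get (or lazily create) the lock guarding a given operation.
    fn lock_for(&self, operation_id: OperationId) -> Arc<Mutex<()>> {
        self.locks
            .lock()
            .expect("lock poisoned")
            .entry(operation_id)
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

// Usage inside the executor loop (sketch):
//   let lock = locks.lock_for(operation_id);
//   let _guard = lock.lock().await;
//   run_state_transition(sm).await;
```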

douglaz commented 3 months ago

@dpc the question is when we are going to include this in a stable release. If we want to include it in the foreseeable future, we should be more conservative, incremental, and pessimistic about the changes.

dpc commented 3 months ago

@dpc the question is when we are going to include this in a stable release. If we want to include it in the foreseeable future, we should be more conservative, incremental, and pessimistic about the changes.

Not planning to backport it. If we do backport it, I guess I can wrap it in a mutex there.

maan2003 commented 3 months ago

oh formatting is totally messed up, rustfmt is giving up

dpc commented 3 months ago

oh formatting is totally messed up, rustfmt is giving up

The arm of that match for ExecutorLoopEvent::Triggered is the only part affected, it seems. Beats me why, and it's hard to tweak it too much without breaking it.

I don't think I care too much, unless you see some particular eyesores.

dpc commented 3 months ago

BTW, we do have a big red warn! on autocommit failure, so we'll see quickly if parallel state machine transition functions are conflicting.
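For context, a generic sketch of that pattern (the `autocommit_with_warn` function and `CommitError` type are made up here, not fedimint's actual autocommit API): retry an optimistic transaction a bounded number of times and emit a warn! on every conflict, so parallel transitions stepping on each other show up loudly in the logs.

```rust
use tracing::warn;

#[derive(Debug)]
enum CommitError {
    // The transaction lost an optimistic-concurrency race and can be retried.
    Conflict,
    Fatal(String),
}

async fn autocommit_with_warn<F, Fut, T>(
    mut tx_body: F,
    max_attempts: u32,
) -> Result<T, CommitError>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, CommitError>>,
{
    for attempt in 1..=max_attempts {
        match tx_body().await {
            Ok(value) => return Ok(value),
            Err(CommitError::Conflict) => {
                // Conflicts are expected to be rare, so every one of them
                // gets logged loudly before retrying.
                warn!(attempt, "autocommit conflict, retrying transaction");
            }
            Err(err) => return Err(err),
        }
    }
    Err(CommitError::Fatal("too many conflicting attempts".into()))
}
```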

dpc commented 3 months ago

Amended with Manmeet's fix. I still don't understand what he did there. :D

maan2003 commented 3 months ago

I created #4291 to make sure this doesn't break wasm

dpc commented 3 months ago

Now that the flakiness is fixed and the random failures are disabled, I rebased and this should be good to go.

douglaz commented 3 months ago

Why are the backwards-compatibility tests failing?

bradleystachurski commented 3 months ago

Looks like a few of the rust unit tests are timing out (flake?).

TIMEOUT [ 480.039s] fedimint-ln-gateway::gatewayd-integration-tests
TIMEOUT [ 480.092s] fedimint-wallet-tests::fedimint_wallet_tests peg_out_fail_refund

I think we can disable rust_unit_tests in the backwards-compatibility tests since that job doesn't use binaries from previous versions.

If we're okay with disabling: https://github.com/fedimint/fedimint/pull/4323

dpc commented 3 months ago

Something smells wrong about this change. Will be debugging.

dpc commented 3 months ago

Hopefully #4329 (already included here) will finally make it solid green.

dpc commented 3 months ago

Seems like this change improves the latency test results by roughly 2x.

before, 1st try:
================= RESULTS ==================
REISSUE: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 47.0s
LN SEND: min: 2.9s, avg: 3.0s, median: 2.9s, p90: 3.0s, max: 3.8s, sum: 59.0s
LN RECV: min: 1.4s, avg: 1.8s, median: 1.5s, p90: 2.4s, max: 2.4s, sum: 35.5s
FM PAY: min: 3.5s, avg: 3.6s, median: 3.6s, p90: 3.6s, max: 3.6s, sum: 71.3s

before, 2nd try:
================= RESULTS ==================
REISSUE: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 47.0s
LN SEND: min: 2.8s, avg: 2.9s, median: 2.9s, p90: 2.9s, max: 3.7s, sum: 58.7s
LN RECV: min: 1.4s, avg: 1.8s, median: 1.5s, p90: 2.4s, max: 2.4s, sum: 35.0s
FM PAY: min: 3.5s, avg: 3.8s, median: 3.6s, p90: 3.6s, max: 8.6s, sum: 76.3s

after, 1st try:
================= RESULTS ==================
REISSUE: min: 0.5s, avg: 0.6s, median: 0.6s, p90: 0.6s, max: 0.6s, sum: 16.6s
LN SEND: min: 1.3s, avg: 1.5s, median: 1.5s, p90: 1.7s, max: 2.0s, sum: 30.0s
LN RECV: min: 1.3s, avg: 1.6s, median: 1.6s, p90: 1.8s, max: 1.9s, sum: 32.4s
FM PAY: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.8s, sum: 32.1s

after, 2nd try:
================= RESULTS ==================
REISSUE: min: 0.5s, avg: 0.6s, median: 0.5s, p90: 0.6s, max: 0.6s, sum: 16.6s
LN SEND: min: 1.3s, avg: 1.5s, median: 1.4s, p90: 1.7s, max: 2.1s, sum: 29.9s
LN RECV: min: 1.3s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 31.9s
FM PAY: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.7s, sum: 32.2s