Closed dpc closed 3 months ago
I think it increases flakiness of one/two tests.
Edit: sends_ecash_out_of_band_cancel
Attention: 36 lines
in your changes are missing coverage. Please review.
Comparison is base (
52f0c96
) 58.06% compared to head (fc0e4d7
) 58.17%. Report is 62 commits behind head on master.:exclamation: Current head fc0e4d7 differs from pull request most recent head 7d777f8. Consider uploading reports for the commit 7d777f8 to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@elsirion fixed, rebased, squashed
@elsirion I'm afraid until https://github.com/fedimint/fedimint/pull/4250 is figured out, this is making the CI too flaky to land.
Make client side state machines go brrrr:
no busy looping/polling anything (good for phone bateries)
parallel state transitions
The parallel state transitions may lead to duplicated processing. So it may or may not improve phone batteries.
Perhaps we should keep them serial for now? I'm not sure we understand all the possible side effects and new issues that may happen.
The parallel state transitions may lead to duplicated processing. So it may or may not improve phone batteries.
Not polling on empty executor is the main benefit. The second one is not canceling all pending futures. They key to saving battery is not waking up CPUs unnecessarily, so they can sit in low power mode for as long as possible.
Perhaps we should keep them serial for now? I'm not sure we understand all the possible side effects and new issues that may happen.
Processing state change in parallel or not is a minor thing, IMO, one or the other. My bet is conflicts here will not be happening a lot, and even if they do, coupe of retries with a backoff and everything should complete anyway in a burst of activity, after which the CPUs can go back to sleep.
I don't want to pesimise right away. If it looks like db conflicts are common, I'll start by introducing a map of operation_id -> mutex
or module_id -> mutex
to serialize things touching same parts.
And I think And some conflicts from time to time will be beneficial as they will exercise idempotency of state transition functions.
@dpc the question is when we are including this in a stable release. If we want to include it in the foreseen future, we should be more conservative, incremental and pessimistic about the changes.
@dpc the question is when we are including this in a stable release. If we want to include it in the foreseen future, we should be more conservative, incremental and pessimistic about the changes.
Not planing to backport it. If we do backport it, I guess I can wrap in a mutex there.
oh formatting is totally messed up, rustfmt is giving up
oh formatting is totally messed up, rustfmt is giving up
The arm of that match
for ExecutorLoopEvent::Triggered
is the only part affected, it seems. Beats me why, and it's hard to tweak it too much without breaking it.
I don't think I care too much, unless you see some particular eyesores.
BTW. We do have a big red warn!
on autocommit fail, so we'll see quickly if parallel state machine transfer functions are conflicting.
Amended with Manmeet's fix. I still don't understand what he did there. :D
I created #4291 to make sure this doesn't break wasm
Now that flakiness is fixed and random failures dissabled, I rebased and this should be good to go.
Why backwards-compatibility tests are failing?
Looks like a few of the rust unit tests are timing out (flake?).
TIMEOUT [ 480.039s] fedimint-ln-gateway::gatewayd-integration-tests
TIMEOUT [ 480.092s] fedimint-wallet-tests::fedimint_wallet_tests peg_out_fail_refund
I think we can disable rust_unit_tests
in backwards-compatibility tests since it's not using binaries from previous versions.
If we're okay with disabling: https://github.com/fedimint/fedimint/pull/4323
Something smell wrong about this change. Will be debugging.
Hopefully #4329 (include here already) will finally make it solid green.
Seems like this change improves latency tests 2 times.
before 1st try:
================= RESULTS ==================
REISSUE: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 47.0s
LN SEND: min: 2.9s, avg: 3.0s, median: 2.9s, p90: 3.0s, max: 3.8s, sum: 59.0s
LN RECV: min: 1.4s, avg: 1.8s, median: 1.5s, p90: 2.4s, max: 2.4s, sum: 35.5s
FM PAY: min: 3.5s, avg: 3.6s, median: 3.6s, p90: 3.6s, max: 3.6s, sum: 71.3s
before, 2nd try:
================= RESULTS ==================
REISSUE: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 47.0s
LN SEND: min: 2.8s, avg: 2.9s, median: 2.9s, p90: 2.9s, max: 3.7s, sum: 58.7s
LN RECV: min: 1.4s, avg: 1.8s, median: 1.5s, p90: 2.4s, max: 2.4s, sum: 35.0s
FM PAY: min: 3.5s, avg: 3.8s, median: 3.6s, p90: 3.6s, max: 8.6s, sum: 76.3s
after, 1st try:
================= RESULTS ==================
REISSUE: min: 0.5s, avg: 0.6s, median: 0.6s, p90: 0.6s, max: 0.6s, sum: 16.6s
LN SEND: min: 1.3s, avg: 1.5s, median: 1.5s, p90: 1.7s, max: 2.0s, sum: 30.0s
LN RECV: min: 1.3s, avg: 1.6s, median: 1.6s, p90: 1.8s, max: 1.9s, sum: 32.4s
FM PAY: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.8s, sum: 32.1s
after, 2nd try:
================= RESULTS ==================
REISSUE: min: 0.5s, avg: 0.6s, median: 0.5s, p90: 0.6s, max: 0.6s, sum: 16.6s
LN SEND: min: 1.3s, avg: 1.5s, median: 1.4s, p90: 1.7s, max: 2.1s, sum: 29.9s
LN RECV: min: 1.3s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.6s, sum: 31.9s
FM PAY: min: 1.5s, avg: 1.6s, median: 1.6s, p90: 1.6s, max: 1.7s, sum: 32.2s
On top of #4227
Make client side state machines go brrrr:
parallel state transitions
No state machine was harmed during the rewrite.