Closed elsirion closed 3 months ago
subscribe_progress
sounds very useful.What happens if client is restarted during recovery
Good that you bring that up, maybe the default build method would return some enum:
enum ClientOrRecovering {
Initialized(Client),
Recovering(RecoveringClient),
}
while the recovering build fn always returns a RecoveringClient
?
EDIT: and yes, you'd resume from the last checkpoint.
Dev call: this description is a little outdated
New design:
ClientContext
)Some more ideas:
IState
/State
) that says if the execution log should be persisted. That way we don't spam the client DB anymore.ClientModuleInit
trait that return client module and recovery SM to be spawned (idk if there's a more elegant way now)(Old notes: https://github.com/fedimint/fedimint/pull/3008#issuecomment-1791188772)
- Separate init fn on
ClientModuleInit
trait that return client module and recovery SM to be spawned (idk if there's a more elegant way now)
We can just pass snapshot: Option<&[u8]>
to init
I guess...
I'm tempted to have async fn recover(&self, args: &ClientModuleInitArgs<Self>) -> anyhow::Result<Self::Recovery>
and a whole new trait trait ClientModuleRecovery
with something like;
trait ClientModuleRecovery {
async fn make_progress(self) -> Option<Self>;
fn progress(&self) -> f32;
}
and track them separately. Otherwise it will just pollute the normal client with an extra global mode everywhere.
@elsirion
We can just pass
snapshot: Option<&[u8]>
toinit
I guess...
How would we distinguish "recover without backup" and "no recovery"?
By having a make_progress
fn, do you want to implement an entirely new mechanism of driving recovery forward? As I understood it we'd let the SM executor do most of the heavy lifting and just add methods to track progress.
By having a
make_progress
fn, do you want to implement an entirely new mechanism of driving recovery forward? As I understood it we'd let the SM executor do most of the heavy lifting and just add methods to track progress.
I'm thinking: Recoveries don't have conditionals and diverging states, so they don't need fancy state machines. We keep pulling make_progress
and savekeep the progress in the db until None
is returned and recovery is complete. After that the module can be ClientModuleInit::init
ed.
But we don't want a separate high-level struct, i.e. RecoveringClient
, right (that's what I first intended, but you had good arguments against that)? So Client
would have a map of initialized modules and another with recovering ones?
So
Client
would have a map of initialized modules and another with recovering ones?
Yeah.
Recovering client module is really not client module. It can't do anything, so all methods would require a conditional returning "I'm not ready" etc. @elsirion
Pasting from a post I made earlier:
I think I'm going to have have the modules use their own database to preserve the recovery state as they please. They can use own one key to store the state and encode/decode it (like in the existing mint recovery code does) if they want, but also store it in multiple keys or do whatever else they want. Passing encodable state back and forth is a drag introducing extra type-system conversions etc and overhead of passing things back and forth. The original mint client recovery was written in the current style because it had to fit the state machines, but this is no longer a constrain. In certain situations it might not even be desirable/possible to fit all the recovery state in the the memory and pass it around.
Then I realized that having
make_progress
is also kind of limiting and pointless, and I'll just usetokio::sync::watch
or something so that recovering module can send updates whenever they feel like it, and don't have to interrupt execution to return "progress". This makes things like pre-fetching, buffering, using streams much easier as the recovery code is just unconstrained future running to completion.So all in all instead of
ClientRecovery::make_progress, the global recovery code will be calling
ClientModuleInit::recoverwith some
RecoveryArgson each start until that future completes without error (completed), which will be recorded in persisted store so on next start
ClientModulInit::init` can be called instead.
and then @elsirion voicing concerns:
I feel like FM should at least provide a common way/framework to build resilient backup recovery. Maybe we give full control to the module authors to do it themselves, but supply a generic function that lets modules define a recovery state machine and runs it, persisting and announcing progress regularly.
to which I would say that structurally module implementations messing their implementations is not a problem (doesn't block or break anything else in their Client
), but indeed I'd love to have helpers for that.
Currently I'm planning to give the fn ModuleInit::recovery
some RecoveryArgs
very much like InitArgs
and other than the normal accessors, it can have one or more "given this single step logic, returning this Encodable
state, make the recovery happen correctly"
🎉
Edit: TODO checklist:
add_state_machines_inactive_dbtx
Previous discussion
In the design described below a badly implemented module could stall the recovery of all other modules and we decided against it. I leave it here for historic context. For the actual proposal see https://github.com/fedimint/fedimint/issues/2977#issuecomment-1819626967
OUTDATED:
The current recovery interface was built in a hurry to bring the new client library to feature-parity with the old one so that a switch could happen. Unfortunately this lead to a design with multiple shortcomings mostly stemming from the fact that recovery is modeled as a module state machine and the different modules cannot easily interact.
Client
object already exists, but it is a programmer error to call any module methods that could trigger DB writes and thus interfere with recovery. Instead a recovering client should be a separate type.Proposed Architecture
I already talked to @dpc about the following new architecture:
build_restoring_from_backup
returns aRecoveringClient
structRecoveringClient::subscribe_progress() -> watch::Receiver<Progress>
: A way to track recovery progressimpl Future<Output=Result<Client>>
: theRecoveringClient
is a future that can be awaited and returns the client on successful completion of recoverywatch::Sender
.RecoveringClient
is dropped before completion it will continue from there on next try.This design solves the problems mentioned above:
RecoveringClient
future resovles no client functions are available.