fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License

Unable to restore without backup #4143

Open TonyGiorgio opened 10 months ago

TonyGiorgio commented 10 months ago

I'm working on integrating restore without an existing backup. I'm curious about how it functions because I seem to be getting some OOM or weird vec issues, at least from the browser.

It seems like it's trying to commit all of the scanned data to the client storage? Like it's hammering our storage and running out of space to save or something? I see version update numbers are incrementing by a few during this process until it eventually crashes and stops receiving messages from the mint.

Also, whenever I refresh the client, it seems to start rescanning over and over again, effectively never finishing. I'm curious if you have rescanning working in webimint. Depending on its speed, bandwidth, and memory requirements, I may have to exclude this functionality from our wallet.

I'm testing this on v0.2.2-rc7

[Screenshot 2024-01-26 at 4 56 16 PM]

dpc commented 10 months ago

Please send whole stacktrace.

TonyGiorgio commented 10 months ago

I can't provide one, it's all wasm gibberish.

dpc commented 10 months ago

Can you try to restore same seed using fedimint-cli of the same version?

elsirion commented 10 months ago

If you are on 0.2.x and not master I have a suspicion: there we still use the state machine executor to run recovery (@dpc fixed that in #4035 and follow-ups).

To be able to continue recovery if the application is shut down in the middle of it, we introduced a state transition every few downloaded sessions; this persists the progress to the DB. But the SM executor keeps a log of all former, now-inactive states, so that might bloat your storage to the point that it crashes.

For payment state machines, keeping that log is useful: if there was a bug and we want to roll out a fix, we still have all the intermediate states. Eventually we'll need some way to clean up very old logs too, though.
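
As a rough illustration of the effect described above, here is a toy model in Rust. It is not Fedimint's state machine executor: `ToyExecutor`, `RecoveryState`, `run_recovery`, and the `prune_inactive` switch are all hypothetical names invented for this sketch. It only shows how one checkpoint transition every few sessions adds one retained inactive state each time, so the number of stored states grows with the length of the scan unless old states are dropped.

```rust
use std::collections::BTreeMap;

/// Hypothetical recovery checkpoint: the next session to download.
#[derive(Clone, Debug, PartialEq)]
struct RecoveryState {
    next_session: u64,
}

/// Toy stand-in for an executor that persists states to a key-value store.
/// Assumption for illustration only; this is NOT Fedimint's executor.
#[derive(Default)]
struct ToyExecutor {
    active: Option<RecoveryState>,
    /// Log of former, now-inactive states, keyed by insertion order.
    inactive_log: BTreeMap<u64, RecoveryState>,
    /// When true, former states are dropped instead of logged.
    prune_inactive: bool,
}

impl ToyExecutor {
    /// Replace the active state; the previous one goes to the log unless pruning.
    fn transition(&mut self, new_state: RecoveryState) {
        if let Some(old) = self.active.replace(new_state) {
            if !self.prune_inactive {
                let idx = self.inactive_log.len() as u64;
                self.inactive_log.insert(idx, old);
            }
        }
    }

    /// Session index recorded in the active checkpoint (0 if none yet).
    fn progress(&self) -> u64 {
        self.active.as_ref().map_or(0, |s| s.next_session)
    }

    /// Total number of states that would occupy storage.
    fn stored_states(&self) -> usize {
        self.inactive_log.len() + usize::from(self.active.is_some())
    }
}

/// Simulate a recovery that checkpoints every `checkpoint_every` sessions
/// and return how many states end up stored.
fn run_recovery(total_sessions: u64, checkpoint_every: u64, prune_inactive: bool) -> usize {
    assert!(checkpoint_every > 0, "checkpoint interval must be positive");
    let mut exec = ToyExecutor { prune_inactive, ..Default::default() };
    exec.transition(RecoveryState { next_session: 0 });
    while exec.progress() < total_sessions {
        let next = (exec.progress() + checkpoint_every).min(total_sessions);
        exec.transition(RecoveryState { next_session: next });
    }
    exec.stored_states()
}
```

In this model, calling `run_recovery(1000, 10, false)` leaves 100 inactive states plus the active one in storage, while the same run with `prune_inactive` set to `true` keeps only the single active state.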

TonyGiorgio commented 10 months ago

> Can you try to restore same seed using fedimint-cli of the same version?

Good idea, I'll try that soon. Right now we derive a child seed so I need to add a feature that allows exposing that child seed for interoperability with fedimint-cli (we picked the same derivation as fedimint after we derived the child seed).

TonyGiorgio commented 10 months ago

> If you are on 0.2.x and not master

For the client side, is it recommended to be at the same v0.2.x version, or is master the preferred approach?

dpc commented 10 months ago

> For the client side, is it recommended to be at the same v0.2.x version, or is master the preferred approach?

This is a fundamental pain point to overcome in the end application. With time, backward compatibility of Fedimint should improve, but backward-incompatible changes have so far been frequent and probably won't entirely cease (though they will happen only between semver-breaking versions). So an application needs to either accept being able to use only a certain subset of real-world federations, or e.g. compile in multiple versions of Fedimint and auto-detect which one to use, etc.
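
A minimal sketch of the "compile in multiple versions and auto-detect" idea, in Rust. Everything here is hypothetical: `ClientBackend`, `parse_major_minor`, and `select_backend` are illustration-only names, the version string is assumed to have been obtained from the federation by some means not shown, and the mapping from version to backend is made up rather than taken from any Fedimint compatibility guarantee.

```rust
/// Hypothetical set of client implementations compiled into the application.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ClientBackend {
    V0_1,
    V0_2,
}

/// Parse "v0.2.2-rc7" or "0.2.1" into (major, minor).
/// Returns None if the string does not start with two numeric components.
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let core = version.trim_start_matches('v');
    // Drop any pre-release suffix such as "-rc7".
    let core = core.split('-').next()?;
    let mut parts = core.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Pick the compiled-in backend for a federation's reported version,
/// or None if the application cannot talk to that federation.
fn select_backend(federation_version: &str) -> Option<ClientBackend> {
    match parse_major_minor(federation_version)? {
        (0, 1) => Some(ClientBackend::V0_1),
        (0, 2) => Some(ClientBackend::V0_2),
        _ => None,
    }
}
```

A `None` result corresponds to the other option mentioned above: the application simply does not support that federation.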

dpc commented 10 months ago

If you're not bound to existing backward compatibility, I'd aim at supporting the newest version you can. But that only delays the problem until Fedimint needs a consensus-incompatible version.

IIUC 0.1 -> 0.2 was a breaking change, but hopefully 0.2 -> 0.3 will stay compatible.

The state of master is decent, but accidental incompatibilities and bugs do happen. And the Rust API can still break frequently (e.g. the recovery re-architecture will break a bunch of code APIs and some architectural details).

TonyGiorgio commented 10 months ago

We haven't officially started supporting it, so I'm happy to deal with breaking code changes and possible client side incompatibility changes too. Just curious if you think I should go ahead and start running master client side on v0.2 fedimint server.

Or if it's like: go ahead with master for upcoming v0.3 fedimint server, but otherwise not 100% there yet, that's fine too. I think it'll be a few months until we start to "officially" support it. But tbd either way. If master fixes some of these problems then I'm happy to switch.

dpc commented 10 months ago

I'd like to think that with every single commit things are getting better, so the more recent the version you can use, the better for you. And you'll generally have to go through all the breaking changes sooner or later anyway.

elsirion commented 10 months ago

I really need to write that compatibility guarantee document, I see :laughing:

@TonyGiorgio any non-released version (master, release branch rcs) may contain breaking changes that we just haven't caught, since the test suite for that is just being built rn. So I'd caution against using that in production. Once there is a new release, the client-server API should stay forwards+backwards compatible, so it should be safe to upgrade. Major upgrades (0.2->0.3) are likely to break Rust APIs though.

justinmoon commented 10 months ago

Dev call:

TonyGiorgio commented 10 months ago

> Does it happen on 0.2.1?

It still happens to me on 0.2.1. This log appeared shortly after initiating the scan:

6717 2024-01-29 20:17:02.570 ERROR [mutiny_wasm::indexed_db:559] Failed to save ([("fedimints/c8d423964c7ad944d30f57359b6e5b260e211dcfdb945140e28d4df51fd572d2", Object {"version": Number(96), "value": String("3400026...
(long text was truncated: 21MB)

Is it saving each fedimint state update that's unrelated to my wallet whenever it does a scan? I had only done a single transaction. I don't seem to be able to get through them all, even making some optimizations to the storage writer we use.

> Does it happen on 0.2.2-rc7 on fedimint-cli?

It doesn't build for me in nix, so I cannot test it.

dpc commented 10 months ago

Yeah, the reason for this is our extremely wasteful previous restore implementation multiplied by wasteful wasm memdb handling.

elsirion commented 10 months ago

I think at some point we saw a log file of a few 100M just from logging the state transitions from recovery (and we reduced that a bit iirc), but in general expect recovery to be really wasteful with DB resources in 0.2, hence #4035.