fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
536 stars 210 forks source link

Consensus and Api versioning - APIs #1579

Closed dpc closed 1 year ago

dpc commented 1 year ago

Currently rebased on top of #1575 to avoid merge conflict. See only the last commit.

Implement the basic types and module-related primitves to support versioning.

These are currently not used for anything, and here only for moving forward with the discussion from https://github.com/fedimint/fedimint/issues/1548.

dpc commented 1 year ago

@elsirion

I haven't really thought through all the details, and I hope exactly that looking at the code will make the vision and discussion more precise.

I agree that in many cases importing multiple versions of module codebase is way better idea than cramming it together glued with bunch of if statements. But then - won't majority of cases be trivial? E.g. no changes?

One thing that is not clarified (in my head and in the code) is if module will have its own counter for consensus versions and then there be a global one for core Fedimint protocol. I guess it has to be both, where core protocol consensus works like like a major version for each protocol instance.

In that case it should be

fn version(core: CoreConsensusVersions) -> [ModuleConsensusVersion]

or

fn version(core: CoreConsensusVersions) -> ModuleConsensusVersion

if we want to enforce that any-time module does any change in its own consensus, it needs to make a new codebase.

?

elsirion commented 1 year ago

I don't think the module consensus version has to be a function of the FM consensus version. Either the module needs to be adapted to work with a newer FM version (likely) then it can change its own version or not, then there is no sense in changing the module version.

EDIT: but to your general point: yes, Fedimint needs its own set of consensus, API, DB versions separate from any modules.

dpc commented 1 year ago

Either the module needs to be adapted to work with a newer FM version (likely) then it can change its own version or not, then there is no sense in changing the module version.

That means it is a function of fm consensus version, no?

We at least should aim at not letting users make accidental mistakes.

So

fn version(core: CoreConsensusVersions) -> [ModuleConsensusVersion]

If FM changes consensus from 2 to 3, and module doesn't care then:

module_init.versions(2)== module_init.module_version(3).first()

If it does, then it will have it's own version change and

module_init.versions(2).last() + 1 == module_init.module_version(3).first()

By having this function we can check for the user that all the modules are compatible with FM consensus changes. Initially just bunch of checks and crashing with an error message explaining it, but in the future we might even support automated scheduled consensus changes where guardians use UI to agree on the consensus change that gets scheduled and "switched into" automatically at the designed epoch.

elsirion commented 1 year ago

Maybe I'm wrong here, but in my mind a new consensus version of Fedimint is a new major code version, so the old module wouldn't compile with the new Fedimint anyway. So the

module_init.versions(2).last() + 1 == module_init.module_version(3).first()

case doesn't really exist.

I also don't get what first/last of the returned slice would be. What's the semantic of the returned array? Supported versions? But how would a version be even selected when e.g. processing an output? We don't have version fields in all our consensus relevant structs rn, are we planning on changing that?

So far I assumed we just treat a new consensus version of a module as a completely new module with its own instance id and a related kind and let clients figure out migration. I would like to have a version field in the module kind type for that.

dpc commented 1 year ago

But how would a version be even selected when e.g. processing an output?

Versions are on ModuleGen so what it means is: this module codebase can instantiate module instance for the following module consensus versions. This could be used e.g. in the Admin UI to tell user that their codbase is/isn't compatible with module consensus upgrade guardians are trying to schedule, and prompt them to upgrade their binary.

In the simplest case, a module returns just one version meaning - this code base supports only this module consensus version.

Maybe I'm wrong here, but in my mind a new consensus version of Fedimint is a new major code version, so the old module wouldn't compile with the new Fedimint anyway.

I don't think this is necessary. Initially we probably will tie them together for simplicity, but our module code API surface layer is somewhat narrow, and nicely typed.

We might at some point change some consensus rules which most (all) of the module won't care about, and they won't have to change a single line of code, other than adding the new consensus to their list of supported FM consensus version, purely as a safeguard.

On the other hand, we might want to change some module trait APIs, without actually changing consensus rules.

And sometimes we might do both at the same time?

So far I assumed we just treat a new consensus version of a module as a completely new module with its own instance id and a related kind and let clients figure out migration. I would like to have a version field in the module kind type for that.

I guess there's an overlap here. You might be right. I need to think about about it a bit longer and run thought more scenarios for theoretical modules, and how practical it is. For a module like mint such a static migration seems very nice, but I wonder if it will always be sufficient.

jkitman commented 1 year ago

Currently we have a CODE_VERSION which is stored in the ServerConsensusConfig so that if they don't match, servers will fail to join consensus. How would these new versions interact with the existing one?

dpc commented 1 year ago

Currently we have a CODE_VERSION which is stored in the ServerConsensusConfig so that if they don't match, servers will fail to join consensus. How would these new versions interact with the existing one?

Replace. We should compare consensus versions (global FM one, and each module's instance). If the code says it's compatible, we believe it.

dpc commented 1 year ago

@elsirion

I guess there's an overlap here. You might be right. I need to think about about it a bit longer and run thought more scenarios for theoretical modules, and how practical it is. For a module like mint such a static migration seems very nice, but I wonder if it will always be sufficient.

OK, so I think it's not hard to come up with examples where rolling over to new instance is not really desireable / possible.

Let's say we have a smart contract module, Ethereum style. Let's ignore for a second if that's a bad idea to do it. Since the contracts in the existing instance interact with other contracts, this creates a strong network effect.

Migrating to another instance with new rules in slow motion is close to impossible. The module becomes it's own side-chain, not unlike any alt-coin, and a different instance of it would be just a minority contender. What the module authors want is a scheduled hard-fork to change the rules (e.g fee structure), and not a slow migration.

Similarly if the module is storing a lot of long term backup (like a threshold encrypted-backup), a module consensus hard-fork is preferred over bothering users with migration.

So, yeah - I don't think rolling over is going to work in every case. Thought I do agree that when possible it's very elegant. The module consensus upgrade is orthogonal to it. A mint-v2 KIND becomes a completely different module kind than mint-v1 (if we do it through just different KIND string or an extra "module kind version" is implementation detail). It will be up to module developers to decide if rolling over to a new kind&instance is preferable over "module consensus hard-fork", and we will guide the user through this difference in the UIs.

jkitman commented 1 year ago

Currently we have a CODE_VERSION which is stored in the ServerConsensusConfig so that if they don't match, servers will fail to join consensus. How would these new versions interact with the existing one?

Replace. We should compare consensus versions (global FM one, and each module's instance). If the code says it's compatible, we believe it.

So we could run with different binary versions potentially? Seems like that could be error-prone.

If this is replacing CODE_VERSION can we also remove it and put these versions into the consensus config then?

dpc commented 1 year ago

So we could run with different binary versions potentially? Seems like that could be error-prone.

Eventually we have to discard the training wheels and graduate to the big league of distributed software.

Right now the stakes are low, so it is a good opportunity to learn and gain some experience.

Whenever we touch anything that smells like breaking consensus, we just increment the corresponding number and it comes to the same thing.

If this is replacing CODE_VERSION can we also remove it and put these versions into the consensus config then?

I am not doing it in this PR, but yes, that would exactly be the next step. We start comparing consensus versions between peers, and module code compatibility against the federation consensus and we error out if anything is wrong.

elsirion commented 1 year ago

Sounds reasonable, will look at it again with fresh eyes tomorrow, not too opposed to land in this form anymore. I still think one code version only supporting one consensus version would be more elegant, but this flexibility is likely enough to be useful in the not too distant future to make sense.

justinmoon commented 1 year ago

Dev call: add test to CI to assert that version compatibility didn't change if not declared explicitely

dpc commented 1 year ago

I amended with changes corresponding to some conclusions from the discussion here.

elsirion commented 1 year ago

Let's try this! Next steps are inclusion in cfg/client side checks?

dpc commented 1 year ago

Let's try this! Next steps are inclusion in cfg/client side checks?

I think so, yes. And we have to remember to try to increment these numbers when we break api/consensus rules. Not a big deal if we forget, but we should try.