cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0

Allow community maintenance of older SDK's #14426

Closed faddat closed 1 year ago

faddat commented 1 year ago

Problem

Symptoms

The problem makes it harder for older chains to adopt newer technologies. This can be seen with:

Solution

I'm proposing to bring these branches up to date and to provide some proof of efficacy, such as a sync log. This will help older chains when migrating, because there will be fewer discrete upgrades. I'm opening this issue because I hope to get the work merged, so that chains with old histories (like the Cosmos Hub, Osmosis, Akash, Sentinel, and others) can manage state more easily.

The solution should involve:

While working on some issues for Osmosis:

I was able to prove out that there's no problem with v0.42.x using iavl v0.19.4 and tendermint v0.34.24, but I did have some issues getting all the tests in the SDK to pass. Eventually this led back upstream, and I figured the best course was to go through the older SDKs and give them a bit of a cleanup.
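For readers less familiar with Go module resolution: as a minimal sketch, a chain on the v0.42.x line could pin the two dependency versions mentioned above in its own go.mod. The fragment is hypothetical (it is not taken from the branches discussed in this thread), and the thread does not say whether the SDK itself needed patches to build against these versions. Because Go's minimal version selection picks the highest required version of each module, these require lines take precedence over the older versions that the SDK's own go.mod asks for.

```
// Hypothetical excerpt from a chain's go.mod. The chain is assumed to
// already require github.com/cosmos/cosmos-sdk at some v0.42.x release.
require (
	github.com/cosmos/iavl v0.19.4
	github.com/tendermint/tendermint v0.34.24
)
```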

julienrbrt commented 1 year ago

First, just curious: why can't chains upgrade? And is there something we could do about that? Is it because the in-place / genesis migration doesn't work as expected? Or because of the breaking changes between versions, or something else (lack of docs, ...)? We've seen some make a big jump (v44 -> v46).

You assume that updating older versions will incentivize people to upgrade; why is that? Won't it make them stay forever on deprecated, and possibly vulnerable, software because it seems maintained?

Personally, I like community forks, but I think hosting them in the cosmos/cosmos-sdk repo sets the wrong expectations.

Expectation of maintenance by the SDK team and expectation of stability.

I feel like the usual way to go is to fork (LibreOffice, Nextcloud, ...) and maybe instead have a community-maintained repo of only the deprecated versions of the SDK (e.g. cosmos/cosmos-sdk-deprecated-ce).

Users will need to add a replace directive, but that directly sets expectations because the change is explicit, and it still lowers the burden on the SDK team (if PRs show up here, we will read and test them anyway). We could always add a disclaimer about the community edition in the README of the unmaintained versions.
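A minimal sketch of such a replace directive in a consuming chain's go.mod, assuming a community fork published under the example path cosmos/cosmos-sdk-deprecated-ce. Both the fork and the version tag are hypothetical; the tag is only a placeholder.

```
// Hypothetical: resolve every import of the SDK from a community-maintained fork.
// v0.42.99 is a placeholder tag, not an existing release.
replace github.com/cosmos/cosmos-sdk => github.com/cosmos/cosmos-sdk-deprecated-ce v0.42.99
```

The same edit can be made with `go mod edit -replace=old=new@version`. Because the directive lives in the chain's own go.mod, switching to the fork shows up as an explicit, reviewable change, which is the expectation-setting property argued for here.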

Again, just my two cents, I'm not the one deciding that anyway 😬

faddat commented 1 year ago

:)

Super happy to walk you through this sir :)

So, the way I ended up on this path was by working on getting iavl v0.19.4 into the oldest versions of Osmosis.

What I actually found is that on those older SDKs, even getting the original tests to pass reliably is a bit of a challenge.

Now, as for the holdup to upgrading, I can tell you in one word:

performance

Examples

In all cases, the ideal solution is to upgrade progressively, e.g. sequentially:

1) Add a fast-node-enabled iavl to the version of the SDK that the chain/community currently uses.
2) Convert the DB from goleveldb to pebble.
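Step 2 amounts to copying every key/value pair out of the old database and into the new one. The sketch below shows that copy in key order and in batches, under loud assumptions: `KVStore`, `memStore`, and `migrate` are hypothetical names, and the in-memory store only stands in for goleveldb and pebble so the example runs with the standard library. A real conversion tool would use those libraries' own iterator and batch APIs, which are not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// KVStore is a hypothetical minimal interface standing in for a real
// database backend (goleveldb as the source, pebble as the destination).
type KVStore interface {
	Keys() []string // all keys, in ascending order
	Get(key string) ([]byte, bool)
	WriteBatch(batch map[string][]byte)
}

// memStore is an in-memory KVStore used only to make the sketch runnable.
type memStore struct{ data map[string][]byte }

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Keys() []string {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func (m *memStore) Get(key string) ([]byte, bool) {
	v, ok := m.data[key]
	return v, ok
}

func (m *memStore) WriteBatch(batch map[string][]byte) {
	for k, v := range batch {
		m.data[k] = v
	}
}

// migrate copies every key/value pair from src to dst, flushing a batch
// every batchSize pairs, and returns the number of pairs copied.
func migrate(src, dst KVStore, batchSize int) int {
	batch := make(map[string][]byte, batchSize)
	copied := 0
	for _, k := range src.Keys() {
		v, _ := src.Get(k)
		batch[k] = v
		if len(batch) == batchSize {
			dst.WriteBatch(batch)
			copied += len(batch)
			batch = make(map[string][]byte, batchSize)
		}
	}
	// Flush whatever is left in the final, partially filled batch.
	if len(batch) > 0 {
		dst.WriteBatch(batch)
		copied += len(batch)
	}
	return copied
}

func main() {
	src := newMemStore()
	for i := 0; i < 5; i++ {
		src.data[fmt.Sprintf("key%d", i)] = []byte(fmt.Sprintf("value%d", i))
	}
	dst := newMemStore()
	n := migrate(src, dst, 2)
	fmt.Println("copied", n, "pairs")
}
```

Batching keeps memory bounded on large chain databases; the batch size is a tuning knob, not a value taken from this thread.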

result

Result of fork-insistence

Suggestion that respects the limited time of the SDK team

Instead of being fork-insistent, change the README to explain the source of the code, and ensure users understand that everything past a certain commit has no backing from the SDK team.

tac0turtle commented 1 year ago

We do support older versions of the software with security releases; the idea has always been to maintain only 2 versions back in order to get people to upgrade. This way they get new features without maintenance being passed on to others. Second, if a version is not EOL, the SDK team is responsible for it, no matter how we put it. If there is a new security vulnerability, it becomes the core team's issue; this is why we recommend people upgrade sooner than later.
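To make the support window concrete, here is a minimal sketch that reads "2 versions back" as "the newest two release lines", which is how the policy is phrased later in this thread and how Go's own release policy works. The function name `isMaintained` is hypothetical, and the release list in `main` is illustrative only, not the SDK's actual support matrix.

```go
package main

import "fmt"

// isMaintained reports whether line is among the newest `window` entries of
// releaseLines, which must be ordered from oldest to newest. Working on an
// ordered list (rather than version arithmetic) tolerates skipped numbers.
func isMaintained(releaseLines []string, line string, window int) bool {
	start := len(releaseLines) - window
	if start < 0 {
		start = 0
	}
	for _, l := range releaseLines[start:] {
		if l == line {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative release lines, oldest first.
	lines := []string{"v0.45", "v0.46", "v0.47"}
	for _, l := range lines {
		fmt.Println(l, isMaintained(lines, l, 2))
	}
}
```

Every new release line shifts the window, which is the mechanism that pushes chains on the oldest supported line to upgrade.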

faddat commented 1 year ago

Hmmmm, I'm right with you on "upgrade sooner than later", which is why I want to grease the skids. I have another idea on this and will make an additional PR, but it won't touch code.

Instead, it will link people to the skid-greasing release. I'm basically looking to make these tools available because, like you, I think teams should upgrade (much) sooner than later.

Then there's the archive node issue.

faddat commented 1 year ago

So, @tac0turtle, consider the scenario where you, as an infranerd, wish to build an archive node from scratch.

For any chain that has v0.42 in its history and used in-place upgrades, you need to kinda... go back in time performance-wise.

I've proven out that upgrading to iavl v0.19.4 on the v0.42 series is non-apphashy (it does not change the app hash).
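One way to back a "non-apphashy" claim is to record the app hash at each height from two runs (for example, an unmodified binary and one built with the newer iavl) and look for the first height where they disagree. The sketch below assumes the hashes have already been collected into maps, for instance from a sync log; the function name `firstDivergence` and the hash values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// firstDivergence compares app hashes recorded per block height by two runs
// and returns the lowest height present in both records where they differ.
// found is false if every shared height matches.
func firstDivergence(reference, candidate map[int64]string) (height int64, found bool) {
	heights := make([]int64, 0, len(reference))
	for h := range reference {
		if _, ok := candidate[h]; ok {
			heights = append(heights, h)
		}
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	for _, h := range heights {
		if reference[h] != candidate[h] {
			return h, true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical, shortened hash strings.
	reference := map[int64]string{1: "A1", 2: "B2", 3: "C3"}
	candidate := map[int64]string{1: "A1", 2: "B2", 3: "C3"}
	if h, found := firstDivergence(reference, candidate); found {
		fmt.Println("app hash diverges at height", h)
	} else {
		fmt.Println("no divergence at shared heights")
	}
}
```

Reporting the first mismatching height, rather than just "match / no match", also narrows down where a problem begins, which is the kind of search described a little further down in this comment.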

Additionally:

So, while it may seem like a giant misallocation of time, I am certain that currently, from-scratch archives consume more time than making these changes.

If you check out the README, you'll note explicit deprecation warnings and some usage guidance.

https://github.com/notional-labs/cosmos-sdk/tree/faddat/v0.42.x-modern

The difficulty with the Osmosis issues, for us, was finding where the problems began and ended. In the end, @catShaark and I were able to prepare branches that "worked fine" but did not pass tests.

So, this issue and the pull requests related to it are in fact intended to make it easier for teams to adopt new SDK versions faster.

User stories

the point

Archive syncs are a billion times too hard, for example:

tac0turtle commented 1 year ago

So I would ask: why are you syncing from scratch instead of using something like a version DB that you can send the data to, so that you don't have to maintain large DBs? If you look at other ecosystems, syncing from genesis also takes a while. In the SDK we have a strict policy that only the latest 2 versions are maintained; the Go language follows the same policy. If we allow older versions to be maintained, teams are less inclined to upgrade because they know those versions will be maintained. In the near future we should get rid of syncing from scratch the way you are doing it, as it's inefficient.
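To illustrate the version DB idea, here is a minimal in-memory sketch: a node streams each state change (key, new value or deletion, block height) into a separate store, which can then answer "what was this key at height h" without a full archive of the original database. Everything here is hypothetical (`VersionDB`, `Apply`, `GetAt`); it is not an SDK API and makes no claim about how any existing version DB is implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one recorded change to a key: its value at a given height,
// or a tombstone if the key was deleted at that height.
type entry struct {
	height  int64
	value   []byte
	deleted bool
}

// VersionDB is a hypothetical versioned key/value store that receives state
// changes block by block and answers point-in-time queries.
type VersionDB struct {
	history map[string][]entry // per-key changes, in ascending height order
}

func NewVersionDB() *VersionDB { return &VersionDB{history: map[string][]entry{}} }

// Apply records a change observed at the given height. Changes must arrive in
// non-decreasing height order, as they would when streamed from a syncing node.
func (db *VersionDB) Apply(height int64, key string, value []byte, deleted bool) {
	db.history[key] = append(db.history[key], entry{height, value, deleted})
}

// GetAt returns the value of key as of height, i.e. the latest change at or
// below that height. ok is false if the key did not exist at that height.
func (db *VersionDB) GetAt(key string, height int64) (value []byte, ok bool) {
	h := db.history[key]
	// Index of the first change strictly above the requested height.
	i := sort.Search(len(h), func(i int) bool { return h[i].height > height })
	if i == 0 {
		return nil, false // no change at or before this height
	}
	e := h[i-1]
	if e.deleted {
		return nil, false
	}
	return e.value, true
}

func main() {
	db := NewVersionDB()
	db.Apply(10, "balance/alice", []byte("100"), false)
	db.Apply(20, "balance/alice", []byte("75"), false)
	db.Apply(30, "balance/alice", nil, true)

	for _, h := range []int64{5, 15, 25, 35} {
		v, ok := db.GetAt("balance/alice", h)
		fmt.Println(h, string(v), ok)
	}
}
```

Storing only the changes per key, and binary-searching them by height, is what lets historical queries avoid replaying or keeping every full state version.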

A new tool could easily be built as an alternative. There are DBs on the network with the data, but they serve it very slowly; that is one issue. The second issue is execution, which we are trying to fix. For older data, though, there are simpler ways than syncing from scratch.

I'm sorry, I will have to close this issue, because backporting features to older releases is out of scope for the release and security process of this repo.