cosmos / iavl

Merkleized IAVL+ Tree implementation in Go
Apache License 2.0
409 stars 256 forks source link

Root hash breaking change in v1.x.x #956

Closed drklee3 closed 1 week ago

drklee3 commented 1 month ago

Tested IAVL versions:

IAVL v1 has a change that produces a different root hash compared to v0.20.1

I'm opening this issue since I'm not certain if IAVL v1.0.0 is supposed to maintain identical root hashes with previous versions or if this change was intentional. If it is supposed to be non root hash breaking, then I can create a PR with a fix detailed below.

What's the issue?

InitialVersion in v0.20.1 has unintuitive behavior as tested in https://github.com/cosmos/iavl/pull/660

However, this was "fixed" in part of https://github.com/cosmos/iavl/pull/676 (with the unintuitive behavior test modified: https://github.com/cosmos/iavl/pull/676/files#diff-cfc68b2136fcea67d5d32dca8d15252b30c05cf5227fe44e70ca3ddd36c110a4L1447-L1470)

I'm not sure if this was intended or not as https://github.com/cosmos/iavl/pull/650 mentions violating the InitialVersion test is a problem.

Example Scenario

When a new IAVL tree is created with a non-zero InitialVersion and nodes are added before the first time SaveVersion() is called on the tree.

In relation to usage with Cosmos SDK, this happens when a new module store is added in an upgrade and the same store is written to in the same upgrade block. This upgrade handler run with IAVL v0.20.1 will produce a different apphash than using IAVL v1, specifically the root hash of the new store.

Example RegisterUpgradeHandlers() ```go func (app App) RegisterUpgradeHandlers() { app.upgradeKeeper.SetUpgradeHandler( "upgrade_name", func( ctx sdk.Context, plan upgradetypes.Plan, fromVM module.VersionMap, ) (module.VersionMap, error) { toVM, err := app.mm.RunMigrations(ctx, app.configurator, fromVM) if err != nil { return toVM, err } // This writes to the new store, new IAVL tree. // Corresponding IAVL nodes have version 1 on IAVL v0.20.1, but InitialVersion on v1.0.0 someParams := newmoduletypes.NewParams() app.newmoduleKeeper.SetParams(ctx, someParams) return toVM, nil }, ) ... if doUpgrade && !app.upgradeKeeper.IsSkipHeight(upgradeInfo.Height) { storeUpgrades := storetypes.StoreUpgrades{ Added: []string{ communitytypes.ModuleName, }, } // configure store loader that checks if version == upgradeHeight and applies store upgrades app.SetStoreLoader(upgradetypes.UpgradeStoreLoader(upgradeInfo.Height, &storeUpgrades)) } } ```

Resolution

We are currently using this patch on top of IAVL v1 that successfully runs through an upgrade scenario without an apphash error: https://github.com/Kava-Labs/iavl/commit/6db50239f44d4c6491362fe17375d91074497765

The corresponding upgrade block was originally created on KAVA mainnet with IAVL v0.20.1 and we ran into this issue when trying to sync a new node with both IAVL v1.0.0 and v1.2.0.

Our full upgrade handler can be found here for full context: https://github.com/Kava-Labs/kava/blob/v0.25.1/app/upgrades.go

This patch preserves the old behavior of v0.20.1 where the first time SaveVersion() is called on a new tree, individual nodes will use the default 0 + 1 version instead of InitialVersion to derive hashes.

Open to making a PR with the patch if this behavior is desired. 😃

mbreithecker commented 1 week ago

Hey, we recently migrated our devnet from cosmos v0.47 to cosmos v0.50 and are receiving super weird app-hash errors which might be related? Pruning does not seem to work correctly across the different versions. If we put pruning to "nothing" everything works fine, but as soon as the node starts to prune, we get "version does not exists" and then the store is corrupted.

drklee3 commented 1 week ago

Hey, we recently migrated our devnet from cosmos v0.47 to cosmos v0.50 and are receiving super weird app-hash errors which might be related? Pruning does not seem to work correctly across the different versions. If we put pruning to "nothing" everything works fine, but as soon as the node starts to prune, we get "version does not exists" and then the store is corrupted.

Hmm this issue is related to the internal node version being incorrect only for nodes created at the same time the tree is created, so I'm not sure if would be the same issue.

I would recommend using the iaviewer cmd to identify what part of the tree/node is causing the hash mismatch. We did need to modify it a bit to check the values of the internal Node fields.

cool-develope commented 1 week ago

@drklee3 thanks for reporting this edge case. This is not a bug, just fixing the bug of v0.20.x , ofc we intended v1.x compatible with v0.20.x I believe there are several ways to figure out this problem since it is only related to the upgrading + adding new module case

drklee3 commented 1 week ago

Thanks for clarifying! Just making sure it was intended since I wasn't entirely sure if it is supposed to be compatible or not.

Since we do prefer if validators could update from v0.20.x to v1.x on their own time (instead of all at once during an upgrade) with a state compatible version, so we'll likely use the "patch" until all validators are migrated to v1 before an upgrade with a hard v1.x requirement which then we can go back to this upstream v1.x