Closed okwme closed 1 year ago
just discovered this PR https://github.com/cosmos/cosmos-sdk/pull/12028 that has migration logic for staking module. Would protocol normally be to add something like this for bank module when KV store was changed?
The migration for key format change is included in the SDK, here: https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/x/bank/migrations/v046/store.go#L18
I think you need to call your keepers.BankKeeper.GetDenomMetaData
function after the bank migration above has run. My hypothesis is that you're calling keepers.BankKeeper.GetDenomMetaData
before the store migration, so it uses new code (new key format) to query old db (old key format). I would suggest to run all store migrations first, then update the metadata.
@okwme if you have a gaia branch with this error, I can try to help there
Thank you @amaurym ! you're right that there is in fact a migration for bank that I missed somehow. You are also correct that we were attempting to call GetDenomMetaData
before the migration had taken place (although just the wrong order within the handler).
However after digging a bit deeper I believe that the bug DOES in fact exist despite the fact that it shouldn't have been visible from the current configuration (WHAT ARE THE CHANCES ACTUALLY??). After re-ordering the upgrade handler so that the migration occurred before we attempted to retrieve from the newly migrated store, it still fails with the same error! When iterating over the keys and value that are in the store, instead of uatomuatom
as before it just shows uatomu
which demonstrates it is the wrong length. The addition of one is from the incorrect expectation that the store prefix (0x1
) should be accounted for.
I opened a PR with a fix: https://github.com/cosmos/cosmos-sdk/pull/13813
It's very unlikely that there would be 2 bugs that resulted in the same error happening simultaneously. But does that seem like the correct assessment to you?
I'm working off of this branch https://github.com/cosmos/gaia/tree/okwme/test-upgrade-debug. To see the bug in action do the following steps:
make build
so it is built insde build
as ./build/gaiad
./run-gaiad-v7.sh
./run-upgrade-commands.sh 30 200
./run-gaiad-v8.sh
to start the new binary and run the migration / upgrade and see the error from the sdk's bank migration
Summary of Bug
We ran into a bug in gaia's upgrade handler related to the bank module and coin metadata. Basically we were getting validate genesis errors because the coin metadata was missing
Name
andSymbol
from theMetadata
. We decided to add them to the bank keeper like this:however we kept getting the error
atom denom not found
.The
GetDenomMetaData
should take theBase
denom string (uatom
) in this case and retrieve it using the bank keeper as follows:As you can see here the
store
is created usingtypes.DenomMetadataPrefix
and thendenom
is used as the key to retrieve the metadata. HOWEVER, this is not how the bank keeper originally stored this metadata in v0.45 of the Cosmos SDK. If we jump intoSetDenomMetadata
of the bank keeper of Cosmos SDK v0.45 we will see a different schema for storing the Metadata:Here we can see that the
store
is created using a different version of thetypes.DenomMetadataKey
, one that takesBase
as a parameter and subsequently appends with theBase
string:As opposed to Cosmos SDK v0.46.x where there is no use of
DenomMetadataKey
.TLDR:
Bank module changed the format for the key when storing coin metadata, so without an explicit migration between versions the application was unable to properly retrieve metadata with the old key after an upgrade from v0.45.x to v0.46.x.
Open Question:
We can modify and replace the store used in Gaia v8 with Cosmos SDK v0.46.x so that this is not an issue. But is this the correct way to address the problem? Do modules have their own KV migrations for situations like this or is it the responsibility of the app developer to notice changes like this and ensure upgrade logic is written in app.go to ensure continuity? Are there other apps this could impact or other ways to ensure that other modules haven't befallen a similar fate?
There was another recent issue in gaia that was moved to Cosmos SDK and then moved back to gaia because it was pointed out that it is the responsibility of the application to ensure data integrity. Is this an other instance of such an event and if so how do we make sure that application developers are aware of this kind of breaking change?
I see that the item was included under State Machine Breaking as follows:
Version
Steps to Reproduce