cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.24k stars 3.6k forks source link

EPIC: make guarantees around query responses #13041

Closed tac0turtle closed 1 year ago

tac0turtle commented 2 years ago

Summary

As mentioned in the issue from cosmwasm there is a need to have data response guarantees around queries and messages. This issue focuses on the former, queries.

Problem Definition

Within minor releases responses of queries could change, potentially causing issues to downstream users.

Proposal

This could be done in coordination with auto generated queries

aaronc commented 2 years ago

I believe there should be some annotation in the .proto files to indicate which queries are deterministic and can be used inside the state machine.

There may always be some cases where people want to use queries in some non-deterministic or very inefficient way. Those queries shouldn't be exposed to the state machine and modules like cosmwasm.

Probably by default we should assume all queries are unsafe, and one by one start marking the safe ones (which have proper regression tests, gas consumption, etc.) with an annotation.

amaury1093 commented 2 years ago

It's also probably safer to disable all historical queries, since they depend on each node's pruning settings.

tac0turtle commented 2 years ago

It's also probably safer to disable all historical queries, since they depend on each node's pruning settings.

could you elaborate

amaury1093 commented 2 years ago

If your state machine code makes a historical query for height say N-1000, then:

causing a consensus error.

My safe solution above is to only allow state machine queries to query the latest state.

alexanderbez commented 2 years ago

I'm not sure I understand @amaurym. AFAIK, CW would not make any historical queries, but rather assumes the latest/current state.

I like @aaronc's proposal of a Proto annotation. This way devs/clients can inspect docs and know what queries are safe to use.

So I say we:

  1. Introduce and add annotations
  2. Write tests for determinisms for said queries (i.e. loop N times and ensure output is canonical each time)
amaury1093 commented 2 years ago

I creatd a first PR to make sure we agree on the proto annotation, and which queries will have the guarantees on determinism. https://github.com/cosmos/cosmos-sdk/pull/13174

amaury1093 commented 2 years ago

Write tests for determinisms for said queries (i.e. loop N times and ensure output is canonical each time)

How about using property-based testing with rapid, which the SDK already uses? So basically something like:

// Check GetBalance determinism
rapid.Check(func(t *testing.T) {
  addr := AddressGenerator().Draw(t) // where AddressGenerator returns a rapid.Generator[sdk.AccAddress]
  amt := rapid.Uint64().Draw(t)
  bankKeeper.SetBalance(ctx, addr, sdk.NewCoin(amt, "stake"))
  for i := 0; i < 1000; i++ {
    // make sure bankKeeper.GetBalance always returns the same response
  }
})

So there are 1000*N iterations, where N = our rapid's configuration. This way we don't hardcode the response.

Do we also agree these tests belong inside test/integration ? I.e not suited to be tested using mocks.

alexanderbez commented 2 years ago

I've never seen or used Rapid, but the approach seems reasonable to me.

tac0turtle commented 2 years ago

with rapid how do you verify that the response didnt change in minor releases?

amaury1093 commented 2 years ago

with rapid how do you verify that the response didnt change in minor releases?

Good point. So maybe we do one or three hardcoded responses (to check backwards-determinism) plus rapid (to check the property "query is deterministic"). I feel both are useful here since they test different things.

amaury1093 commented 2 years ago

FYI, the way we currently test determinism/regression across minor releases is weak. We basically test queries against hardcoded values. E.g. if we fund acct1 with balance2, then the query always return the same balance, see example.

With the above method, we might catch changes in the query response. But we wouldn't catch stuff like:

These are not trivial to catch though. Are people fine, for this epic, to do the following:

  1. keep the current hardcoded tests for regression
  2. add clear documentation for module developers that they need to be careful about those state-machine breaking changes on each PR, when they add module_query_safe proto annotation, it can easily be a footgun.

Also happy to hear other ideas on how to test state-machine breaking changes in queries.

alexanderbez commented 2 years ago

@amaurym I'm a bit lost as to how your points relate to non-determinism re queries:

server-side logic change (e.g. additional server-side validation of incoming request on the query handler)

How is validation breaking determinism? If some query previous didn't error on input but with a new change now does, is this what you mean? If so, tests should easily catch this (e.g. Error vs NoError assertions).

gas change

Why is this difficult? Can't we read the gas consumed off of the context?

amaury1093 commented 2 years ago

How is validation breaking determinism? If some query previous didn't error on input but with a new change now does, is this what you mean?

Yes. Imagine you (or any module developer in the future) create a PR on a deterministic query which:

Then our existing deterministic tests won't fail. Even though this is a state-machine breaking change. We would need to write new tests.

gas change

So you're proposing to hardcode the gas consumed by each query inside the test itself? Not sure how flakey this would be with our test setup, but we can try.

alexanderbez commented 2 years ago

Then our existing deterministic tests won't fail. Even though this is a state-machine breaking change. We would need to write new tests.

Yeah, makes sense. I suppose there isn't much we can do -- we just have to ensure we right good tests for new fields.

So you're proposing to hardcode the gas consumed by each query inside the test itself? Not sure how flakey this would be with our test setup, but we can try.

Yes, exactly! With queries, the gas should not be flakey since we're not simulating any state! In fact, we can improve these tests by also ensuring the gas consumed never changes per iteration.

amaury1093 commented 1 year ago

This Epic is complete 🎉.

As a summary, we added the (cosmos.query.v1.module_query_safe) = true protobuf annotation to Auth,bank, staking grpc queries that are allowed to be called from the state machine. Here's some documentation too.

tac0turtle commented 1 year ago

amazing, thank you!!! Nice job!!

Should we open a new epic to slowly migrate other modules as well?

amaury1093 commented 1 year ago

Not sure about all modules, there are still caveats like https://github.com/cosmos/cosmos-sdk/issues/13041#issuecomment-1273398931, so a consensus error probability is never 0.

Are there queries/modules that people particularly want? Let's do those first.

tac0turtle commented 1 year ago

@ValarDragon @ethanfrey @assafmo do you have queries you need to be deterministic from the core sdk modules

ethanfrey commented 1 year ago

Bank, Staking and Auth is a huge step forward 💪 Thank you for getting this in 🚀

I don't have any more pressing needs. I think gov would be interesting to some users, but I think it is also in a process of change and not in a place to produce such guarantees.

Maybe adding such deterministic query checks for x/distribution, which is often used with staking?

NB: We just merged support to let chains configure their own such whitelist. Will play together well with this work https://github.com/CosmWasm/wasmd/pull/1069

assafmo commented 1 year ago

We went over all of these queries and found them to be deterministic: https://github.com/scrtlabs/SecretNetwork/blob/4397f93b2/x/compute/internal/keeper/query_plugins.go#L163-L199

Actually we didn't find a query in the standard modules that isn't deterministic, but we excluded the ones with iterations (E.g. /cosmos.auth.v1beta1.Query/Accounts) to prevent spamming nodes with huge queries.

amaury1093 commented 1 year ago

We went over all of these queries and found them to be deterministic

Thanks for the list! @assafmo How did you check they were deterministic?

assafmo commented 1 year ago

By eye, I went over the code and made sure there's just storage access and no maps serializations