cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.22k stars 3.59k forks source link

Make the guarantee that the iavl tree is inserted with ordered keys in each version #13837

Open yihuang opened 1 year ago

yihuang commented 1 year ago

Summary

IAVL root hash depends on the insertion order. When we think about some alternative format for chain state like https://github.com/cosmos/cosmos-sdk/issues/13317, the change set in each version will be always sorted. And in practice, the IAVL store is always written at commit event by cache store of deliverState, the cache store always sort the keys before writing, this is good because it means we can recover the IAVL tree by replay the change sets recorded in the above format.

Problem Definition

The problem is can we make this guarantee and don't break it in the future, maybe change some interfaces as well.

Proposal

Make the guarantee that iavl tree is always written with keys in order for each version, and change some interfaces to reflect that, like remove the direct Set/Delete methods from uncached store, always go through a cache store which will sort the keys before writing. This is already the behavior in action, just want to set it in stone.

alexanderbez commented 1 year ago

I'll mention this here again since it seems to be relevant, but I personally would love to explore and argue the case that the entire CacheKV store isn't necessary at all. And we can push this invariant requirement directly down into IAVL itself.

yihuang commented 1 year ago

I'll mention this here again since it seems to be relevant, but I personally would love to explore and argue the case that the entire CacheKV store isn't necessary at all. And we can push this invariant requirement directly down into IAVL itself.

Do you mean we cache the writes inside iavl and delays the tree re-balancing until the commit time?

yihuang commented 1 year ago

CacheKV serves another purpose to create a snapshot for a scope and discard the writes if the scope failed.

alexanderbez commented 1 year ago

I would like to see IAVl essentially take care of this. I.e. cache/batch writes where nothing happens until Commit/Write is called.

tac0turtle commented 1 year ago

@alexanderbez could you create a proposal doc on how you see it working. Heard you mention remove cachkv store before but it's hard to imagine where this logic moves without a doc or walkthrough of your proposal.

alexanderbez commented 1 year ago

Yeah for sure! I have a rough idea in my head. It will involve breaking APIs, but I think it can be done. I'll draft up a proposal.

peterbourgon commented 1 year ago

+1 to @alexanderbez basically. IMO caching belongs at the KVStore layer rather than at the IAVL layer, but that's a detail I guess. The main point is that caching should be an implicit implementation detail, not a separate type or concept that needs to be explicitly known to callers. You want a single common interface interface e.g.

type KVStore interface {
    Set(key string, val []byte) error
    Get(key string) ([]byte, error)
    Del(key string) error
    Commit() error // maybe?
}

with a base implementation e.g.

type CoreKVStore struct {
    tree iavl.Tree // or whatever
}

func NewCoreKVStore(...) (*CoreKVStore, error) { ... }

and stuff like caching implemented as decorators e.g.

type CachingKVStore struct {
    base  KVStore
    cache map[string]whatever
}

func NewCachingKVStore(base KVStore, ...) (*CachingKVStore, error) { ... }

func (s *CachingKVStore) Set(key string, val []byte) error { 
    // set key=val in s.base
    // if that worked, set key=val in s.cache, with whatever invariants
}

func (s *CachingKVStore) Get(key string) ([]byte, error) { 
    // check cache first, with whatever invariants
    // if not cached, call s.base.Get and cache response
}

// etc.

so that code that uses a KVStore can just take a KVStore directly e.g.

func NewApp(kvs KVStore, ...) (*App, error) { ... }

and can safely assume that caching, or any other value-add behavior, is already satisfied by the value given to them.

alexanderbez commented 1 year ago

This is more or less how it works now. A caller never explicitly asks for a cached KVStore, they do however, ask to cache-wrap. I haven't thought about this in great detail, so I could totally be off the mark here. But the current implementation is a hoge-podge of random responsbilities and concerns that it warrants a serious consideration of it's intended goals and features.

peterbourgon commented 1 year ago

This is more or less how it works now. A caller never explicitly asks for a cached KVStore, they do however, ask to cache-wrap.

That's true at a high level, but the devil is in the details. The problem is that there isn't a single canonical KVStore interface which is widely satisfied, instead there is a hierarchy of

These interfaces are used basically arbitrarily, and callers can't make any simplifying assumptions about received values, in practice you have to do type assertions and interface upgrades to use the package effectively. Which is exactly as you say:

I haven't thought about this in great detail, so I could totally be off the mark here. But the current implementation is a hoge-podge of random responsbilities and concerns that it warrants a serious consideration of it's intended goals and features.

tl;dr: definitely can and should be improved

tac0turtle commented 1 year ago

could we open a new issue that encapsulates the dsicssion on store design?in the meantime could we look into this issue to see the scope of work needed, if its small, then lets solve this is.

cc @facundomedica if you want to pick it up

kocubinski commented 2 months ago

If IAVL is to support replay, it must record the change sets it applied per version in the order that it did. Whether those keys are lexically sorted or not does not seem relevant to me. It must be a feature of IAVL, not defined by it's consumer.