celestiaorg / smt

A Go library that implements a Sparse Merkle tree for a key-value map.
https://godoc.org/github.com/celestiaorg/smt
MIT License
138 stars 53 forks source link

Store hash(key) => value in the DB to reduce I/O operations for reads #14

Closed musalbas closed 3 years ago

adlerjohn commented 3 years ago

This should probably be index in tree -> value (and for leaves only) instead of hash(index in tree) -> value. The reason is that there could be specific entries in the state that are at specific locations, e.g., or the state could be laid out in sections. Hashing or not should be decided at the application side rather than required by the SMT.

liamsi commented 3 years ago

referencing #8

tzdybal commented 3 years ago

For now, SMT cannot be iterated (except iterating underlying map-store if possible, but it's a hack). If iteration is not necessary, I can't see a difference between hash(key)->value and key->value mapping (except hashing footprint). Currently, there is only hash(value)->value mapping, which has one additional feature: de-duplication of values. On the other hand, it complicates removal of orphans (because we can't remove hash(value) from mapstore without storing reference count for values somewhere.

adlerjohn commented 3 years ago

Deduplication at the cost of having to reference count is probably not a good property to have. How many accounts are going to have exactly the same balance and nonce, etc.

tzdybal commented 3 years ago

Traversing the tree to access the data, cryptographically ensures that value is actually present/absent in the tree.

Accessing by key, means that you can't actually trust values from the tree (for example after ImportSparseMerkleTree call), because map store can be modified, without "invalidating" root ("dangling" values can be added, values can be changed). After traversing a tree user can be sure that value corresponds to the root. Without it user has to prove it.

Another issue is the lack of "history". In current implementation, you can use old root to access old value for a key. After optimization, only latest version of the value will be returned.

Maybe we should use a bit more sophisticated form of mapping like: root+key->value? This will enable efficient access to previous versions of data and easy removal. And as you said, lack of de-duplication of values is not a big issue.