gnolang / gno

Gno: An interpreted, stack-based Go virtual machine to build succinct and composable apps + gno.land: a blockchain for timeless code and fair open-source.
https://gno.land/

[chain] Audit the Gno.land storage layer #2445

Open zivkovicmilos opened 5 months ago

zivkovicmilos commented 5 months ago

Description

This task concerns scoping out and documenting the current Gno.land storage layer (a single HackMD document is fine; it does not need to be part of the official documentation).

We use LevelDB as our embedded storage, but have no concrete optimizations for writes or reads. The first step to optimizing the storage layer is to detail exactly:

ajnavarro commented 4 days ago

How Things Are Stored

Key-Value Structure

Below is a categorization of the main types of keys:
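To make this concrete, here is a rough sketch of how these prefixed keys are built. Only the node: prefix is confirmed later in this writeup; the other prefixes, helper names, and the example ObjectID format are assumptions for illustration.

```go
package main

import "fmt"

// Hypothetical key builders modeled on the prefix-style keys in the
// store. Only the "node:" prefix is confirmed below; the other
// prefixes and helper names are assumptions for illustration.
func objectKey(oid string) []byte   { return []byte("oid:" + oid) }   // object state
func typeKey(tid string) []byte     { return []byte("tid:" + tid) }   // persisted type info
func nodeKey(loc string) []byte     { return []byte("node:" + loc) }  // BlockNodes (see below)
func packageKey(path string) []byte { return []byte("pkg:" + path) }  // package index

func main() {
	// The ObjectID format here is invented for illustration.
	fmt.Printf("%s\n", objectKey("abc123:1"))
	fmt.Printf("%s\n", packageKey("gno.land/r/demo/boards"))
}
```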

Nuances Found in the Current Implementation

Inconsistent Logic

Similar types, such as PackageValue.Block and PackageValue.FBlocks, are stored differently despite both implementing the same Value interface. This inconsistency makes the system error-prone and harder to maintain.
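Schematically, the inconsistency looks like this; the save functions are invented stand-ins for the real store logic:

```go
package main

import "fmt"

// Two fields implement the same Value interface but take different
// persistence paths. The save functions are hypothetical stand-ins.
type Value interface{ Kind() string }

type blockValue struct{}

func (blockValue) Kind() string { return "block" }

type PackageValue struct {
	Block   Value   // assumed: serialized inline with the parent object
	FBlocks []Value // assumed: each serialized under its own key
}

func saveInline(v Value)      { fmt.Println("inline:", v.Kind()) }
func saveAsOwnObject(v Value) { fmt.Println("own key:", v.Kind()) }

func savePackage(pv *PackageValue) {
	saveInline(pv.Block) // path 1
	for _, fb := range pv.FBlocks {
		saveAsOwnObject(fb) // path 2: same interface, different path
	}
}

func main() {
	savePackage(&PackageValue{Block: blockValue{}, FBlocks: []Value{blockValue{}}})
}
```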

Inefficient Storage Serialization

Approximately 80%-90% of the data stored is redundant due to the direct storage of VM objects in LevelDB. Additionally, keys for objects are repeatedly marshaled, adding unnecessary overhead.
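As an example of the repeated key marshaling, here is a small sketch of a memoization fix; the ObjectID shape and the Key helper are assumptions, and this only works if IDs are immutable:

```go
package main

import "fmt"

// Hypothetical ObjectID. If the key is rebuilt from the ID on every
// read/write, the formatting cost is paid repeatedly; caching it once
// avoids that (assumption: IDs never change after creation).
type ObjectID struct {
	PkgID   string
	NewTime int
	key     []byte // lazily-built cached key
}

func (o *ObjectID) Key() []byte {
	if o.key == nil {
		o.key = []byte(fmt.Sprintf("oid:%s:%d", o.PkgID, o.NewTime))
	}
	return o.key
}

func main() {
	oid := &ObjectID{PkgID: "abc123", NewTime: 7}
	fmt.Printf("%s\n", oid.Key()) // built once
	fmt.Printf("%s\n", oid.Key()) // reused
}
```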

Unused Keys and Unneeded Storage Layers

There are keys, like those for BlockNodes (e.g., []byte("node:" + loc.String())), that are defined but never used. PrefixDB adds its own prefix to every key and uses a mutex internally, which introduces unnecessary complexity. Keys are also defined in multiple ways: some use hashed package names (useful for producing fixed-length keys), while others use the package names directly.
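To illustrate the layering concern, here is a simplified, self-contained sketch of what a PrefixDB-style wrapper does. This is not the actual tm2 implementation, just the pattern: every operation takes a lock and re-prefixes the key, on top of the oid:/node:-style prefixes already baked into the keys themselves.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified PrefixDB-style wrapper: every Get/Set takes a mutex and
// re-prefixes the key, double-prefixing what is already a prefixed key.
type PrefixDB struct {
	mtx    sync.Mutex
	prefix []byte
	kv     map[string][]byte
}

func NewPrefixDB(prefix []byte) *PrefixDB {
	return &PrefixDB{prefix: prefix, kv: map[string][]byte{}}
}

func (p *PrefixDB) fullKey(key []byte) string {
	return string(p.prefix) + string(key)
}

func (p *PrefixDB) Set(key, value []byte) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	p.kv[p.fullKey(key)] = value
}

func (p *PrefixDB) Get(key []byte) []byte {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return p.kv[p.fullKey(key)]
}

func main() {
	db := NewPrefixDB([]byte("gno:"))
	db.Set([]byte("node:main"), []byte("...")) // stored as "gno:node:main"
	fmt.Printf("%s\n", db.Get([]byte("node:main")))
}
```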

VM Model and Storage Coupling

Models have methods that receive the Store as a parameter in order to retrieve related metadata. This creates inconsistencies in how types are retrieved, cast, and used, and adds an unnecessary dependency between the model and storage layers.
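The pattern looks roughly like this (a simplified sketch; the real gnovm signatures differ in detail, and the types here are stand-ins):

```go
package main

import "fmt"

// Sketch of the coupling: the model cannot resolve its own data
// without being handed the Store, so every caller must thread the
// store through.
type Store interface {
	GetObject(id string) any
}

type PackageValue struct {
	BlockID string
}

// The model method takes the Store as a parameter to lazily fetch
// related data, tying the VM model to the storage layer.
func (pv *PackageValue) GetBlock(store Store) any {
	return store.GetObject(pv.BlockID)
}

type memStore map[string]any

func (m memStore) GetObject(id string) any { return m[id] }

func main() {
	store := memStore{"blk1": "block data"}
	pv := &PackageValue{BlockID: "blk1"}
	fmt.Println(pv.GetBlock(store)) // caller must supply the store
}
```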

Slice Storage Limitations

Slice metadata is not split into chunks, so retrieving a big slice means marshaling/unmarshaling a large amount of data in one go, which could cause memory issues.
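A hedged sketch of the chunking direction this implies; the chunk size, key scheme, and helper are arbitrary choices for illustration:

```go
package main

import "fmt"

const chunkSize = 1024 // assumption: would need tuning in practice

// splitChunks breaks a large serialized payload into fixed-size chunks
// so that no single read/write has to (un)marshal the whole thing.
func splitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	payload := make([]byte, 2500)
	for i, c := range splitChunks(payload) {
		// each chunk would live under its own key, e.g. "slice:<id>:<i>"
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```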

Next Steps

Option 1: Maintain Status Quo with Minor Fixes

Incrementally address specific issues, such as unifying key prefix logic and standardizing type storage.

Option 2: Introduce a Conversion Layer (Recommended)

Define storage-specific models tailored to the storage use case, independent of the VM state models. Establish a conversion layer between the two, simplifying data storage and retrieval while reducing redundancy.
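To make the proposal concrete, here is a minimal sketch of what such a conversion layer could look like; every name here is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VM-side model: rich, carries runtime-only state.
type PackageValue struct {
	Path  string
	Block any // runtime pointer graph, not meant to be serialized as-is
}

// Storage-side model: flat, serialization-friendly, no VM dependencies.
type StoredPackage struct {
	Path    string `json:"path"`
	BlockID string `json:"block_id"`
}

// Conversion layer: the only place that knows about both worlds.
func toStored(pv *PackageValue, blockID string) StoredPackage {
	return StoredPackage{Path: pv.Path, BlockID: blockID}
}

func fromStored(sp StoredPackage, resolve func(id string) any) *PackageValue {
	return &PackageValue{Path: sp.Path, Block: resolve(sp.BlockID)}
}

func main() {
	pv := &PackageValue{Path: "gno.land/r/demo/boards", Block: "runtime block"}
	sp := toStored(pv, "blk1")
	raw, _ := json.Marshal(sp) // the storage model serializes cleanly
	fmt.Printf("%s\n", raw)

	var back StoredPackage
	_ = json.Unmarshal(raw, &back)
	restored := fromStored(back, func(id string) any { return "runtime block" })
	fmt.Println(restored.Path, restored.Block)
}
```

With this split, the read/write path only ever touches StoredPackage, so the storage layer can evolve (chunking, different encodings, another backend) without touching VM types.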

This approach will simplify relationships between parts of the application, facilitate the implementation of other VMs in the future, and make it easier to maintain and optimize the storage layer.

Conclusion

I recommend moving forward with Option 2, as it provides a consistent and future-proof storage design. This will help reduce redundancy and make the system easier to maintain.

I would appreciate feedback from the team on how feasible you think this approach is, and which option is best to follow now.