Open zivkovicmilos opened 5 months ago
Below is a categorization of the main types of keys:
Object Data:
- s/_/oid:{HASH}:{SEQ}: Stores object content hashes plus marshaled Object data.
- s/_/oid:{HASH}:1#realm: Stores the Realm type, saving the oid hash + package name.

Type Information:
- s/_/tid:gno.land/p/demo/avl.Tree: Stores type definitions.

State Data:
- s/_/n{BYTES}: Saves information about the state, with logic found in nodedb.go.

Metadata:
- s/latest: Contains the latest version, which points to s/VERSION (a commitInfo type).
- s/_/last_header: Stores an abci.Header.

Similar types, such as PackageValue.Block and PackageValue.FBlocks, are stored differently despite representing the same Value interface. This inconsistency makes the system error-prone and harder to maintain.
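To make the key categories listed above concrete, here is a minimal Go sketch that builds and classifies keys following those layouts. The helper names (objectKey, realmKey, typeKey, classify) are hypothetical and are not taken from the Gno codebase, and the sketch assumes {HASH} is already an encoded string.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes copied from the key layouts described in this comment.
const (
	objectPrefix = "s/_/oid:"
	typePrefix   = "s/_/tid:"
	statePrefix  = "s/_/n"
)

// objectKey builds a key shaped like s/_/oid:{HASH}:{SEQ}.
// Assumption: hash is already an encoded string.
func objectKey(hash string, seq uint64) string {
	return fmt.Sprintf("%s%s:%d", objectPrefix, hash, seq)
}

// realmKey builds a key shaped like s/_/oid:{HASH}:1#realm.
func realmKey(hash string) string {
	return objectKey(hash, 1) + "#realm"
}

// typeKey builds a key shaped like s/_/tid:{TYPE ID}.
func typeKey(typeID string) string {
	return typePrefix + typeID
}

// classify maps a raw key to one of the categories listed above.
func classify(key string) string {
	switch {
	case key == "s/latest" || key == "s/_/last_header":
		return "metadata"
	case strings.HasPrefix(key, objectPrefix):
		return "object"
	case strings.HasPrefix(key, typePrefix):
		return "type"
	case strings.HasPrefix(key, statePrefix):
		return "state"
	default:
		return "unknown"
	}
}
```

Keeping every layout behind one small set of helpers like this is one way to avoid keys being assembled ad hoc in different places.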
Approximately 80%-90% of the data stored is redundant due to the direct storage of VM objects in LevelDB. Additionally, keys for objects are repeatedly marshaled, adding unnecessary overhead.
There are keys, like those for BlockNodes (e.g., []byte("node:" + loc.String())), that are defined but not used. PrefixDB adds prefixes to keys and uses a mutex internally, which introduces unnecessary complexity. Keys are also defined in multiple ways: some use hashed package names, which is useful for keeping all keys the same length, while others use the plain package names.
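As an illustration of why hashed package names give keys of the same length, here is a small Go sketch. It assumes SHA-256 and a made-up "pkg:" prefix. Neither is confirmed by the current code; they only stand in for whatever hash and prefix the store actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// pkgKeyPrefix is a hypothetical prefix used only for this illustration.
const pkgKeyPrefix = "pkg:"

// hashedPackageKey derives a fixed-length key from a package path of any
// length. Assumption: SHA-256, hex-encoded (64 characters).
func hashedPackageKey(pkgPath string) string {
	sum := sha256.Sum256([]byte(pkgPath))
	return pkgKeyPrefix + hex.EncodeToString(sum[:])
}
```

Under these assumptions every key is 68 bytes long regardless of the package path. The trade-off is that the path can no longer be read back from the key.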
Models have methods that receive the Store as a parameter to retrieve related metadata. This creates inconsistencies in how types are retrieved/cast/used and adds an unnecessary dependency between the model and storage layers.
Slice metadata is not split into chunks, so retrieving a big slice means marshaling/unmarshaling all of its data at once, which could cause memory issues.
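One possible shape for chunked slice storage is sketched below in Go. The chunk type, the key layout ("{base}/chunk/{index}"), and the function names are all hypothetical. The sketch only shows the splitting and reassembly step, not how chunks would be written to LevelDB.

```go
package main

import "fmt"

// chunk is a hypothetical storage record holding one piece of a large value.
type chunk struct {
	Key  string
	Data []byte
}

// chunkKey builds a zero-padded key so chunks sort in index order.
func chunkKey(base string, i int) string {
	return fmt.Sprintf("%s/chunk/%08d", base, i)
}

// splitChunks cuts data into pieces of at most size bytes.
// The returned chunks alias the input slice; they are not copies.
func splitChunks(base string, data []byte, size int) []chunk {
	if size <= 0 {
		panic("chunk size must be positive")
	}
	var out []chunk
	for i := 0; len(data) > 0; i++ {
		n := size
		if len(data) < n {
			n = len(data)
		}
		out = append(out, chunk{Key: chunkKey(base, i), Data: data[:n]})
		data = data[n:]
	}
	return out
}

// joinChunks reassembles the original bytes from chunks in index order.
func joinChunks(chunks []chunk) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c.Data...)
	}
	return out
}
```

With a layout like this, a reader could fetch and decode one chunk at a time instead of loading the whole slice into memory.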
Incrementally address specific issues, such as unifying key prefix logic and standardizing type storage.
Define storage-specific models fulfilling the use case, independent of VM state models. Establish a conversion layer between VM state models and storage models, simplifying data storage and retrieval while reducing redundancy.
This approach will simplify relationships between parts of the application, facilitate the implementation of other VMs in the future, and make it easier to maintain and optimize the storage layer.
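A rough Go sketch of what such a conversion layer could look like follows. Both types are hypothetical stand-ins: vmObject is not the real VM Object type, and the runtime-only fields are invented to show what a storage model would leave out.

```go
package main

// vmObject is a hypothetical stand-in for a VM state model. It mixes
// persistent data with runtime-only bookkeeping.
type vmObject struct {
	ID       string
	Value    []byte
	refCount int  // runtime-only, not persisted (illustrative)
	dirty    bool // runtime-only, not persisted (illustrative)
}

// storedObject is a hypothetical storage model: only what must be persisted.
type storedObject struct {
	ID    string
	Value []byte
}

// toStorage converts a VM model into its storage model, dropping
// runtime-only fields and copying the value so the two do not share memory.
func toStorage(o vmObject) storedObject {
	return storedObject{ID: o.ID, Value: append([]byte(nil), o.Value...)}
}

// fromStorage rebuilds a VM model from its storage model. Runtime-only
// fields start at their zero values.
func fromStorage(s storedObject) vmObject {
	return vmObject{ID: s.ID, Value: append([]byte(nil), s.Value...)}
}
```

The storage layer would then only ever see storedObject, so the VM models no longer need to receive the Store as a parameter.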
We need to define storage models for:
Implement efficient indexing to list available packages, avoiding O(n) complexity.
Design a static tree to define type relationships within a realm.
Create a Merkle tree to track value states and allow rollbacks if necessary.
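To illustrate the Merkle tree item, here is a minimal Go sketch that computes a root hash over a list of values. It is a simplification under stated assumptions: SHA-256, an unpaired node promoted unchanged to the next level, and no domain separation between leaves and inner nodes. It is not the tree the existing store uses.

```go
package main

import "crypto/sha256"

// merkleRoot hashes each value into a leaf, then repeatedly hashes pairs of
// nodes until a single root remains.
func merkleRoot(values [][]byte) [32]byte {
	if len(values) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(values))
	for i, v := range values {
		level[i] = sha256.Sum256(v)
	}
	for len(level) > 1 {
		next := make([][32]byte, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				// Odd node out: promote it unchanged (a simplification).
				next = append(next, level[i])
				continue
			}
			buf := make([]byte, 0, 64)
			buf = append(buf, level[i][:]...)
			buf = append(buf, level[i+1][:]...)
			next = append(next, sha256.Sum256(buf))
		}
		level = next
	}
	return level[0]
}
```

Because any change to a value changes the root, storing one root per version gives a cheap way to detect which version a state belongs to. Supporting rollbacks would additionally require keeping the older nodes around.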
I recommend moving forward with Option 2, as it provides a consistent and future-proof storage design. This will help reduce redundancy and make the system easier to maintain.
Would appreciate feedback from the team on how feasible you think this approach is, and which option is best to follow now.
Description
This task concerns scoping out and documenting (can be a single HackMD document, not the official documentation) the current Gno.land storage layer.
We utilize LevelDB for our embedded storage, but have no concrete optimizations for writes / reads. The first step to optimizing the storage layer is to exactly detail: