gnolang / gno

Gno: An interpreted, stack-based Go virtual machine to build succinct and composable apps + gno.land: a blockchain for timeless code and fair open-source.
https://gno.land/

[chain] Audit the Gno.land storage layer #2445

Open zivkovicmilos opened 5 months ago

zivkovicmilos commented 5 months ago

Description

This task concerns scoping out and documenting the current Gno.land storage layer (a single HackMD document is fine; it does not need to be part of the official documentation).

We use LevelDB as our embedded storage, but have no concrete optimizations for writes or reads. The first step to optimizing the storage layer is to detail exactly:

ajnavarro commented 4 days ago

How Things Are Stored

Key-Value Structure

Below is a categorization of the main types of keys:
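To make this concrete, here is a rough sketch of how these prefixed keys are built. Only the node: prefix is confirmed later in this writeup; the other prefixes, helper names, and the example ObjectID format are assumptions for illustration.

```go
package main

import "fmt"

// Hypothetical key builders modeled on the prefix-style keys in the
// store. Only the "node:" prefix is confirmed below; the other
// prefixes and helper names are assumptions for illustration.
func objectKey(oid string) []byte   { return []byte("oid:" + oid) }   // object state
func typeKey(tid string) []byte     { return []byte("tid:" + tid) }   // persisted type info
func nodeKey(loc string) []byte     { return []byte("node:" + loc) }  // BlockNodes (see below)
func packageKey(path string) []byte { return []byte("pkg:" + path) }  // package index

func main() {
	// The ObjectID format here is invented for illustration.
	fmt.Printf("%s\n", objectKey("abc123:1"))
	fmt.Printf("%s\n", packageKey("gno.land/r/demo/boards"))
}
```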

Nuances Found in the Current Implementation

Inconsistent Logic

Similar types, such as PackageValue.Block and PackageValue.FBlocks, are stored differently despite both implementing the same Value interface. This inconsistency makes the system error-prone and harder to maintain.
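Schematically, the inconsistency looks like this; the save functions are invented stand-ins for the real store logic:

```go
package main

import "fmt"

// Two fields implement the same Value interface but take different
// persistence paths. The save functions are hypothetical stand-ins.
type Value interface{ Kind() string }

type blockValue struct{}

func (blockValue) Kind() string { return "block" }

type PackageValue struct {
	Block   Value   // assumed: serialized inline with the parent object
	FBlocks []Value // assumed: each serialized under its own key
}

func saveInline(v Value)      { fmt.Println("inline:", v.Kind()) }
func saveAsOwnObject(v Value) { fmt.Println("own key:", v.Kind()) }

func savePackage(pv *PackageValue) {
	saveInline(pv.Block) // path 1
	for _, fb := range pv.FBlocks {
		saveAsOwnObject(fb) // path 2: same interface, different path
	}
}

func main() {
	savePackage(&PackageValue{Block: blockValue{}, FBlocks: []Value{blockValue{}}})
}
```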

Inefficient Storage Serialization

Approximately 80%-90% of the data stored is redundant due to the direct storage of VM objects in LevelDB. Additionally, keys for objects are repeatedly marshaled, adding unnecessary overhead.
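As an example of the repeated key marshaling, here is a small sketch of a memoization fix; the ObjectID shape and the Key helper are assumptions, and this only works if IDs are immutable:

```go
package main

import "fmt"

// Hypothetical ObjectID. If the key is rebuilt from the ID on every
// read/write, the formatting cost is paid repeatedly; caching it once
// avoids that (assumption: IDs never change after creation).
type ObjectID struct {
	PkgID   string
	NewTime int
	key     []byte // lazily-built cached key
}

func (o *ObjectID) Key() []byte {
	if o.key == nil {
		o.key = []byte(fmt.Sprintf("oid:%s:%d", o.PkgID, o.NewTime))
	}
	return o.key
}

func main() {
	oid := &ObjectID{PkgID: "abc123", NewTime: 7}
	fmt.Printf("%s\n", oid.Key()) // built once
	fmt.Printf("%s\n", oid.Key()) // reused
}
```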

Unused Keys and Unneeded Storage Layers

There are keys, like those for BlockNodes (e.g., []byte("node:" + loc.String())), that are defined but never used. PrefixDB adds its own prefix to every key and uses a mutex internally, which introduces unnecessary complexity. Keys are also defined in multiple ways: some use hashed package names (useful for producing fixed-length keys), while others use the package names directly.
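To illustrate the layering concern, here is a simplified, self-contained sketch of what a PrefixDB-style wrapper does. This is not the actual tm2 implementation, just the pattern: every operation takes a lock and re-prefixes the key, on top of the oid:/node:-style prefixes already baked into the keys themselves.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified PrefixDB-style wrapper: every Get/Set takes a mutex and
// re-prefixes the key, double-prefixing what is already a prefixed key.
type PrefixDB struct {
	mtx    sync.Mutex
	prefix []byte
	kv     map[string][]byte
}

func NewPrefixDB(prefix []byte) *PrefixDB {
	return &PrefixDB{prefix: prefix, kv: map[string][]byte{}}
}

func (p *PrefixDB) fullKey(key []byte) string {
	return string(p.prefix) + string(key)
}

func (p *PrefixDB) Set(key, value []byte) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	p.kv[p.fullKey(key)] = value
}

func (p *PrefixDB) Get(key []byte) []byte {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return p.kv[p.fullKey(key)]
}

func main() {
	db := NewPrefixDB([]byte("gno:"))
	db.Set([]byte("node:main"), []byte("...")) // stored as "gno:node:main"
	fmt.Printf("%s\n", db.Get([]byte("node:main")))
}
```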

VM Model and Storage Coupling

Models have methods that receive the Store as a parameter in order to retrieve related metadata. This creates inconsistencies in how types are retrieved, cast, and used, and adds an unnecessary dependency between the model and storage layers.
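The pattern looks roughly like this (a simplified sketch; the real gnovm signatures differ in detail, and the types here are stand-ins):

```go
package main

import "fmt"

// Sketch of the coupling: the model cannot resolve its own data
// without being handed the Store, so every caller must thread the
// store through.
type Store interface {
	GetObject(id string) any
}

type PackageValue struct {
	BlockID string
}

// The model method takes the Store as a parameter to lazily fetch
// related data, tying the VM model to the storage layer.
func (pv *PackageValue) GetBlock(store Store) any {
	return store.GetObject(pv.BlockID)
}

type memStore map[string]any

func (m memStore) GetObject(id string) any { return m[id] }

func main() {
	store := memStore{"blk1": "block data"}
	pv := &PackageValue{BlockID: "blk1"}
	fmt.Println(pv.GetBlock(store)) // caller must supply the store
}
```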

Slice Storage Limitations

Slice metadata is not split into chunks, so retrieving a big slice means marshaling/unmarshaling a large amount of data in one go, which could cause memory issues.
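A hedged sketch of the chunking direction this implies; the chunk size, key scheme, and helper are arbitrary choices for illustration:

```go
package main

import "fmt"

const chunkSize = 1024 // assumption: would need tuning in practice

// splitChunks breaks a large serialized payload into fixed-size chunks
// so that no single read/write has to (un)marshal the whole thing.
func splitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	payload := make([]byte, 2500)
	for i, c := range splitChunks(payload) {
		// each chunk would live under its own key, e.g. "slice:<id>:<i>"
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```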

Next Steps

Option 1: Maintain Status Quo with Minor Fixes

Incrementally address specific issues, such as unifying key prefix logic and standardizing type storage.

Option 2: Introduce a Conversion Layer (Recommended)

Define storage-specific models tailored to the storage use case, independent of the VM state models. Establish a conversion layer between the two, simplifying data storage and retrieval while reducing redundancy.
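To make the proposal concrete, here is a minimal sketch of what such a conversion layer could look like; every name here is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VM-side model: rich, carries runtime-only state.
type PackageValue struct {
	Path  string
	Block any // runtime pointer graph, not meant to be serialized as-is
}

// Storage-side model: flat, serialization-friendly, no VM dependencies.
type StoredPackage struct {
	Path    string `json:"path"`
	BlockID string `json:"block_id"`
}

// Conversion layer: the only place that knows about both worlds.
func toStored(pv *PackageValue, blockID string) StoredPackage {
	return StoredPackage{Path: pv.Path, BlockID: blockID}
}

func fromStored(sp StoredPackage, resolve func(id string) any) *PackageValue {
	return &PackageValue{Path: sp.Path, Block: resolve(sp.BlockID)}
}

func main() {
	pv := &PackageValue{Path: "gno.land/r/demo/boards", Block: "runtime block"}
	sp := toStored(pv, "blk1")
	raw, _ := json.Marshal(sp) // the storage model serializes cleanly
	fmt.Printf("%s\n", raw)

	var back StoredPackage
	_ = json.Unmarshal(raw, &back)
	restored := fromStored(back, func(id string) any { return "runtime block" })
	fmt.Println(restored.Path, restored.Block)
}
```

With this split, the read/write path only ever touches StoredPackage, so the storage layer can evolve (chunking, different encodings, another backend) without touching VM types.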

This approach will simplify relationships between parts of the application, facilitate the implementation of other VMs in the future, and make it easier to maintain and optimize the storage layer.

Conclusion

I recommend moving forward with Option 2, as it provides a consistent and future-proof storage design. This will help reduce redundancy and make the system easier to maintain.

I would appreciate feedback from the team on how feasible you think this approach is, and which option is best to follow now.