dashpay / grovedb

Storage solution with proofs and secondary indices.
MIT License

Please add more examples or use cases #303

Open pragmaxim opened 2 months ago

pragmaxim commented 2 months ago

Hey,

I have found grovedb extremely useful, but I have trouble telling whether it fits certain use cases. For instance, I am currently spiking a data model to index the whole of Bitcoin into, where I could use secondary indexes and SumTrees to spare some disk space and to have address balances summed automatically. Think of it as a decentralized blockchain explorer.

I have a few concerns:

  1. I often see that I would need tx_id to be both a SumItem and a Reference at the same time.
  2. Performance: the distribution of transactional data might not be a good fit for grovedb.
  3. References do not make much sense here, as we only persist small hashes.
  4. I am not sure whether Trees can be used as an alternative to composite keys, as we know them from Cassandra, to avoid data duplication, i.e. whether we can have billions of tx_id trees that each contain items or references.

Would you please give me some hints and suggestions on this model? Overall, I have a feeling that groveDB excels only in use cases where we store some "bigger" objects that we can reference through secondary indexes.

https://docs.google.com/document/d/1dql0MTMeu1-3PE_1CtSCc9mtHCi4PD8jhOZ7Ta_sTVQ/edit?usp=sharing

[Image: Screenshot from 2024-06-20 15-49-18]

QuantumExplorer commented 2 months ago

Sorry for not responding sooner, I hadn't seen your message. And we are releasing Dash Platform this month so I won't be able to go into great detail here.

GroveDB is extremely useful for provable data, where you can put data in the database and then have a merkle-ized proof of any data that you wish to query.

Indeed the sum tree usage here can be quite interesting as you can always verify that there is no inflationary issue.
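To make the "no inflationary issue" check concrete, here is a minimal sketch in plain Rust. It is not the GroveDB API: `ToySumTree`, `upsert`, and `supply_is_consistent` are hypothetical names, and the toy keeps only a running total in memory, with no Merkle proofs. It only illustrates the idea that a container of signed amounts can expose an aggregate that is compared against the expected supply.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a sum tree (hypothetical, not GroveDB's type):
/// every value is a signed amount and the container keeps a running total.
#[derive(Default)]
pub struct ToySumTree {
    items: BTreeMap<String, i64>,
    total: i64,
}

impl ToySumTree {
    /// Insert or overwrite an amount, adjusting the total by the difference.
    pub fn upsert(&mut self, key: &str, amount: i64) {
        let previous = self.items.insert(key.to_string(), amount).unwrap_or(0);
        self.total += amount - previous;
    }

    /// Remove an entry and subtract its amount from the total.
    pub fn remove(&mut self, key: &str) {
        if let Some(previous) = self.items.remove(key) {
            self.total -= previous;
        }
    }

    /// Aggregate of all amounts currently stored.
    pub fn total(&self) -> i64 {
        self.total
    }
}

/// Assumed invariant for illustration: all balances together must equal
/// the number of coins issued so far.
pub fn supply_is_consistent(balances: &ToySumTree, issued: i64) -> bool {
    balances.total() == issued
}
```

With address balances stored this way, the check is a single comparison of the aggregate against the issued amount, rather than a scan over every address.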

As for performance, I wouldn't worry too much: in tests grovedb has been extremely fast. It's not going to be as fast as some tailored solutions, but I wouldn't worry unless you are actually seeing a problem. Under the hood it uses RocksDB, and unless you are using proofs, an abstraction bypasses the merkle trees and queries the underlying RocksDB directly. RocksDB isn't the most performant for reads, but again, I wouldn't worry unless you are actually seeing issues.

The point of references is that they point to other data, so that if that data were to change, the reference would still be pointing to it. Currently references are not bidirectional, so you need to keep track of all your references yourself and update them when you update data. While this seems like a very weird way to use them, it's actually for performance. They were built for the use cases of Dash Platform (which is why we built this database).
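The one-directional behaviour can be sketched in plain Rust as follows. This is a toy model under stated assumptions, not GroveDB's implementation: `ToyStore`, `Element`, and `resolve` are hypothetical names, keys are flat strings instead of tree paths, and only a single reference hop is followed.

```rust
use std::collections::HashMap;

/// Hypothetical element kinds for the toy store.
#[derive(Clone, Debug, PartialEq)]
pub enum Element {
    /// A stored value.
    Item(String),
    /// The key of another element; the target holds no back-pointer.
    Reference(String),
}

#[derive(Default)]
pub struct ToyStore {
    elements: HashMap<String, Element>,
}

impl ToyStore {
    /// Insert or overwrite the element stored under `key`.
    pub fn put(&mut self, key: &str, element: Element) {
        self.elements.insert(key.to_string(), element);
    }

    /// Return the value at `key`, following at most one reference hop.
    pub fn resolve(&self, key: &str) -> Option<&str> {
        match self.elements.get(key)? {
            Element::Item(value) => Some(value.as_str()),
            Element::Reference(target) => match self.elements.get(target)? {
                Element::Item(value) => Some(value.as_str()),
                Element::Reference(_) => None, // chains are out of scope here
            },
        }
    }

    /// Delete an element. References pointing at it are NOT cleaned up,
    /// because the store does not know who refers to what.
    pub fn delete(&mut self, key: &str) {
        self.elements.remove(key);
    }
}
```

In this toy, overwriting the target in place is visible through the reference, while deleting the target leaves the reference dangling until the caller removes or repoints it.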

In your use case of non-modifiable data, they would be useful if the data you are pointing to is large. Otherwise I would not use them, as following a reference takes additional seeks in the database.

I am not exactly sure how composite keys work in Cassandra, but in grovedb you can query and combine tree paths to get your data, which gives you secondary-index-type functionality.
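As a rough illustration of combining tree paths, the sketch below models nested trees as a map from a path to a sorted subtree. It is plain Rust, not the GroveDB API: `ToyGrove`, `insert`, and `keys_at` are hypothetical names, and the layout (`by_address` / address / tx_id) is only an assumed example for the explorer use case. It says nothing about how billions of subtrees would perform in practice.

```rust
use std::collections::BTreeMap;

/// A path is the list of segments leading to a subtree.
type Path = Vec<String>;

/// Toy model of nested trees: each path owns one sorted key/value subtree.
#[derive(Default)]
pub struct ToyGrove {
    subtrees: BTreeMap<Path, BTreeMap<String, String>>,
}

impl ToyGrove {
    /// Insert `key` -> `value` into the subtree at `path`, creating it if needed.
    pub fn insert(&mut self, path: &[&str], key: &str, value: &str) {
        let path: Path = path.iter().map(|s| s.to_string()).collect();
        self.subtrees
            .entry(path)
            .or_default()
            .insert(key.to_string(), value.to_string());
    }

    /// List the keys of the subtree at `path` in sorted order
    /// (empty if the subtree does not exist).
    pub fn keys_at(&self, path: &[&str]) -> Vec<&str> {
        let path: Path = path.iter().map(|s| s.to_string()).collect();
        self.subtrees
            .get(&path)
            .map(|tree| tree.keys().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Here the path plays the role of the leading part of a composite key: the primary data could live under `["txs"]`, while `["by_address", <address>]` holds only tx_id keys, so looking up an address's transactions is a single subtree listing.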