codenotary / immudb

immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
https://immudb.io

Question. Data or hash? #682

Closed srinisubramanian closed 3 years ago

srinisubramanian commented 3 years ago

What is the recommendation for immudb: store the data itself or a hash of the data? I am not sure whether there is a limit on the size of data in immudb (AWS QLDB, I think, restricts it to 4MB).

Given that data cannot be deleted or archived, what happens if data is stored (instead of hashes) and the db size just keeps growing?

Not sure if this is already addressed elsewhere.

jeroiraz commented 3 years ago

Hi @srinisubramanian,

Storing only the hashes will drastically reduce storage requirements in immudb, and you will still be able to fetch values by key or scan by key prefix. However, immudb was designed to store more than hashes: values associated with keys are stored separately from the transaction and indexing logs. Furthermore, all backing storage is treated as append-only files, so once data is written it is never updated.
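As a minimal sketch of the "store only the hash" approach: the payload stays in an external system, and only its SHA-256 digest is recorded under a key. The `ledger` dict below is a stand-in for an immudb database, not the immudb client API, and `store_hash` and `verify` are hypothetical helper names.

```python
import hashlib

# Stand-in for an immudb database (assumption): a plain dict,
# NOT the real immudb client. Keys map to hex-encoded SHA-256 digests.
ledger = {}


def store_hash(key: str, payload: bytes) -> str:
    """Record only the SHA-256 digest of the payload under the given key."""
    digest = hashlib.sha256(payload).hexdigest()
    ledger[key] = digest
    return digest


def verify(key: str, payload: bytes) -> bool:
    """Check that a payload kept elsewhere still matches the recorded digest."""
    return ledger.get(key) == hashlib.sha256(payload).hexdigest()


# The full document lives in some external store; only its digest is recorded.
document = b"invoice #1001: total 250.00"
store_hash("invoice:1001", document)
```

With this layout the recorded value is always 32 bytes (64 hex characters) regardless of payload size, and `verify` detects any later change to the externally stored payload.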

Data retention strategies and the use of cloud storage are under discussion. In the meantime, depending on your use case, you may be able to split your data across multiple immudb databases, store hashes, or store the actual data if you are able to estimate the amount of data to be stored.
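A rough way to estimate the amount of data to be stored, as a back-of-the-envelope sketch: multiply record count by value size over the retention horizon, and compare full payloads with 32-byte SHA-256 digests. The workload figures (100,000 records per day, 4 KiB per record, one year) are made-up assumptions, and the estimate counts value bytes only, ignoring keys and immudb's transaction and indexing logs.

```python
SHA256_BYTES = 32  # size of a raw SHA-256 digest


def estimate_value_bytes(records_per_day: int, bytes_per_record: int, days: int) -> int:
    """Total bytes of stored values only; excludes keys, tx log, and index overhead."""
    return records_per_day * bytes_per_record * days


# Hypothetical workload: 100,000 records/day, 4 KiB each, kept for one year.
full_payloads = estimate_value_bytes(100_000, 4096, 365)
hashes_only = estimate_value_bytes(100_000, SHA256_BYTES, 365)

print(f"full payloads: {full_payloads / 1e9:.1f} GB")
print(f"hashes only:   {hashes_only / 1e9:.2f} GB")
```

Because nothing is ever deleted, the totals only grow with the horizon, which is why an estimate like this is worth doing before choosing between data and hashes.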

immudb uses the gRPC protocol, so when streams are not used there is a maximum request size; the current default is 32 MB. The next release will also include streaming capabilities for cases where larger content needs to be stored.
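A client-side guard for the request size limit might look like the sketch below. It assumes the 32 MB default means 32 MiB and reserves some headroom for the key and message framing; both the headroom figure and the `choose_write_strategy` helper are hypothetical, and the real limit depends on the server configuration.

```python
# Default mentioned in the thread; interpreting "32 MB" as 32 MiB is an assumption.
MAX_REQUEST_BYTES = 32 * 1024 * 1024

# Hypothetical headroom for the key and message framing (not an immudb constant).
HEADROOM_BYTES = 64 * 1024


def choose_write_strategy(payload: bytes, limit: int = MAX_REQUEST_BYTES,
                          headroom: int = HEADROOM_BYTES) -> str:
    """Decide whether a value can go in one non-streamed request."""
    if len(payload) <= limit - headroom:
        return "single_request"
    # Too large for one request: use streaming where available,
    # or store the payload elsewhere and record only its hash.
    return "stream_or_hash"


print(choose_write_strategy(b"x" * 1024))
```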

Many thanks for your question :)

srinisubramanian commented 3 years ago

Great answer. Looking forward to hearing the conclusions of your discussion on data archiving.