codenotary / immudb

immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
https://immudb.io

Data retention doesn't work as expected #1723

Closed by lorepas 1 year ago

lorepas commented 1 year ago

**What happened**
Version 1.5.0 should bring the new data retention functionality, as reported here. However, after 24 hours we can see logs for the transactions involved in the truncation, but the database size does not decrease.

We inserted simple key-value pairs of the form `TEST-x` (key) and `Trial Test x` (value), for x from 0 to 9999, so 10k records in total.

Our suspicion is that the values are very small, and that the storage we still see is used by the other structures in the database (B-tree and Merkle tree).

**What you expected to happen**

We expected the database size to decrease after 24 hours.

**How to reproduce it (as minimally and precisely as possible)**
Our steps are as follows:

```python
from immudb import ImmudbClient

URL = "localhost:3322"  # immudb running on your machine
LOGIN = "immudb"        # Default username
PASSWORD = "immudb"     # Default password
DB = b"testdb"          # Database name (must be in bytes)


def main():
    client = ImmudbClient(URL)
    client.login(LOGIN, PASSWORD, database=DB)
    # Insert 10,000 small key-value pairs: TEST-<i> -> Trial Test <i>
    for i in range(10000):
        key_str = "TEST-" + str(i)
        value_str = "Trial Test " + str(i)
        client.set(bytes(key_str, "utf-8"), bytes(value_str, "utf-8"))


if __name__ == "__main__":
    main()
```

- The day after, you should see logs like the ones below, but the database size remains the same:

```
immudb 2023/06/27 12:57:08 INFO: start truncating database 'testdb' {ts = 2023-06-26 00:00:00 +0200 CEST}
immudb 2023/06/27 12:57:08 INFO: copying sql catalog before truncation for database 'testdb' at tx 8
immudb 2023/06/27 12:57:08 INFO: 1 transaction/s to be indexed at '/var/lib/immudb/testdb'
immudb 2023/06/27 12:57:08 INFO: committed sql catalog before truncation for database 'testdb' at tx 10053
immudb 2023/06/27 12:57:08 INFO: running truncation up to transaction '8'
immudb 2023/06/27 12:57:08 INFO: running truncation check between transaction '8' and '10053'
immudb 2023/06/27 12:57:08 INFO: truncating vlog '1' at offset '231'
```


**Environment**
- OS: Ubuntu 20.04
- ImmuDB: v1.5.0

**Additional info (any other context about the problem)**
Attached is the dashboard showing the observed Database Size and Database Growth:
![immagine](https://github.com/codenotary/immudb/assets/56199352/f79cc7a6-ddc8-4822-b294-d4e168afb086)

As we can see, the database size does not appear to decrease.

jeroiraz commented 1 year ago

Thanks a lot, @lorepas, for the detailed description.

Currently, truncation is only performed on value log files, e.g. the files under the val_0 folder, and a file is physically deleted only once its entire content is subject to erasure.

You can specify the file size at database creation time. The default size is 512MB, but it can be lowered, which should make it faster to verify that files are deleted when the truncation process runs.
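To illustrate why whole-file deletion matters here, the following is a minimal Python sketch under a simplified model: a value log file counts as removable only when all of its bytes fall inside the truncated range. The helpers `payload_bytes` and `deletable_files` are hypothetical and not part of immudb, and per-entry storage overhead is ignored.

```python
DEFAULT_FILE_SIZE = 512 * 1024 * 1024  # 512 MB default mentioned above


def payload_bytes(n_records: int) -> int:
    """Total size in bytes of the values 'Trial Test 0' .. 'Trial Test n-1'."""
    return sum(len(f"Trial Test {i}".encode("utf-8")) for i in range(n_records))


def deletable_files(truncatable_bytes: int, file_size: int) -> int:
    """Hypothetical model: only files that are entirely covered by the
    truncated range are removed, so a partially covered file does not count."""
    return truncatable_bytes // file_size


total = payload_bytes(10_000)
print(total)                                      # value payload only, no overhead
print(deletable_files(total, DEFAULT_FILE_SIZE))  # files freed with the default size
print(deletable_files(total, 64 * 1024))          # files freed with a 64 KiB file size
```

Under these assumptions the reproduction script writes roughly 149 KB of values, so no 512MB file is ever completely covered, which is consistent with the unchanged size reported above.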

lorepas commented 1 year ago

Thank you very much, @jeroiraz, for the clarification. I'm trying to specify a lower file size in order to see the expected behavior, but I'm not able to find the right command to do this. From the documentation here, it seems that FileSize cannot be changed once the database has been created. I also saw that database settings can only be changed with the Go API and that it's not possible to run a database settings command from the CLI (e.g. with immuadmin). Is that correct?

jeroiraz commented 1 year ago

That's right, the fileSize cannot be changed after database creation. It should be possible to specify the size using the SDK or when creating the database with immuadmin, e.g. `./immuadmin database create db1 --file-size=10485760` (10MB).

Inserting more entries, or entries with bigger values, will also fill up the first file under the val_0 folder.
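As a rough, non-authoritative estimate of how many entries it takes to fill one value log file, the sketch below divides the file size by an assumed fixed value size. `entries_to_fill` is a hypothetical helper, the 15-byte value size only approximates the `Trial Test <i>` values, and any per-entry overhead in the real files is ignored.

```python
def entries_to_fill(file_size: int, value_size: int) -> int:
    """Smallest number of equally sized values whose payload reaches file_size bytes."""
    # Integer ceiling division, avoiding floating-point rounding
    return -(-file_size // value_size)


TEN_MB = 10 * 1024 * 1024  # 10485760, the size used in the command above

print(entries_to_fill(TEN_MB, 15))    # tiny values, like the test records
print(entries_to_fill(TEN_MB, 1024))  # 1 KiB values
```

For 10MB files this gives about 700k entries with 15-byte values, versus 10,240 entries when each value is 1 KiB, so larger values reach the file boundary much sooner.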

lorepas commented 1 year ago

Thank you very much, @jeroiraz, now everything is clear! I've tried to change the file size using immuadmin, but it doesn't seem to be a valid option: [image]

At this point, I think the only way to change the file size is by using the Go SDK.

jeroiraz commented 1 year ago

Sure.

Just checked: indeed, the file-size flag has not yet been added to immuadmin. Thus, as you mention, the SDK will be required.

lorepas commented 1 year ago

Thank you for the clarification!

jeroiraz commented 1 year ago

Sure. Please close this issue once you have verified that truncation is working. If it doesn't work as expected, we'll fix it ASAP. Thanks!

lorepas commented 1 year ago

Hi @jeroiraz. I'm replying after some days because I wanted to check whether data retention deletion is effective. In my test case, I used an S3 bucket (MinIO) to store the ImmuDB data. It seems that deletion occurs on the data stored in the local db (the one in /var/lib/immudb), but the data stored in the S3 bucket is not affected. Can you confirm this, or should deletion work on S3 storage too?

jeroiraz commented 1 year ago

Hi @lorepas, that's correct. Truncation is not implemented when using remote storage. Please feel free to open a feature request for it.

lorepas commented 1 year ago

Many thanks for your time and your replies, @jeroiraz!