codenotary / immudb

immudb - immutable database based on zero trust, SQL/Key-Value/Document model, tamperproof, data change history
https://immudb.io

Immudb instance doesn't load data from S3 storage after reboot (randomly, discarding snapshots and other data) #1856

Closed MaksymVynohradovDA closed 3 months ago

MaksymVynohradovDA commented 8 months ago

What happened

We host immudb as an AWS Fargate task (in effect, a Docker container) and use an S3 bucket as back-end storage. When the container/task is restarted, immudb does not load the existing data from storage and comes up empty. At the same time, it appears to keep updating the storage.

What you expected to happen

After a restart, the immudb instance should load its data from S3 storage.

How to reproduce it (as minimally and precisely as possible)

  1. Start an immudb instance connected to AWS S3 storage.
  2. Create a table and write some data to it, then wait until the data has been persisted to S3.
  3. Restart the immudb instance.
  4. Check whether the data was loaded from S3.

Environment

immudb: v1.5.0 (git RC1)
webconsole: v1.0.18 (git ebf53ef)

Additional info (any other context about the problem)

MaksymVynohradovDA commented 8 months ago

Update: we found the following in the logs:


immudb 2023/11/03 12:15:47 INFO: Index '/var/lib/immudb/defaultdb/index' {ts=0, discarded_snapshots=1} successfully loaded
immudb 2023/11/03 12:15:47 INFO: Discarding snapshots due to invalid checksum at '/var/lib/immudb/defaultdb/index'
...
immudb 2023/11/03 12:15:48 INFO: tx data is corrupted: ALH mismatch at tx 14356323323871488: discarding pre-committed transaction: 1

There are many other messages about "discarded" data. Then, after some time, immudb starts with an empty database:

immudb 2023/11/03 12:15:47 INFO: Started with an empty default database

So why might this happen, and how can we fix it?
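
The symptoms in the log excerpt above can be checked for mechanically after each restart. The following is a minimal sketch, not part of immudb: the function names and the heuristic are hypothetical, and the patterns are copied from the messages quoted in this issue, so other immudb versions may word them differently.

```python
import re

# Patterns copied from the log lines quoted in this issue (assumption:
# other immudb versions may phrase these messages differently).
SYMPTOMS = {
    "discarded_snapshots": re.compile(r"discarded_snapshots=[1-9]\d*"),
    "invalid_checksum": re.compile(r"Discarding snapshots due to invalid checksum"),
    "alh_mismatch": re.compile(r"ALH mismatch at tx \d+"),
    "empty_database": re.compile(r"Started with an empty default database"),
}


def scan_startup_log(lines):
    """Return a dict mapping each symptom name to the log lines that match it."""
    hits = {name: [] for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


def looks_like_data_loss(hits):
    """Hypothetical heuristic: the database came up empty and data was discarded."""
    discarded = hits["invalid_checksum"] or hits["alh_mismatch"]
    return bool(hits["empty_database"] and discarded)


# The four log lines reported in this issue.
SAMPLE_LOG = [
    "immudb 2023/11/03 12:15:47 INFO: Index '/var/lib/immudb/defaultdb/index' "
    "{ts=0, discarded_snapshots=1} successfully loaded",
    "immudb 2023/11/03 12:15:47 INFO: Discarding snapshots due to invalid checksum "
    "at '/var/lib/immudb/defaultdb/index'",
    "immudb 2023/11/03 12:15:48 INFO: tx data is corrupted: ALH mismatch at tx "
    "14356323323871488: discarding pre-committed transaction: 1",
    "immudb 2023/11/03 12:15:47 INFO: Started with an empty default database",
]

if __name__ == "__main__":
    print(looks_like_data_loss(scan_startup_log(SAMPLE_LOG)))
```

Such a check only flags the situation; it does not recover anything.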

jeroiraz commented 8 months ago

We'll review it asap.

Some data seems to have been loaded, hence the mismatch described in the logs. Transactions that were not fully committed may be discarded, since the client should not have received any confirmation for them.

MaksymVynohradovDA commented 8 months ago

@jeroiraz

Hi! Thanks a lot! It happens when the immudb container crashes or is restarted for some reason (for example, automatically by AWS Fargate). My assumption is that files on S3 are not updated all at once but one by one, or in batches. In that case immudb cannot finish processing them before the restart, so the signature becomes invalid... but that's just my assumption =) In any case, a service restart is quite a common event.
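
The hypothesis above can be illustrated with a toy model. This is only a sketch of the idea and says nothing about how immudb actually lays out or verifies its files: the file names, the `upload` and `verify` helpers, and the use of SHA-256 are all assumptions made for illustration. A checksum is recorded for the complete local state, files are copied one by one, and the copy is interrupted partway.

```python
import hashlib


def checksum(files):
    """SHA-256 over file names and contents, in sorted name order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode())
        digest.update(files[name])
    return digest.hexdigest()


def upload(local, remote, crash_after=None):
    """Copy files one by one into `remote` and return the recorded checksum.

    `crash_after` stops the copy after that many files, imitating a task
    that is killed in the middle of an upload.
    """
    # Hypothetical layout: the checksum of the full local state is recorded first.
    manifest = checksum(local)
    for count, name in enumerate(sorted(local)):
        if crash_after is not None and count >= crash_after:
            break
        remote[name] = local[name]
    return manifest


def verify(remote, manifest):
    """True if the remote files match the recorded checksum."""
    return checksum(remote) == manifest


# Hypothetical file names, for illustration only.
LOCAL_FILES = {"index": b"snapshot-7", "tx": b"tx-1..tx-7", "val": b"values"}

if __name__ == "__main__":
    complete, partial = {}, {}
    print(verify(complete, upload(LOCAL_FILES, complete)))
    print(verify(partial, upload(LOCAL_FILES, partial, crash_after=1)))
```

In this model a restarted instance that trusts the recorded checksum would reject the partially uploaded state, which resembles the "invalid checksum" messages in the logs, though the real cause may differ.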

MaksymVynohradovDA commented 8 months ago

Hi! We investigated the issue. Steps to reproduce:

  • Run immudb on AWS Fargate, connected to S3 as storage.
  • Reach the memory limit through a backup restore and/or many simultaneous queries to the DB.
  • The Fargate task is moved to the DRAINED status and then restarted.
  • The new Fargate task runs into the errors described in this issue and then logs "Started with an empty default database".

The root cause is that, on the one hand, immudb still keeps some recent data files (such as tx or others) on the Docker container's file system (in our case, on the Fargate side) and, on the other hand, unlike docker or docker-compose, AWS Fargate creates a completely new instance of the task. The old data (volume) simply vanishes, with no way to restore it.

So it looks like it is impossible to run immudb on AWS Fargate and be sure that data will not be lost after a Fargate task crashes.

jeroiraz commented 8 months ago

Currently, immudb uploads data to S3 asynchronously, so, as you describe, data may be lost in such cases. Replication could be used to mitigate these scenarios, but master election is not implemented, so it may require manual intervention or external tooling to determine the best instance. @SimoneLazzaris you may be able to expand on this aspect.
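
As a rough sketch of what such external tooling might decide, the snippet below picks the reachable instance that reports the highest committed transaction ID. This is an assumption about what "best instance" could mean, not an immudb feature: `ReplicaState`, `pick_most_advanced`, and the instance names are hypothetical, and how the IDs are collected from each instance is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaState:
    """State reported by one instance (hypothetical structure)."""
    name: str                # instance label, for illustration only
    last_committed_tx: int   # highest fully committed transaction ID
    reachable: bool = True


def pick_most_advanced(replicas):
    """Return the reachable instance with the highest committed tx ID, or None.

    Ties are broken by name so the choice is deterministic.
    """
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.last_committed_tx, r.name))


if __name__ == "__main__":
    fleet = [
        ReplicaState("replica-a", 120),
        ReplicaState("replica-b", 135),
        ReplicaState("primary", 140, reachable=False),  # lost with its task
    ]
    print(pick_most_advanced(fleet))
```

A real procedure would also have to decide what to do with transactions that only the lost instance had seen.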

Another possibility is to implement a synchronous operation mode for S3, although a noticeable performance degradation would be expected. Nevertheless, it seems a nice capability for use cases where degraded performance is still acceptable.
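
The trade-off between the current asynchronous upload and a synchronous mode can be sketched with a toy model. This is not immudb's implementation: `ToyStore` and its fields are hypothetical, lists stand in for the container file system and the S3 bucket, and a restart is modeled as a new task that sees only what reached the bucket.

```python
class ToyStore:
    """Toy store that writes locally and copies records to remote storage."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = []     # stands in for the container's file system
        self.pending = []   # written locally, not yet uploaded
        self.remote = []    # stands in for the S3 bucket
        self.uploads = 0    # number of remote round trips

    def write(self, record):
        """Accept a record; in synchronous mode, upload before returning."""
        self.local.append(record)
        self.pending.append(record)
        if self.synchronous:
            self.flush()

    def flush(self):
        """Upload everything that is still pending in one round trip."""
        if self.pending:
            self.remote.extend(self.pending)
            self.pending.clear()
            self.uploads += 1

    def crash_and_restart(self):
        """A new task starts with an empty file system and sees only the bucket."""
        self.local = list(self.remote)
        self.pending = []
        return self.local


if __name__ == "__main__":
    for mode in (False, True):
        store = ToyStore(synchronous=mode)
        for record in ("tx1", "tx2", "tx3"):
            store.write(record)
        print(mode, store.uploads, store.crash_and_restart())
```

In this model the synchronous store survives the restart with all records but pays one upload per write, while the asynchronous store loses whatever had not been flushed when the task was replaced.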