Scille / parsec-cloud

Open source Dropbox-like file sharing with full client encryption!
https://parsec.cloud

Panic due to `ReadChunkOrBlockError::ChunkNotFound` during outbound sync when disk is full #8492

Open sentry-io[bot] opened 3 weeks ago

sentry-io[bot] commented 3 weeks ago

The issue

Sentry Issue: PARSEC3-LIBPARSEC-B

panic: not yet implemented
  File "panicking.rs", line 645, in rust_begin_unwind
  ?, in libparsec_client::workspace::transactions::outbound_sync::reshape_and_upload_blocks::{{closure}}
  ?, in libparsec_client::workspace::transactions::outbound_sync::outbound_sync_child::{{closure}}
  ?, in libparsec_client::monitors::workspace_outbound_sync::task_future_factory::{{closure}}::{{closure}}
  File "pthread_create.c", line 442, in start_thread
...
(21 additional frame(s) were not displayed)

My guess

The panic most likely comes from https://github.com/Scille/parsec-cloud/blob/master/libparsec/crates/client/src/workspace/transactions/outbound_sync.rs#L708

However, this is something that should never occur: a manifest referencing a chunk is supposed to be stored in local storage only after the chunk itself has been added to local storage.
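
For context, "not yet implemented" is the message emitted by Rust's `todo!()` macro, so the panic is presumably a `todo!()` left on the error branch that was assumed unreachable. A minimal sketch of that pattern (the enum and function below are simplified stand-ins, not the actual code at that line):

```rust
// Simplified stand-in for the error handling around reading a chunk/block;
// the real enum in libparsec has more variants and a different signature.
#[allow(dead_code)]
enum ReadChunkOrBlockError {
    ChunkNotFound,
    Other(String),
}

fn on_read_error(err: ReadChunkOrBlockError) -> Result<(), String> {
    match err {
        // `todo!()` panics with "not yet implemented", which matches the
        // Sentry report: this branch was assumed to be unreachable because a
        // manifest should never reference a chunk missing from local storage.
        ReadChunkOrBlockError::ChunkNotFound => todo!(),
        ReadChunkOrBlockError::Other(msg) => Err(msg),
    }
}

fn main() {
    // A non-panicking variant, just so the sketch runs end to end.
    let _ = on_read_error(ReadChunkOrBlockError::Other("network error".into()));
}
```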

I suspect the issue comes from the fact that the chunk and manifest databases are currently two separate SQLite databases without any atomicity constraint between them. Hence we end up with two independent journaling and flushing strategies, which could lead to:

1) The chunk insertion is accepted by SQLite db1. The chunk row is written to db1's WAL log file.
2) The manifest insertion is accepted by SQLite db2. Here again, the manifest row is written to db2's WAL log file.
3) Parsec gets stopped, and the stop operation closes each database:
  3a) Closing db1 cannot apply the changes from its WAL log file to the database, given there is not enough space on disk.
  3b) Closing db2 can apply the changes from its WAL log file, since those changes are small enough to fit on disk.
4) On the next Parsec start, db1's WAL log file is destroyed (WAL doesn't guarantee durability, only consistency).

==> We end up with a manifest in db2 referencing a non-existing chunk :/
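
To make the lack of atomicity concrete, here is a minimal sketch using rusqlite with made-up file and table names (not the actual Parsec storage code): each connection keeps its own WAL and commits independently, and nothing ties the two writes together.

```rust
// Hypothetical sketch of the two-database situation: two independent SQLite
// connections, each with its own WAL file, and no transaction spanning both.
// File and table names are made up; this is not the actual Parsec schema.
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let chunks_db = Connection::open("chunks.sqlite")?;
    let manifests_db = Connection::open("manifests.sqlite")?;

    // Each database keeps its own WAL journal.
    let _: String = chunks_db.query_row("PRAGMA journal_mode=WAL", [], |r| r.get(0))?;
    let _: String = manifests_db.query_row("PRAGMA journal_mode=WAL", [], |r| r.get(0))?;

    chunks_db.execute_batch("CREATE TABLE IF NOT EXISTS chunk (id TEXT PRIMARY KEY, data BLOB)")?;
    manifests_db.execute_batch("CREATE TABLE IF NOT EXISTS manifest (id TEXT PRIMARY KEY, blob BLOB)")?;

    // Step 1: the chunk commit only reaches chunks.sqlite's WAL.
    chunks_db.execute("INSERT INTO chunk VALUES (?1, ?2)", ("chunk-1", vec![0u8; 1024]))?;
    // Step 2: the manifest referencing chunk-1 only reaches manifests.sqlite's WAL.
    manifests_db.execute(
        "INSERT INTO manifest VALUES (?1, ?2)",
        ("manifest-1", b"...references chunk-1...".to_vec()),
    )?;

    // No transaction covers both commits: if, as hypothesized above,
    // chunks.sqlite's WAL is lost after a failed checkpoint on a full disk,
    // manifests.sqlite can still hold a manifest referencing the missing chunk.
    Ok(())
}
```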

see:

vxgmichel commented 3 weeks ago

> I suspect the issue comes from the fact that the chunk and manifest databases are currently two separate SQLite databases without any atomicity constraint between them.

I don't get it: I thought we had a cache database and a data database, and that the manifests were stored in the data database. That means that local data should always be consistent, right?
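
To illustrate the point being discussed, a minimal sketch (again rusqlite with made-up names, not Parsec's actual schema): if the chunk row and the manifest row live in the same database and are written in one transaction, SQLite commits or rolls back both together, so a manifest can never outlive the chunk it references.

```rust
// Hypothetical sketch: chunk and manifest rows in the same SQLite file,
// written inside a single transaction. File and table names are made up.
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let mut db = Connection::open("data.sqlite")?;
    db.execute_batch(
        "CREATE TABLE IF NOT EXISTS chunk (id TEXT PRIMARY KEY, data BLOB);
         CREATE TABLE IF NOT EXISTS manifest (id TEXT PRIMARY KEY, blob BLOB);",
    )?;

    // Both inserts share one transaction: either both rows survive a crash
    // (or a lost WAL), or neither does.
    let tx = db.transaction()?;
    tx.execute("INSERT INTO chunk VALUES (?1, ?2)", ("chunk-1", vec![0u8; 1024]))?;
    tx.execute(
        "INSERT INTO manifest VALUES (?1, ?2)",
        ("manifest-1", b"...references chunk-1...".to_vec()),
    )?;
    tx.commit()?;
    Ok(())
}
```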