PoloDB / PoloDB

PoloDB is an embedded document database.
https://www.polodb.org/
Apache License 2.0

Deletions not persisting to disk #127

Closed: kplaceholder closed this issue 8 months ago

kplaceholder commented 1 year ago

Hello, I'm having a rather frustrating issue that I'm positive is not working as intended. I am using polodb_core version 4.4.0.

I can create a db file with Database::open_file(), create a collection, insert documents into it, search them, and update them. All of these operations seem to be written to disk properly: the next run of the program does have the created/updated documents.

However! This is not the case with deletions! When I call either delete_one or delete_many, the selected documents are deleted from memory as expected (I can verify they are no longer there by calling find afterwards, so calling delete_one with the wrong filters is not a possibility). But on the next run of the program, the deleted documents are back, suggesting that the delete operation was not persisted:

use polodb_core::{Database, Document, bson::doc};

let db = Database::open_file("tasks.db").unwrap();
let col = db.collection::<Document>("tasks");

// Run #1
col.insert_one(doc! { "name": "t1" })?;
col.find(None)?; // The document { "name": "t1" } is returned
// Run #2
col.delete_one(doc! { "name": "t1" })?;
col.find(None)?; // No documents returned
// Run #3
col.find(None)?; // The document { "name": "t1" } is returned, but none should be returned instead

In fact, while toying around with inserts and deletes, I found that they also don't quite work together as expected, and I can't follow the logic behind what persists and what doesn't:

// Run #4
col.delete_one(doc! {"name": "t1"})?;
col.insert_one(doc! {"name": "t2"})?; // Will inserting a different document force the deletion to commit?
col.find(None)?; // Only the document {"name": "t2"} is returned
// Run #5
col.find(None)?; // Both documents {"name": "t1"} and {"name": "t2"} are returned, only the latter should exist

This issue makes PoloDB largely unreliable for my purposes, since I don't really know the extent of the problem or whether there are more cases that don't persist. I have been researching every cause I could come across, and I ensured that my filesystem is not interfering. I think this is probably a bug.

I could not find any document explaining how to report bugs, so if I can provide additional information or should report this somewhere else, please let me know.

AstroPatty commented 1 year ago

I am seeing identical behavior in my application. The database reports that the deletion was successful, and the deletion does appear in the database in memory, but does not persist to the copy of the database stored on disk.

Here is a minimal working test that demonstrates the behavior. Run it with cargo test after compiling directly from source. Note that this is not meant to be a good test in any sense; it only demonstrates that the last test fails.

Note: these tests are run with the --test-threads 1 flag (cargo test -- --test-threads 1) to ensure they run sequentially.


#[cfg(test)]
mod cd_tests {
    use polodb_core::{Database, Collection};
    use polodb_core::bson::{doc};
    use serde::{Deserialize, Serialize};

    #[derive(Debug, Serialize, Deserialize)]
    struct test_struct {
        name: String
    }

    #[test]
    fn test_add() {
        let db = Database::open_file("/Users/patrick/test_database.db").unwrap();
        let collection = db.collection("test");
        collection.insert_one(doc! { "name": "John Doe" }).unwrap();
    }

    #[test]
    fn test_remove() {
        let db = Database::open_file("/Users/patrick/test_database.db").unwrap();
        let collection: Collection<test_struct> = db.collection("test");
        collection.delete_one(doc! { "name": "John Doe" }).unwrap();
    }

    #[test]
    fn test_update() {
        let db = Database::open_file("/Users/patrick/test_database.db").unwrap();
        let collection: Collection<test_struct> = db.collection("test");
        assert!(collection.count_documents().unwrap() == 0)
    }
}

It's not yet clear to me whether this is also the case for delete_many in general, but delete_many does seem to fail in the same way at least when it finds only one match.

I am working to see if I can trace the error. So far, I've found two (possible) issues.

  1. In polodb_core.lsm.lsm_kv.LSMKvInner the function should_sync returns false when the only change is a single deletion.

  2. In force_sync_last_segment (which is called when the database is dropped) the mem_table.len() returns 0 under the same conditions as above, which terminates the function before anything is sent to the file backend.
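
The two observations above can be illustrated with a toy model. This is only a minimal sketch, not PoloDB's actual code: `MemTable`, `live_len`, `pending_len`, `flush_buggy`, and `flush_fixed` are hypothetical names, and it assumes that a deletion is recorded as a tombstone which the length check fails to count. It covers only the single-deletion case; the real cause may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory table. `None` marks a deletion (a tombstone).
#[derive(Default)]
struct MemTable {
    entries: BTreeMap<String, Option<String>>,
}

impl MemTable {
    fn put(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), Some(value.to_string()));
    }

    fn delete(&mut self, key: &str) {
        // Record a tombstone so the deletion can later be applied on disk.
        self.entries.insert(key.to_string(), None);
    }

    /// Counts only live values: a table holding nothing but tombstones reports 0.
    fn live_len(&self) -> usize {
        self.entries.values().filter(|v| v.is_some()).count()
    }

    /// Counts every pending change, tombstones included.
    fn pending_len(&self) -> usize {
        self.entries.len()
    }
}

/// Applies all pending changes to the (simulated) on-disk map.
fn apply(table: &MemTable, disk: &mut BTreeMap<String, String>) {
    for (key, value) in &table.entries {
        match value {
            Some(v) => {
                disk.insert(key.clone(), v.clone());
            }
            None => {
                disk.remove(key);
            }
        }
    }
}

/// Assumed buggy guard: returns early when there are no live values,
/// so a lone tombstone never reaches the disk.
fn flush_buggy(table: &MemTable, disk: &mut BTreeMap<String, String>) -> bool {
    if table.live_len() == 0 {
        return false;
    }
    apply(table, disk);
    true
}

/// Guard that counts tombstones as pending work.
fn flush_fixed(table: &MemTable, disk: &mut BTreeMap<String, String>) -> bool {
    if table.pending_len() == 0 {
        return false;
    }
    apply(table, disk);
    true
}

fn main() {
    // Simulated disk state left over from an earlier run.
    let mut disk = BTreeMap::new();
    disk.insert("t1".to_string(), "task".to_string());

    // This run performs a single deletion and nothing else.
    let mut table = MemTable::default();
    table.delete("t1");

    let flushed = flush_buggy(&table, &mut disk);
    println!("buggy: flushed={}, t1 on disk={}", flushed, disk.contains_key("t1"));

    let flushed = flush_fixed(&table, &mut disk);
    println!("fixed: flushed={}, t1 on disk={}", flushed, disk.contains_key("t1"));

    // A mixed batch of inserts and deletes passes both guards in this toy model.
    let mut mixed = MemTable::default();
    mixed.put("t2", "task");
    mixed.delete("t2");
    mixed.put("t3", "task");
    println!("mixed batch pending changes={}", mixed.pending_len());
}
```

In this model the buggy guard leaves t1 on disk after a run that only deletes, which matches the symptom reported in this thread. It does not explain the mixed insert/delete case from the original report.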

I'm just looking through the codebase for the first time, so maybe I'm just not understanding what each of the components is supposed to do. I will come back to it and see if I can find a working solution.

LonelyPale commented 1 year ago

I also encountered this problem

maxpowel commented 1 year ago

Apart from this, if you open a file that already contains data and insert some new data, the only data removed is the "fresh" data just inserted; the old data loaded from the file remains untouched. You have to call delete many times. While doing this, it sometimes just hangs at 100% CPU and corrupts the database (the only solution is to remove the file and recreate it from scratch).

vincentdchan commented 8 months ago

@maxpowel Can you provide a minimal example?

vincentdchan commented 8 months ago

Fixed in #144, release 4.4.1