dgraph-io / badger

Fast key-value DB in Go.
https://dgraph.io/badger
Apache License 2.0

[BUG]: Corrupt and lost data #2059

Closed i5heu closed 2 weeks ago

i5heu commented 2 weeks ago

Hi, as I read it, it should be fine to use BadgerDB from multiple goroutines, but if I do this I get data corruption: data is either not stored, parts of the value are missing, or even unknown bytes are added to the value.
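For reference, the modes named in the logs below map onto Badger's write paths. A minimal sketch of each (writeAllThreeWays is a made-up name; db is an opened *badger.DB, key and value are stand-ins):

    func writeAllThreeWays(db *badger.DB, key, value []byte) error {
        // Update: one managed read-write transaction per call.
        if err := db.Update(func(txn *badger.Txn) error {
            return txn.Set(key, value)
        }); err != nil {
            return err
        }

        // NewTransaction: the unmanaged equivalent; Commit applies the write.
        txn := db.NewTransaction(true)
        defer txn.Discard()
        if err := txn.Set(key, value); err != nil {
            return err
        }
        if err := txn.Commit(); err != nil {
            return err
        }

        // NewWriteBatch: writes are buffered and only applied on Flush.
        wb := db.NewWriteBatch()
        defer wb.Cancel()
        if err := wb.Set(key, value); err != nil {
            return err
        }
        return wb.Flush()
    }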

I have written a test function that exercises this with my test data (more on that below). Here are some of the errors it prints:

NewTransaction limitConcurrency 10

....
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 524288 - 241300
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 137079 - 216851
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 192041 - 151706
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 485484 - 215200
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 193201 - 208072
2024/05/02 17:00:50 Error: The original data and the data from the keyValStore are not the same 202916 - 154629
not found chunks : 71.49071440522002 %  Chunks: 5977 Chunks not found: 4273

Update limitConcurrency 10

....
2024/05/02 17:01:53 Error: The original data and the data from the keyValStore are not the same 176866 - 327360
2024/05/02 17:01:53 Error: The original data and the data from the keyValStore are not the same 134327 - 143631
2024/05/02 17:01:53 Error: The original data and the data from the keyValStore are not the same 164591 - 196890
2024/05/02 17:01:53 Error: The original data and the data from the keyValStore are not the same 134327 - 143631
2024/05/02 17:01:53 Error: The original data and the data from the keyValStore are not the same 247492 - 152750
not found chunks : 70.92186715743685 %  Chunks: 5977 Chunks not found: 4239

NewWriteBatch limitConcurrency 10

....
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 383431 - 313805
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 164308 - 306034
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 186945 - 167126
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 140884 - 257570
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 140884 - 257570
2024/05/02 17:02:49 Error: The original data and the data from the keyValStore are not the same 191939 - 149949
not found chunks : 42.51296637108917 %  Chunks: 5977 Chunks not found: 2541

Update without goroutine

not found chunks : 0 %  Chunks: 5977 Chunks not found: 0

How to replicate:

Execute the following steps. I have not tested them as a shell script, but they should work (ChunkingChampions contains the exact test data I used):

git clone git@github.com:i5heu/OuroborosDB.git
cd OuroborosDB
git checkout bug-bagerdb
mkdir data
mkdir tmp
cd data
git clone git@github.com:i5heu/ChunkingChampions.git
cd ..
touch ./tmp/foo

Now you are ready to go. If you look at cmd/badgerDBTorture/main.go, you can change which insertion function (e.g. NewTransaction / Update) should be used and how many workers should be spawned.
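The worker limit is the usual semaphore pattern; roughly (a simplified sketch, not the exact harness code; insert stands for whichever insertion function is selected):

    func runWorkers(chunks []buzhashChunker.ChunkData, limitConcurrency int) {
        sem := make(chan struct{}, limitConcurrency) // at most limitConcurrency workers at once
        var wg sync.WaitGroup

        for _, chunk := range chunks {
            sem <- struct{}{} // blocks while the limit is reached
            wg.Add(1)
            go func(c buzhashChunker.ChunkData) { // the chunk is passed as an argument
                defer wg.Done()
                defer func() { <-sem }()
                insert(c) // NewTransaction, Update, or NewWriteBatch path
            }(chunk)
        }
        wg.Wait()
    }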

Whenever you want to run the test, execute the following command:

rm ./tmp/* && go run cmd/badgerDBTorture/main.go

i5heu commented 2 weeks ago

Ohhh wow, I found the error... this is some forgotten-semicolon-level stuff:

I forgot to copy the chunk variable for the goroutines, so the variable they captured was reassigned new values while the goroutines were still running. This took me way too long to catch.
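Reduced to the essentials, the bug looks like this (store is a placeholder for the actual write):

    // Broken: every goroutine captures the same loop variable, which the
    // loop overwrites on each iteration before the goroutines read it.
    for _, chunk := range chunks {
        go func() {
            store(chunk.Hash[:], chunk.Data)
        }()
    }

    // Fixed: shadow the loop variable so each goroutine gets its own copy.
    for _, chunk := range chunks {
        chunk := chunk
        go func() {
            store(chunk.Hash[:], chunk.Data)
        }()
    }

(Go 1.22 made range variables per-iteration, which removes this class of bug, but that only kicks in once the go directive in go.mod is at least 1.22.)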

I also tested all modes with up to 10000 concurrent workers, and it seems to work!

Maybe not so obvious, but one needs to do this in a for loop with wb.Set too; I guess because the writes queued by wb.Set are only applied when wb.Flush() is called.

func (k *KeyValStore) BatchWriteChunk(chunks []buzhashChunker.ChunkData) error {
    wb := k.badgerDB.NewWriteBatch()
    defer wb.Cancel()

    for _, chunk := range chunks {
        // Important: copy the loop variable. wb.Set keeps references to the
        // slices until Flush, and without this copy every queued entry would
        // point into the one loop variable that each iteration overwrites.
        chunk := chunk

        atomic.AddUint64(&k.writeCounter, 1)
        if err := wb.Set(chunk.Hash[:], chunk.Data); err != nil {
            log.Println("Error writing chunk: ", err)
            return err // log.Fatal here would exit and make this return unreachable
        }
    }

    return wb.Flush()
}

This is even called out in the documentation:

https://pkg.go.dev/github.com/dgraph-io/badger#Txn.Set
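To make that concrete, a small sketch (assuming, per the linked docs, that the key and value slices are kept by reference and must not be modified until the write is applied):

    wb := db.NewWriteBatch()
    defer wb.Cancel()

    buf := []byte("first")
    _ = wb.Set([]byte("key"), buf) // wb keeps a reference to buf; the value is not copied here
    copy(buf, "wrong")             // mutating buf before Flush changes what gets stored
    _ = wb.Flush()                 // "key" is now stored as "wrong", not "first"

Without the chunk := chunk copy, chunk.Hash[:] pointed into the loop variable itself in the same way, so later iterations overwrote the key bytes the batch was still holding.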