Closed i5heu closed 2 weeks ago
Ohhh wow, I found the error... this is some forgotten-semicolon-level stuff:
I forgot to copy the chunk variable for the goroutines, so the pointer got assigned a new value while the goroutines were running. This took me way too long to catch.
I also tested all modes with up to 10000 concurrent workers, and it seems to work!
Maybe not so obvious, but one needs to copy the loop variable in a for loop with wb.Set too; I guess because the wb.Set writes are only applied once wb.Flush() is called.
```go
func (k *KeyValStore) BatchWriteChunk(chunks []buzhashChunker.ChunkData) error {
	wb := k.badgerDB.NewWriteBatch()
	defer wb.Cancel()

	for _, chunk := range chunks {
		chunk := chunk // important: copy the loop variable so each iteration keeps its own value
		atomic.AddUint64(&k.writeCounter, 1)
		err := wb.Set(chunk.Hash[:], chunk.Data)
		if err != nil {
			log.Println("Error writing chunk: ", err)
			return err
		}
	}
	return wb.Flush()
}
```
This is even defined in the documentation
Hi, as I read it should be fine to use badgerDB from multiple goroutines, but when I do, I get data corruption: data is either not stored at all, parts of a value are missing, or even some unknown bytes are added to the value.
I have written a test function to check this with my test data (more on that later). Here are some errors my test function prints:
```
NewTransaction limitConcurrency 10
Update limitConcurrency 10
NewWriteBatch limitConcurrency 10
Update without goroutine
```
How to replicate:
Execute the following steps ... I have not tested it as a shell script, but it should work (ChunkingChampions contains the exact test data I used).
Now you are ready to go. If you look at
cmd/badgerDBTorture/main.go
you can change which insertion function (e.g. NewTransaction / Update) should be used and how many workers should be spawned. Now you can execute the following command whenever you want to test it:
```sh
rm ./tmp/* && go run cmd/badgerDBTorture/main.go
```