boltdb / bolt

An embedded key/value database for Go.
MIT License

Deleting items from a bucket is really really really slow. #667

Open timestee opened 7 years ago

timestee commented 7 years ago

Test code:

package boltdb

import (
    "fmt"
    "testing"
    "time"

    "github.com/boltdb/bolt"
)

var db *bolt.DB
var bucket_1 *bolt.Bucket
var bucket_2 *bolt.Bucket

func GetBucket(db *bolt.DB, bucket_name string) (b *bolt.Bucket, err error) {
    err = db.Update(func(tx *bolt.Tx) error {
        var err error
        b, err = tx.CreateBucketIfNotExists([]byte(bucket_name))
        return err
    })
    return
}

func Delete(db *bolt.DB, bucket_name string, key string) error {
    return db.Update(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(bucket_name))
        return bucket.Delete([]byte(key))
    })
}

func Put(db *bolt.DB, bucket_name string, key string, val string) error {
    return db.Update(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(bucket_name))
        return bucket.Put([]byte(key), []byte(val))
    })
}

func init() {
    db, _ = bolt.Open("/tmp/bolt_testing", 0600, nil)
    bucket_1, _ = GetBucket(db, "bucket_1")
    bucket_2, _ = GetBucket(db, "bucket_2")
}
func delete_0() {
    start := time.Now()
    bucket_1.Delete([]byte("key_1"))
    elapsed := time.Since(start)
    fmt.Println("================ delete_0", elapsed)
}
func delete_1() {
    start := time.Now()
    Delete(db, "bucket_1", "key_1")
    elapsed := time.Since(start)
    fmt.Println("================ delete_1", elapsed)
}
func put_0() {
    start := time.Now()
    bucket_1.Put([]byte("key_1"), []byte("val_1"))
    elapsed := time.Since(start)
    fmt.Println("================ put_0", elapsed)
}
func put_1() {
    start := time.Now()
    Put(db, "bucket_1", "key_1", "val_1")
    elapsed := time.Since(start)
    fmt.Println("================ put_1", elapsed)
}
func TestDelete(t *testing.T) {
    delete_0()
    delete_1()

    put_0()
    put_1()
}

Output when running on an SSD:

================ delete_0 240ns
================ delete_1 433.485µs
================ put_0 417ns
================ put_1 408.77µs

And on an HDD:

================ delete_0 960ns
================ delete_1 25.030523ms
================ put_0 1.128µs
================ put_1 49.837821ms

But delete_0 and put_0 are not valid usage: a bucket cannot be used outside the transaction that returned it, as the doc comment on Tx.Bucket says:

// Bucket retrieves a bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) Bucket(name []byte) *Bucket {
    return tx.root.Bucket(name)
}

Is there a way to improve the performance of delete_1 and put_1?

dtfinch commented 7 years ago

25ms is about three rotations of a 7200rpm disk (one rotation takes roughly 8.3ms). Each commit has to flush its writes to disk to ensure they're written in the proper order, and this is a cost that any ACID-compliant database pays.

You can batch updates together into a single transaction, so that many puts or deletes share one commit. There's also a db.Batch() method, or you can manage transactions manually with db.Begin and tx.Commit or tx.Rollback.

On server hardware, a non-volatile RAID cache solves the commit delays. It immediately reports back to the OS that the flush completed, because it can guarantee that writes are committed even if the power fails or the OS crashes.