etcd-io / bbolt

An embedded key/value database for Go.
https://go.etcd.io/bbolt
MIT License

(help) random panic/fatal errors on batch op #851

Closed by freitzzz 3 weeks ago

freitzzz commented 3 weeks ago

Hi

I'm considering using bbolt in my project, but after experimenting with concurrency I noticed that the library panics a lot on concurrent calls. This is the sample code I'm using:

//Insert a lot of data
    db.Batch(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte("MyBucket"))
        ch := make(chan (int), 50)
        for i := 0; i < cap(ch); i++ {
            go func() {
                uuid := models.UUID()
                err := b.Put([]byte(uuid), binary.LittleEndian.AppendUint64([]byte{}, uint64(i)))
                if err != nil {
                    fmt.Printf("put err, %v", err)
                }

                ch <- 1
            }()
        }

        for i := 0; i < cap(ch); i++ {
            <-ch
        }

        return nil
    })

Some of the panic errors include:

fatal error: concurrent map writes
runtime error: slice bounds out of 
put: zero-length new key
page 2 already freed

Since I'm doing 50 concurrent calls, I assume I must be doing something seriously wrong.

Can someone help me?

ahrtr commented 3 weeks ago

You need to execute it synchronously. The bucket instance is only valid during the lifecycle of the transaction.

            go func() {
                uuid := models.UUID()
                err := b.Put([]byte(uuid), binary.LittleEndian.AppendUint64([]byte{}, uint64(i)))
                if err != nil {
                    fmt.Printf("put err, %v", err)
                }

                ch <- 1
            }()

freitzzz commented 3 weeks ago

@ahrtr If I make it synchronous, what's the point of allowing concurrency in the batch op?

"Concurrent Batch calls are opportunistically combined into larger transactions. Batch is only useful when there are multiple goroutines calling it."

freitzzz commented 3 weeks ago

Or maybe I read it wrong. Is it multiple calls inside a batch call OR multiple calls of batch?

ahrtr commented 3 weeks ago

Please read https://github.com/etcd-io/bbolt?tab=readme-ov-file#batch-read-write-transactions

The purpose of DB.Batch is to minimize the overhead of commit by combining multiple updates.