boltdb / bolt

An embedded key/value database for Go.
MIT License

Bolt doesn't trigger batches on close? #599

Open sbowman opened 8 years ago

sbowman commented 8 years ago

I'm calling "go db.Batch" to handle a large number of writes. I'm seeing errors in my code that the messages queued up in the batch aren't being flushed when I close bolt.DB. I don't see anything in the code calling db.batch.trigger() before closing the database file.

Is this a bug, or is there a way to flush the batch before I close?

pladdy commented 8 years ago

I believe I'm seeing the same problem. I set up a program that calls "go db.Batch" 1000 times and drains the channel once it's full. However, I'm seeing my program exit without the batches finishing. Perhaps it's because the Batch function also creates goroutines? https://github.com/boltdb/bolt/blob/074dffcc83e9f421e261526d297cd93f22a34080/db.go#L671

Sorry if that's a dumb idea; I'm new to Go.

chrsm commented 8 years ago

https://github.com/boltdb/bolt/blob/074dffcc83e9f421e261526d297cd93f22a34080/db.go#L404

There is no attempt to wait on anything that's currently in progress when the DB is closed. Unfortunately there doesn't seem to be a way to be notified of a batch's success, but I think it's easy to do from user code (channels signaling OK after each batch has run).

pladdy commented 8 years ago

Gotcha; I'm going to try to do that. I originally thought I had that in my code, but I wasn't waiting on the channel correctly. Basically, my main goroutine exited while there were batches still executing, and they never got a chance to persist their data. Will post back with example code if I get it working.

pladdy commented 8 years ago

So it looks like my problem was that I was being clever... I created a struct to wrap the boltdb functions, with methods like myStruct.Take(key) to run a boltdb transaction and return the stored value.

Anyway, when I was calling Batch, I was calling it with my wrapper functions. I even had a batch function. When I undid all that and made my batch function like this

    c <- store.Db.Batch(func(tx *bolt.Tx) error {
        bucket := tx.Bucket([]byte(store.Bucket))
        storedValue := bucket.Get([]byte(key))
        var err error

        if storedValue == nil {
            err = bucket.Put([]byte(key), []byte(value))
        } else {
            newValue := string(storedValue) + "\n" + value
            err = bucket.Put([]byte(key), []byte(newValue))
        }

        return err
    })

All the values I was batch-writing were stored, and stored faster: 9,104,134 writes went from almost 1h43m down to 22m10s, about 5x faster.

benbjohnson commented 7 years ago

@sbowman You're right. It looks like DB.Close() should obtain the batchMu & trigger the batch before continuing. If you submit a PR and verify the fix then I can merge it in.

gbl08ma commented 7 years ago

I don't think DB.Close() should acquire batchMu, as that would mean batch.run() blocks forever waiting to acquire it. However, if the batch is triggered from DB.Close() without acquiring the mutex, then I believe a different goroutine could add a batch transaction (one that will never be committed) between the call to db.batch.trigger() and the moment the database is effectively closed.

(This is still much better than the current situation where batch transactions are lost if the database is closed at an "unlucky" time, with a probability much higher than the hypothetical situation I described.)

tv42 commented 7 years ago

If you have active goroutines calling Update/View as you call Close, you just shot your own foot. Batch counts as Update for that.

tv42 commented 7 years ago

Oh and to say something constructive: this is how you wait for your workers to complete: https://blog.golang.org/pipelines

sbowman commented 7 years ago

@tv42 Are you karma trolling? If you don't have a patch or something meaningful to contribute, why not leave the technical discussions to the big boys? Have you even tried to replicate this issue (your comments suggest no), or are you just blowing hot air trying to sound erudite?

I have a workaround for this I suppose I should post, but got pulled off onto other projects so haven't had a chance. Will dig up my code tomorrow and see if I can provide some alternatives if anyone is interested.

tv42 commented 7 years ago

@sbowman Wow. "Big boys", huh? Guess who wrote Batch. And have a nice day.

tv42 commented 7 years ago

The reason why I don't feel Close has any obligation to trigger Batch: it also has no obligation to wait for an Update that is just about to be called.

If you race calls to Update/Batch against Close, you should either see the update happen, or get ErrDatabaseNotOpen. That should hold regardless of whether you're calling Update or Batch.

A batch triggering by timeout some time after the Close is not distinguishable from /* called just before Close */ time.Sleep(epsilon); db.Batch(...).

sbowman commented 7 years ago

You do have an obligation to write before db.Close, because your batch functionality doesn't give me any control over when or if it completes.

Has absolutely nothing to do with me parallelizing my batches through a bunch of goroutines. Did that, waited for my goroutines to finish and WaitGroup to drain, and your code still left half a batch drifting into the ether.

Let me give you a proper scenario. Feel free to setup a test case and see what happens:

  1. Set your db.MaxBatchDelay to 1 minute.
  2. Set your db.MaxBatchSize to 10.
  3. Run 8 db.Batch calls sequentially.
  4. Close the database.

Did anything save?

Since you're the one to blame, if you in fact looked back at your code instead of giving me a link to Concurrency 101, you'll notice you don't execute any of my Batch calls until either you (a) hit a timeout or (b) reach a batch size limit. Here, let me help (I labeled them for you):

    db.batchMu.Lock()
    if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
        // There is no existing batch, or the existing batch is full; start a new one.
        db.batch = &batch{
            db: db,
        }
        db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)  // <-- (a)
    }
    db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
    if len(db.batch.calls) >= db.MaxBatchSize {  // <-- (b)
        // wake up batch, it's ready to run
        go db.batch.trigger()
    }
    db.batchMu.Unlock()

How am I supposed to tell the batch to run when I'm done with all my db.Batch calls? Since I can't tell the batch to run or trigger (you made that functionality private), nor do I know whether or not the batch trigger call even completed flushing to disk (since it's run in a goroutine itself), please tell me how it's my obligation to somehow magically know when I can close the database?

Also let me just say, in general if you launch a goroutine deep in your code, it's not my responsibility to make sure it completes. It's yours.

If I recall (don't have my code on this laptop), I tricked your batch code into firing by resetting the db.MaxBatchSize to 1 then applied some meaningless batch call so the db.batch.trigger would trip and actually run. Then I called time.Sleep for five seconds or so and crossed my fingers that everything had time to flush to disk and I could finally, somewhat reliably, call db.Close.

It seems to me the only reliable solution to this problem is to check the batch size in db.Close and if necessary run the batch and specifically wait for it to finish before finally closing the database. If you've got another solution, by all means let's hear it.

gbl08ma commented 7 years ago

Just by looking at the API docs and Readme information for db.Batch, it is not clear at all that batched transactions might never execute - the only warning is that "the function passed to Batch may be called multiple times, regardless of whether it returns error or not" and so the functions must be idempotent.

Nowhere does it say that the functions passed may never execute at all if db.Close is called at a point where the internal state of bolt (which users of the library should not have to care about, especially if it's not on the GoDoc!) causes batch writes to be lost for one reason or another. In fact, I only found out this problem existed because as I was testing my program, somehow certain things were not being saved even on completely clean exits, and decided to look for issues on GitHub.

There is also no mechanism to trigger batch writes manually (something that could be useful even outside of this db.Close scenario), so there's not much to be done except play with the internal state of Bolt (which I'm relatively sure is not something people are meant to rely on...) to trigger Batch writes.

sbowman commented 7 years ago

After picking through the batch code, I think I finally get what @tv42 intended, but it's not obvious and I'm not sure what db.Batch buys me anymore.

It looks like he wants us to create a bunch of goroutines to run the batches and then wait for the db.Batch call to return before returning from the goroutine. Something like this:

    var wg sync.WaitGroup

    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()

            // Do some work in parallel...

            // *** This will block until the batch is finally run ***
            err := db.Batch(func(tx *bolt.Tx) error {
                // Do something to BoltDB in here...
                return nil
            })

            if err != nil {
                fmt.Printf("Got an error: %s\n", err)
            }
        }()
    }

    wg.Wait()
    db.Close()

So your process will hang until the batch reaches its max or the timeout is hit and the write is triggered, which means your WaitGroup will pause until all batches are processed.

In my research I did notice the db.Batch call caches the func you pass it in an array and runs those through db.Update one at a time once a "trigger" is hit (timeout or full cache). If one call fails, it's pulled out and the batch continues. The failed call is then run separately at the end (not sure why it would succeed the second time if it didn't the first). So then doesn't just replacing db.Batch with db.Update in the above code perform the exact same function (and maybe even faster)?

I assumed two things about batches: I could hand off the batch function and forget about it, and that the batch functions would all run in a single transaction for better write performance. If this isn't the case, is there a benefit to db.Batch that I'm missing?

In any case, maybe just update the README and Godocs to better explain the existing db.Batch functionality. The README example right now is about fetching a value, and the Godocs just say something like "run the db.Batch in a goroutine for best performance".

tv42 commented 7 years ago

Set your db.MaxBatchDelay to 1 minute.

That's a really long time. It would be most appropriate if your disk took 30 seconds for a single write operation. Don't do that, and you won't have the problem.

The point of Batch is when you have multiple goroutines that would have called Update, now their disk syncs can be combined.

I have no idea what you're trying to do here, and your example code has a goroutine with no reason to exist that is immediately waited on, so it's exactly the same as if no extra goroutine were used.

I assumed two things about batches: I could hand off the batch function and forget about it, [...]

It returns an error, why would you want to forget about it! You wouldn't go db.Update(...), I hope.

tv42 commented 7 years ago

@gbl08ma functions passed to Batch execute under normal circumstances. @sbowman's problem is caused by his program exiting while it still has work pending. To avoid losing work, don't exit while you have work pending -- it doesn't matter whether that work is calling Update or Batch.

sbowman commented 7 years ago

I misread the code around running batches as a group under a single transaction; looks like that is what a batch will do for you. Still not really keen on how batch.run doesn't fail but instead will partially apply then try to run failed batch functions individually. That's not typically what I'd expect from a transaction, but since there are strict documented limitations on what you should put in batch functions, I suppose it's ok. If I want to fail en masse it's not hard to write my own batch processor.

@gbl08ma if you write your batch calls like my example above, your app should hang around until the batch finishes. Depending on your timeout, it could take a few seconds to quit, but it will finish. You may want to capture the kill signals to make sure it does wait if you hit Ctrl-C. Also make sure you don't call os.Exit anywhere.

@tv42 my examples were meant to be just that: examples. IRL I wouldn't have a one minute timeout, but it's helpful for testing. And I'd have functionality in my goroutines that would run in parallel and eventually make the db.Batch call. I updated my code sample so that might make more sense. I would still urge that the README and Godocs be updated with a better example. There were enough people on this thread that didn't quite understand the expectations of db.Batch and how it worked. I suppose you can write them off as just stupid or you can accept that the documentation isn't perhaps as clear as it could be.

sbowman commented 7 years ago

For the record, here's how I expected batches to work (hence the confusion):

    go db.Batch(...)
    go db.Batch(...)
    // etc..

    // Hangs until batch syncs...
    if err := db.Close(); err != nil {
        fmt.Printf("Failed to save batch! %s\n", err)
    }

tv42 commented 7 years ago

go db.Batch(...) throws away errors, don't do that.