sbowman opened this issue 8 years ago
I believe I'm seeing the same problem. I set up a program that calls "go db.Batch" 1000 times and drains the channel once it's full. However, I'm seeing my program exit without the batches finishing. Perhaps it's because the Batch function is also creating goroutines? https://github.com/boltdb/bolt/blob/074dffcc83e9f421e261526d297cd93f22a34080/db.go#L671
Sorry if that's a dumb idea; I'm new to Go.
https://github.com/boltdb/bolt/blob/074dffcc83e9f421e261526d297cd93f22a34080/db.go#L404
There is no attempt to wait for anything currently in progress when the DB is closed. Unfortunately there doesn't seem to be a way to be notified of the success of a batch, but I think it's easy to do from user code here (channels signaling OK after the batches have run).
Gotcha; I'm going to try and do that. I originally thought I had that in my code but I wasn't waiting on the channel correctly. Basically my main go routine exited while there were batches still executing and they never got a chance to persist their data. Will post back with example code if I get it working.
So looks like my problem was I was being clever... I created a struct to wrap boltdb functions. Things like myStruct.Take(key) to do a boltdb transaction and get the value back I have stored.
Anyway, when I was calling batch, I was calling with my wrapper functions. I even had a batch function. When I undid all that and made my batch function like this
c <- store.Db.Batch(func(tx *bolt.Tx) error {
	bucket := tx.Bucket([]byte(store.Bucket))
	storedValue := bucket.Get([]byte(key))
	var err error
	if storedValue == nil {
		err = bucket.Put([]byte(key), []byte(value))
	} else {
		newValue := string(storedValue) + "\n" + value
		err = bucket.Put([]byte(key), []byte(newValue))
	}
	return err
})
All the values I was batch-writing were stored, and stored faster: 9,104,134 writes went from almost 1h43m down to 22m10s, about 5x faster.
@sbowman You're right. It looks like DB.Close() should obtain the batchMu and trigger the batch before continuing. If you submit a PR and verify the fix then I can merge it in.
I don't think DB.Close() should obtain the batchMu, as this means batch.run() will wait forever to obtain batchMu. However, if the batch is triggered from DB.Close() without acquiring the mutex, then I believe there's the possibility that a different goroutine will add a batch transaction (one that will never be committed) between the call to db.batch.trigger() and the moment the database is effectively closed.
(This is still much better than the current situation where batch transactions are lost if the database is closed at an "unlucky" time, with a probability much higher than the hypothetical situation I described.)
If you have active goroutines calling Update/View as you call Close, you just shot yourself in the foot. Batch counts as Update for that.
Oh and to say something constructive: this is how you wait for your workers to complete: https://blog.golang.org/pipelines
@tv42 Are you karma trolling? If you don't have a patch or something meaningful to contribute, why not leave the technical discussions to the big boys? Have you even tried to replicate this issue (your comments suggest no), or are you just blowing hot air trying to sound erudite?
I have a workaround for this I suppose I should post, but got pulled off onto other projects so haven't had a chance. Will dig up my code tomorrow and see if I can provide some alternatives if anyone is interested.
@sbowman Wow. "Big boys", huh? Guess who wrote Batch. And have a nice day.
The reason why I don't feel Close has any obligation to trigger Batch: it also has no obligation to wait for an Update that is just about to be called.
If you race calls to Update/Batch against Close, you should either see the update happen or get ErrDatabaseNotOpen. That should hold regardless of whether you're calling Update or Batch.
Batch triggering some time after the Close is not distinguishable from /* called just before Close */ time.Sleep(epsilon); db.Batch(...).
You do have an obligation to wait for the write before db.Close, because your batch functionality doesn't give me any control over when or if it completes.
This has absolutely nothing to do with me parallelizing my batches through a bunch of goroutines. I did that, waited for my goroutines to finish and the WaitGroup to drain, and your code still left half a batch drifting into the ether.
Let me give you a proper scenario. Feel free to set up a test case and see what happens:
Did anything save?
Since you're the one to blame, if you had in fact looked back at your code instead of giving me a link to Concurrency 101, you'd notice you don't execute any of my Batch calls until you either (a) hit a timeout or (b) reach a batch size limit. Here, let me help (I labeled them for you):
db.batchMu.Lock()
if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
	// There is no existing batch, or the existing batch is full; start a new one.
	db.batch = &batch{
		db: db,
	}
	db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger) // <-- (a)
}
db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
if len(db.batch.calls) >= db.MaxBatchSize { // <-- (b)
	// wake up batch, it's ready to run
	go db.batch.trigger()
}
db.batchMu.Unlock()
How am I supposed to tell the batch to run when I'm done with all my db.Batch calls? I can't call run or trigger (you made that functionality private), nor do I know whether the batch trigger call even finished flushing to disk (since it runs in a goroutine itself), so please tell me how it's my obligation to somehow magically know when I can close the database?
Also let me just say, in general if you launch a goroutine deep in your code, it's not my responsibility to make sure it completes. It's yours.
If I recall (I don't have my code on this laptop), I tricked your batch code into firing by resetting db.MaxBatchSize to 1, then applied some meaningless batch call so db.batch.trigger would trip and actually run. Then I called time.Sleep for five seconds or so and crossed my fingers that everything had time to flush to disk and I could finally, somewhat reliably, call db.Close.
It seems to me the only reliable solution to this problem is to check the batch size in db.Close and, if necessary, run the batch and specifically wait for it to finish before finally closing the database. If you've got another solution, by all means let's hear it.
Just by looking at the API docs and README information for db.Batch, it is not clear at all that batched transactions might never execute; the only warning is that "the function passed to Batch may be called multiple times, regardless of whether it returns error or not", and so the functions must be idempotent.
Nowhere does it say that the functions passed may never execute at all if db.Close is called at a point where the internal state of bolt (which users of the library should not have to care about, especially if it's not on the GoDoc!) causes batch writes to be lost for one reason or another. In fact, I only found out this problem existed because, as I was testing my program, certain things were somehow not being saved even on completely clean exits, and I decided to look for issues on GitHub.
There is also no mechanism to trigger batch writes manually (something that could be useful even outside of this db.Close scenario), so there's not much to be done except play with the internal state of bolt (which I'm relatively sure is not something people are meant to rely on...) to trigger batch writes.
After picking through the batch code, I think I finally get what @tv42 intended, but it's not obvious and I'm not sure what db.Batch buys me anymore.
It looks like he wants us to create a bunch of goroutines to run the batches and then wait for the db.Batch call to return before returning from each goroutine. Something like this:
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Do some work in parallel...

		// *** This will block until the batch is finally run ***
		err := db.Batch(func(tx *bolt.Tx) error {
			// Do something to BoltDB in here...
			return nil
		})
		if err != nil {
			fmt.Printf("Got an error: %s\n", err)
		}
	}()
}
wg.Wait()
db.Close()
So your process will hang until the batch reaches its max or the timeout is hit and the write is triggered, which means your WaitGroup will pause until all batches are processed.
In my research I noticed the db.Batch call caches the func you pass it in a slice and runs those through db.Update one at a time once a "trigger" is hit (timeout or full cache). If one call fails, it's pulled out and the batch continues; the failed call is then run separately at the end (not sure why it would succeed the second time if it didn't the first). So doesn't just replacing db.Batch with db.Update in the above code perform the exact same function (and maybe even faster)?
I assumed two things about batches: that I could hand off the batch function and forget about it, and that the batch functions would all run in a single transaction for better write performance. If this isn't the case, is there a benefit to db.Batch that I'm missing?
In any case, maybe just update the README and Godocs to better explain the existing db.Batch functionality. The README example right now is about fetching a value, and the Godocs just say something like "run the db.Batch in a goroutine for best performance".
Set your db.MaxBatchDelay to 1 minute.
That's a really long time. It would be most appropriate if your disk took 30 seconds for a single write operation. Don't do that, and you won't have the problem.
The point of Batch is that when you have multiple goroutines that would each have called Update, their disk syncs can now be combined.
I have no idea what you're trying to do here, and your example code has a goroutine with no reason to exist that is immediately waited on, so it's exactly the same as if no extra goroutine were used there.
I assumed two things about batches: I could hand off the batch function and forget about it, [...]
It returns an error; why would you want to forget about it? You wouldn't go db.Update(...), I hope.
@gbl08ma functions passed to Batch execute under normal circumstances. @sbowman's problem is caused by his program exiting while it still has work pending. To avoid losing work, don't exit while you have work pending; it doesn't matter whether that work is calling Update or Batch.
I misread the code around running batches as a group under a single transaction; that is indeed what a batch will do for you. I'm still not really keen on how batch.run doesn't fail outright but instead partially applies and then tries to run the failed batch functions individually. That's not typically what I'd expect from a transaction, but since there are strict documented limitations on what you should put in batch functions, I suppose it's OK. If I want to fail en masse, it's not hard to write my own batch processor.
@gbl08ma if you write your batch calls like my example above, your app should hang around until the batch finishes. Depending on your timeout, it could take a few seconds to quit, but it will finish. You may want to capture kill signals to make sure it still waits if you hit Ctrl-C. Also make sure you don't call os.Exit anywhere.
@tv42 my examples were meant to be just that: examples. In real life I wouldn't have a one-minute timeout, but it's helpful for testing. And I'd have functionality in my goroutines that would run in parallel and eventually make the db.Batch call. I updated my code sample so that might make more sense. I would still urge that the README and Godocs be updated with a better example. Enough people on this thread didn't quite understand the expectations of db.Batch and how it worked. You can write them off as just stupid, or you can accept that the documentation isn't as clear as it could be.
For the record, here's how I expected batches to work (hence the confusion):
go db.Batch(...)
go db.Batch(...)
// etc...

// Hangs until batch syncs...
if err := db.Close(); err != nil {
	fmt.Printf("Failed to save batch! %s\n", err)
}
go db.Batch(...) throws away errors; don't do that.
I'm calling "go db.Batch" to handle a large number of writes. I'm seeing errors in my code where the messages queued up in the batch aren't being flushed when I close bolt.DB. I don't see anything in the code calling db.batch.trigger() before closing the database file.
Is this a bug, or is there a way to flush the batch before I close?