dgraph-io / badger

Fast key-value DB in Go.
https://dgraph.io/badger
Apache License 2.0
13.89k stars 1.18k forks source link

Update Transaction behavior seems wrong #491

Closed delaneyj closed 6 years ago

delaneyj commented 6 years ago

I'm pretty sure I must be using the txn.Update incorrectly. Basically I have a global best key that multiple go routines could possibly be updating. Have each in a loop if there is a conflict trying to update that record if its state it better. Weird thing is that global value bounces around...

2018/05/16 15:07:31 New global best found 1.511519->1.102839 from 22f4ef65-d573-404c-8edd-afe52c82a15c
2018/05/16 15:07:31 New global best found 1.591452->1.011743 from 80da977e-4e95-43eb-900a-3da0591c05ce
2018/05/16 15:07:31 New global best found 1.591452->1.049474 from d33979fb-d958-46b2-a4c7-a39add02c7d4
2018/05/16 15:07:31 New global best found 0.521211->0.497072 from 348b8814-d502-4b6f-85b1-c13fe867e2e3
2018/05/16 15:07:31 New global best found 1.693965->0.815432 from 0571dce9-df18-446c-8fe0-f94cb020ae44
2018/05/16 15:07:31 New global best found 1.591452->0.754463 from 6db404e8-6c6c-4db5-a392-5e474b969413
2018/05/16 15:07:31 New global best found 0.497072->0.491108 from 23baf82d-574e-4473-aa72-b639bb253607
2018/05/16 15:07:31 New global best found 0.491108->0.375488 from 55b8211c-5b1e-4a9e-8104-9ccdb7fc525b
2018/05/16 15:07:31 New global best found 1.591452->0.772561 from fb6fe824-f30d-4883-9c87-4cc2ebc41d62
2018/05/16 15:07:31 New global best found 0.375488->0.250006 from 55b8211c-5b1e-4a9e-8104-9ccdb7fc525b
2018/05/16 15:07:31 New global best found 0.250006->0.240968 from 377bda88-1cac-4d52-8b6f-46ece3db993a
2018/05/16 15:07:33 New global best found 0.240968->0.091518 from 9e60f05d-6602-4db2-b46f-399f79e02a61
2018/05/16 15:07:36 New global best found 0.091518->0.067371 from 9e60f05d-6602-4db2-b46f-399f79e02a61
2018/05/16 15:07:36 New global best found 0.067371->0.064957 from d33979fb-d958-46b2-a4c7-a39add02c7d4

So I made an MVP showing the problem.

package main

import (
    "encoding/binary"
    "log"
    "math"
    "math/rand"
    "os"
    "sync"
    "time"

    "github.com/dgraph-io/badger"
    "github.com/pkg/errors"
)

var (
    bestKey = []byte("best")
)

func main() {
    folderName := "badger"

    err := os.RemoveAll(folderName)
    checkErr(err)

    opts := badger.DefaultOptions
    opts.Dir = folderName
    opts.ValueDir = folderName
    db, err := badger.Open(opts)
    checkErr(err)
    defer db.Close()

    wg := &sync.WaitGroup{}
    goRoutinesCount := 100

    log.Print("sequentially works fine")
    setMax(db)
    wg.Add(goRoutinesCount)
    for i := 0; i < goRoutinesCount; i++ {
        doStuff(i, db, 5, wg)
    }
    wg.Wait()
    log.Print("done sequentially")

    time.Sleep(time.Second)
    log.Print("in go routines it fails")
    setMax(db)
    wg.Add(goRoutinesCount)
    for i := 0; i < goRoutinesCount; i++ {
        go doStuff(i, db, 5, wg)
    }
    wg.Wait()
    log.Print("done go routines")
}

func setMax(db *badger.DB) {
    err := db.Update(func(txn *badger.Txn) error {
        err := txn.Set(bestKey, f2b(math.MaxFloat64))
        if err != nil {
            return errors.Wrap(err, "can't set initial value")
        }
        return nil
    })
    checkErr(err)
}

func doStuff(id int, db *badger.DB, iterations int, wg *sync.WaitGroup) {
    for i := 0; i < iterations; i++ {
        newBestFound := false
        var from, to float64

        err := badger.ErrConflict
        for err == badger.ErrConflict {
            myBest := rand.Float64()

            err = db.Update(func(txn *badger.Txn) error {
                item, err := txn.Get(bestKey)
                if err != nil {
                    return errors.Wrap(err, "can't get best")
                }
                bestBytes, err := item.Value()
                if err != nil {
                    return errors.Wrap(err, "can't get value of best")
                }

                best := b2f(bestBytes)

                if myBest < best {
                    updatedBestBytes := f2b(myBest)
                    err := txn.Set(bestKey, updatedBestBytes)
                    if err != nil {
                        return errors.Wrap(err, "can't set new best")
                    }

                    newBestFound = true
                    from = best
                    to = myBest
                }

                return nil
            })
        }

        if newBestFound {
            log.Printf("ID '%d' found new best %f->%f at i:%d", id, from, to, i)
        }
    }

    wg.Done()
}

func b2f(bytes []byte) float64 {
    bits := binary.LittleEndian.Uint64(bytes)
    float := math.Float64frombits(bits)
    return float
}

func f2b(float float64) []byte {
    bits := math.Float64bits(float)
    bytes := make([]byte, 8)
    binary.LittleEndian.PutUint64(bytes, bits)
    return bytes
}

func checkErr(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

Note the sequential version is fine. But the same routines don't work in goroutines.

2018/05/16 18:02:33 sequentially works fine
2018/05/16 18:02:33 ID '0' found new best 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000->0.246948 at i:0
2018/05/16 18:02:33 ID '0' found new best 0.246948->0.117908 at i:3
2018/05/16 18:02:33 ID '4' found new best 0.117908->0.029366 at i:4
2018/05/16 18:02:33 ID '6' found new best 0.029366->0.016174 at i:1
2018/05/16 18:02:33 ID '8' found new best 0.016174->0.013359 at i:2
2018/05/16 18:02:33 ID '33' found new best 0.013359->0.007824 at i:4
2018/05/16 18:02:33 ID '46' found new best 0.007824->0.007060 at i:2
2018/05/16 18:02:33 ID '51' found new best 0.007060->0.006900 at i:0
2018/05/16 18:02:33 ID '97' found new best 0.006900->0.006315 at i:2
2018/05/16 18:02:33 done sequentially
2018/05/16 18:02:34 in go routines it fails
2018/05/16 18:02:34 ID '1' found new best 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000->0.309655 at i:0
2018/05/16 18:02:34 ID '11' found new best 0.374678->0.148501 at i:0
2018/05/16 18:02:34 ID '11' found new best 0.374678->0.002869 at i:3
2018/05/16 18:02:34 ID '99' found new best 0.374678->0.008878 at i:0
2018/05/16 18:02:34 ID '31' found new best 0.374678->0.273268 at i:2
2018/05/16 18:02:34 ID '0' found new best 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000->0.374678 at i:0
2018/05/16 18:02:34 ID '6' found new best 0.165741->0.104019 at i:2
2018/05/16 18:02:34 ID '17' found new best 0.165741->0.042233 at i:0
2018/05/16 18:02:34 ID '17' found new best 0.165741->0.003022 at i:1
2018/05/16 18:02:34 ID '8' found new best 0.374678->0.249382 at i:1
2018/05/16 18:02:34 ID '99' found new best 0.374678->0.215468 at i:4
2018/05/16 18:02:34 ID '2' found new best 0.374678->0.245826 at i:1
2018/05/16 18:02:34 ID '32' found new best 0.165741->0.096134 at i:0
2018/05/16 18:02:34 ID '62' found new best 0.165741->0.044048 at i:2
2018/05/16 18:02:34 ID '3' found new best 0.374678->0.160229 at i:3
2018/05/16 18:02:34 ID '30' found new best 0.374678->0.351348 at i:0
2018/05/16 18:02:34 ID '46' found new best 0.165741->0.001548 at i:0
2018/05/16 18:02:34 ID '9' found new best 0.044048->0.015117 at i:0
2018/05/16 18:02:34 ID '30' found new best 0.044048->0.031568 at i:1
2018/05/16 18:02:34 ID '64' found new best 0.044048->0.024159 at i:3
2018/05/16 18:02:34 ID '8' found new best 0.165741->0.018651 at i:2
2018/05/16 18:02:34 ID '7' found new best 0.374678->0.209922 at i:3
2018/05/16 18:02:34 ID '33' found new best 0.165741->0.023037 at i:0
2018/05/16 18:02:34 ID '71' found new best 0.165741->0.080804 at i:0
2018/05/16 18:02:34 ID '35' found new best 0.165741->0.144341 at i:0
2018/05/16 18:02:34 ID '75' found new best 0.044048->0.031357 at i:3
2018/05/16 18:02:34 ID '37' found new best 0.044048->0.017953 at i:0
2018/05/16 18:02:34 ID '92' found new best 0.044048->0.038871 at i:1
2018/05/16 18:02:34 ID '45' found new best 0.044048->0.013406 at i:1
2018/05/16 18:02:34 ID '41' found new best 0.165741->0.106794 at i:0
2018/05/16 18:02:34 ID '98' found new best 0.044048->0.032327 at i:0
2018/05/16 18:02:34 ID '4' found new best 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000->0.591764 at i:0
2018/05/16 18:02:34 ID '1' found new best 0.044048->0.014598 at i:2
2018/05/16 18:02:34 ID '67' found new best 0.044048->0.037900 at i:1
2018/05/16 18:02:34 ID '61' found new best 0.374678->0.055329 at i:0
2018/05/16 18:02:34 ID '18' found new best 0.044048->0.014088 at i:4
2018/05/16 18:02:34 ID '67' found new best 0.044048->0.015395 at i:4
2018/05/16 18:02:34 ID '61' found new best 0.044048->0.003442 at i:2
2018/05/16 18:02:34 ID '39' found new best 0.044048->0.005969 at i:2
2018/05/16 18:02:34 ID '43' found new best 0.044048->0.010684 at i:2
2018/05/16 18:02:34 ID '73' found new best 0.165741->0.138973 at i:0
2018/05/16 18:02:34 ID '43' found new best 0.044048->0.028360 at i:4
2018/05/16 18:02:34 ID '52' found new best 0.165741->0.102570 at i:0
2018/05/16 18:02:34 ID '70' found new best 0.044048->0.020834 at i:1
2018/05/16 18:02:34 ID '56' found new best 0.044048->0.035076 at i:1
2018/05/16 18:02:34 ID '62' found new best 0.044048->0.007930 at i:4
2018/05/16 18:02:34 ID '70' found new best 0.044048->0.001733 at i:3
2018/05/16 18:02:34 ID '86' found new best 0.044048->0.024323 at i:1
2018/05/16 18:02:34 ID '49' found new best 0.165741->0.046047 at i:0
2018/05/16 18:02:34 ID '28' found new best 0.165741->0.077393 at i:0
2018/05/16 18:02:34 ID '28' found new best 0.007930->0.007873 at i:1
2018/05/16 18:02:34 ID '57' found new best 0.374678->0.002167 at i:0
2018/05/16 18:02:34 ID '86' found new best 0.007930->0.002073 at i:4
2018/05/16 18:02:34 ID '25' found new best 0.165741->0.039724 at i:0
2018/05/16 18:02:34 ID '58' found new best 0.374678->0.372617 at i:0
2018/05/16 18:02:34 done go routines

I thought if the bestKey got updated the other transactions would invalidate and start over. What am I missing?

manishrjain commented 6 years ago

The code is a bit complicated to follow. But, general sense is that your newBestFound could be true, even if you didn't commit anything. If your had a ErrConflict in the txn which actually did the Set, and the next one just did a read and exited, it would end up printing your new best found if condition.

I doubt there's a bug in Badger, we have extensive tests for that. You should look deeper at your code.

delaneyj commented 6 years ago

newBestFound could be true, even if you didn't commit anything.

Not sure I follow. The newBestFound is only set to true after a txn.Set() call. If the set failed with an ErrConflict that flag will never be true.

manishrjain commented 6 years ago

The ErrConflict would be returned by txn.Update, not txn.Set. Update is the one which would try to commit the transaction, and run conflict detection. Set would just register what you intend to set.

delaneyj commented 6 years ago

Ah, that makes more sense. I don't think its in the README that the ErrConflict only happens at the txn.Commit(). I thought it would error out early. I can make this work now, thanks.