boltdb / bolt

An embedded key/value database for Go.
MIT License
14.2k stars 1.51k forks source link

iterate returns duplicate keys #632

Open xiang90 opened 7 years ago

xiang90 commented 7 years ago

I expect iterating a bucket with cursor return keys in order, but it would return duplicated keys.

Here is the data

And here is the script to reproduce the problem

Here is the output:

key="\x00\x00\x00\x00\x00\x00\xfe\x9d_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!
duplicated key="\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"!!!!

Am I missing something?

The key \x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00 shows up twice during cursor iterating.

xiang90 commented 7 years ago

I looked into it more. I think actually the db itself is corrupted during page split? (or we failed to reload the nodes correctly)

There are 3 duplicated keys in two adjacent pages:

load index 2 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 1 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 0 from page 1799
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 32 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 31 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9b_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 30 from page 1798
"\x00\x00\x00\x00\x00\x00\xfe\x9a_\x00\x00\x00\x00\x00\x00\x00\x00"
load index 29 from page 1798
benbjohnson commented 7 years ago

@xiang90 Are you retaining and altering the []byte that you pass as a key or value to Bucket.Put()?

xiang90 commented 7 years ago

@benbjohnson No. Each key is put only once without any buffering, so no future access. But I will double check.

xiang90 commented 7 years ago

@benbjohnson I suspect the split code is the cause

    // Split inodes across two nodes.
    next.inodes = n.inodes[splitIndex:]
    n.inodes = n.inodes[:splitIndex]

We do not copy the backing array. So if we append one element to n.inodes, it will overwrite the first element of next.inodes.

See my example here:

https://play.golang.org/p/xjI9r6rpow

But this is just my guess by reading limited boltdb code. I am not sure.

xiang90 commented 7 years ago

On the second thought, it might not be the exact cause.

We put keys sequentially. We do not miss any keys, but there are 3 duplicates in between. Basically it looks like:

<------- |dup|-------> 1 2 3 4 5 3 4 5 6 7 8 9

benbjohnson commented 7 years ago

@xiang90 Do you have an example program that writes the data?

xiang90 commented 7 years ago

@benbjohnson

package playground

import (
    "sync"

    "github.com/boltdb/bolt"
)

type store struct {
    sync.Mutex
    revision int
    tx       *bolt.Tx
    db       *bolt.DB
}

func (s *store) put(value []byte) {
    s.Lock()
    defer s.Unlock()

    bucket := tx.Bucket("keys")
    if bucket == nil {
        panic("bucket key does not exist")
    }

    // it is useful to increase fill percent when the workload is seq append.
    // this can delay the page split and reduce space usage.
    bucket.FillPercent = 0.9

    if err := bucket.Put(intToBytes(s.revision), value); err != nil {
        panic(err)
    }
    s.revision++
}

func (s *store) get(rev int) []byte {
    s.Lock()
    defer s.Unlock()

    bucket := tx.Bucket("keys")
    if bucket == nil {
        panic("bucket key does not exist")
    }

    b := bucket.Get(intToBytes(rev))
    nb := make([]byte, len(b))
    copy(nb, b)
    return nb
}

// called every 100ms, every 10000 puts
func (s *store) commit() {
    s.Lock()
    defer s.Unlock()

    if err := s.tx.Commit(); err != nil {
        panic(err)
    }

    var err error
    s.tx, err = s.db.Begin(true)
    if err != nil {
        panic(err)
    }
}

It looks pretty much like this.

xiang90 commented 7 years ago

The put bytes are generated by marshaling a protobuf request, so it will never be accessed again.

The get part is not exactly accurate. we actually unmarshal the data into a protobuf response holding the lock. Protobuf unmarshalling should copy the data.

jsvisa commented 6 years ago

@xiang90 we occured the same issue, how could you fixed it?