boltdb / bolt

An embedded key/value database for Go.
MIT License

Documentation request: What is a sequential write? #338

Open bmhatfield opened 9 years ago

bmhatfield commented 9 years ago

Hi,

There are various references to sequential writes in the documentation, as well as in commentary here: https://sacrilege.io/benchmarking-bolt/. Unfortunately, and I may be the only one, it is not clear to me what precisely is meant by "sequential" write.

Is it just repeated writes to the same bucket? Must the writes follow some sequence by key, such as alphabetical, or numerically incrementing? Is there some other form of writing that would be considered "sequential"?

Thank you! Brian

alexsacr commented 9 years ago

Hi Brian,

I wrote the benchmark you're referencing. You're certainly not the only one, since I made a mistake as well. What I describe as sequential is not actually sequential. I've taken down the post until I correct things.

To quote the docs:

Bolt stores its keys in byte-sorted order within a bucket.

To my understanding, this means that to achieve sequential writes you'll want to look at the byte output of the keys you're inserting.

alexsacr commented 9 years ago

To add more color:

At first glance, you'd expect a slice of keys like this to result in sequential writes:

s := []string{"9", "10", "11", "1000", "1001"}

Turns out, the byte representation of this is:

[57]
[49 48]
[49 49]
[49 48 48 48]
[49 48 48 49]

Not sequential. These are the byte encodings of strings, when what we want are byte-encoded integers.

Using integers instead:

i := []uint32{9, 10, 11, 1000, 1001}

Results in this byte output:

[9 0 0 0]
[10 0 0 0]
[11 0 0 0]
[232 3 0 0]
[233 3 0 0]

And the sequential behavior we want.

Playground link.

bmhatfield commented 9 years ago

Thank you for your notes, this is very helpful! In other discussions of Bolt, I had observed that the authors claim that sequential inserts see a large performance increase. Are you now seeing that behavior with our updated understanding of 'sequential'?

alexsacr commented 9 years ago

No problem, atonement is called for :+1: . I haven't had a chance to check yet, but I'd expect to see an increase.

benbjohnson commented 9 years ago

@bmhatfield That's a great idea for a section in the Bolt docs. I'll add one to make it more clear. It's really helpful to get documentation feedback so thanks for posting an issue.

Alex is correct that sequential reads are based on byte sorting. One small correction is that you'll want to use binary.BigEndian to encode instead of little endian so that this array:

i := []uint32{9, 10, 11, 1000, 1001}

gets converted to:

[0 0 0 9]
[0 0 0 10]
[0 0 0 11]
[0 0 3 232]
[0 0 3 233]

Updated playground link

Bolt uses bytes.Compare() so it compares the first byte, then second, third, etc. I typically use convenience functions when working with int keys like this:

import "encoding/binary"

// u64tob converts a uint64 into an 8-byte big-endian slice.
func u64tob(v uint64) []byte {
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, v)
    return b
}

// btou64 converts an 8-byte big-endian slice into a uint64.
func btou64(b []byte) uint64 { return binary.BigEndian.Uint64(b) }

benbjohnson commented 9 years ago

There's also a primitive synthetic benchmark program built into the bolt command. You can do sequential or random writes with different batch sizes. Here are some example runs from my mid-2012 MacBook Pro:

Sequential Writes (1M inserts, 1000 per batch)

$ bolt bench --write-mode seq --count 1000000 --batch-size 1000
# Write 2.651541563s    (2.651µs/op)   (377216 op/sec)
# Read  1.020261125s    (39ns/op)   (25641025 op/sec)

Random Writes (1M inserts, 1000 per batch)

$ bolt bench --write-mode rnd --count 1000000 --batch-size 1000
# Write 55.486853265s   (55.486µs/op)  (18022 op/sec)
# Read  1.015875239s    (36ns/op)   (27777777 op/sec)

Random Writes (100K inserts, 1 per batch)

$ bolt bench --write-mode rnd --count 100000 --batch-size 1
# Write 49.664273612s   (496.642µs/op) (2013 op/sec)
# Read  1.001432863s    (37ns/op)   (27027027 op/sec)

The write performance depends heavily on how you write and batch. You can go from 377K w/sec to 2K w/sec depending on how you're using Bolt.

Range scan read performance doesn't fluctuate very much (although it depends on the OS page cache and your Bucket's fill percent). You can typically see around 20M+ reads/sec when iterating over a cursor. Individual seeks are more expensive though.

xiang90 commented 9 years ago

@benbjohnson I have a related question about batching. I can put a WAL in front of Bolt to mitigate the append-only B-tree write issue by reordering and batching.

But it seems that the current DB.Batch() is not designed for this use case, since it still blocks the writer. Also, is it possible or advisable to read results out of the batched txn before I actually commit it to Bolt?

benbjohnson commented 9 years ago

@xiang90 DB.Batch() is only meant to batch. It can't do reordering since the order of operations in a batch may still matter to the application.

I'm not sure I understand your question about reading out the result before committing. What are you trying to do?

xiang90 commented 9 years ago

It can't do reordering since the order of operations in a batch may still matter to the application.

It is also designed for the multiple-writers case. For the WAL approach, I will probably have only one writer, and DB.Batch() will block that single writer.

What are you trying to do?

For example, I have one WAL writer and multiple readers. My application writes A=1 to the WAL, and the WAL writer writes A=1 into a Bolt txn that will not be committed yet due to batching. But I want a reader to be able to read A=1 out of that txn immediately, even though it is not committed.

I took a rough look at the code and think it is possible. But would you recommend doing this?

benbjohnson commented 9 years ago

Sure, you can throw a WAL in front of Bolt but the biggest issue is that you'll add a lot of complexity. Retrieving values isn't too hard -- just check the WAL-committed value before checking the Bolt-committed value. You'll need to add tombstones for deletions though. Doing range scans becomes more complex since you'll need to iterate over the WAL-committed values and the Bolt-committed values together and merge them into a single logical cursor for the application.
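The lookup path described above can be sketched with plain maps; the names here (walEntry, store) are illustrative, not part of Bolt's API. The idea is: check the WAL first, treat a tombstone as "deleted", and only then fall back to the committed data.

```go
package main

import "fmt"

// walEntry is a hypothetical WAL record: either a pending value
// or a tombstone marking a pending deletion.
type walEntry struct {
	value     string
	tombstone bool
}

// store layers uncommitted WAL entries over committed key/value data.
type store struct {
	wal       map[string]walEntry // pending, not yet applied
	committed map[string]string   // what the Bolt DB would hold
}

// get checks the WAL before the committed data, so readers see
// writes that are batched but not yet committed.
func (s *store) get(key string) (string, bool) {
	if e, ok := s.wal[key]; ok {
		if e.tombstone {
			return "", false // a pending delete shadows the committed value
		}
		return e.value, true
	}
	v, ok := s.committed[key]
	return v, ok
}

func main() {
	s := &store{
		wal:       map[string]walEntry{"A": {value: "1"}, "B": {tombstone: true}},
		committed: map[string]string{"B": "old", "C": "2"},
	}
	for _, k := range []string{"A", "B", "C"} {
		v, ok := s.get(k)
		fmt.Println(k, v, ok)
	}
}
```

Range scans are where this gets harder, as noted: the WAL and committed keys must be merged into one ordered cursor, with tombstones filtered out along the way.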

Transactions are also tough to implement -- especially if you want a non-blocking point-in-time representation of your data. If your application doesn't need to rollback transactions then that makes it all a little easier.

One thing I haven't tried, but that seems like it should work well, is keeping the WAL in a Bolt bucket and batching writes to that. Since the bucket is append-only, it should be really fast. Then you can periodically apply the WAL to the rest of your Bolt DB in a single Tx and truncate the WAL.