bmatsuo closed this issue 8 years ago
So, here is how I interpret this:
Now, considering that N is kind of large (a few 100k probably), allocations from makeBenchDBVal do not factor in. The 4 allocations per op are exclusively from Txn.Put. Not very exciting so far.
But where are those allocations happening in Txn.Put (this is a different run from the one above, so absolute numbers won't match up)?
Total: 906361
ROUTINE ======================== github.com/bmatsuo/lmdb-go/lmdb.(*Txn).Put in /Users/lorenz/go/src/github.com/bmatsuo/lmdb-go/lmdb/txn.go
905940 905940 (flat, cum) 100% of Total
. . 314: vn := len(val)
. . 315: if vn == 0 {
. . 316: val = []byte{0}
. . 317: }
. . 318:
905940 905940 319: ret := C.lmdbgo_mdb_put2(
. . 320: txn._txn, C.MDB_dbi(dbi),
. . 321: unsafe.Pointer(&key[0]), C.size_t(kn),
. . 322: unsafe.Pointer(&val[0]), C.size_t(vn),
. . 323: C.uint(flags),
. . 324: )
The only way I can make the numbers match up is if unsafe.Pointer(&key[0])
does two allocations. Which I don't understand.
Ah. I was partially misreading the code. Indeed makeBenchDBVal will only allocate a fraction of the time. So it will not have an effect on the benchmark's computed allocs/op.
Making those numbers add up is a stretch. Honestly, given some of the weird swings in benchmark numbers I have seen before, I have wondered whether cgo could somehow be allocating or copying things I don't expect. I never had real evidence for that before this.
edit: this was one of my wacky theories about slowdowns experienced trying to fix #56
Well this is pretty fascinating. Check out the following example: http://play.golang.org/p/hC-b5VSG58
Now look at this benchmark: http://play.golang.org/p/e37ObBxwJP
And the results
$ go test -bench=. -benchmem
testing: warning: no tests to run
PASS
BenchmarkFoo-4 5000000 297 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_nil-4 10000000 148 ns/op 0 B/op 0 allocs/op
ok github.com/bmatsuo/lmdb-go/tmp/cgo-allocs 3.433s
Shocking
Wow. These benchmarks (for the earlier example) tell another interesting story: http://play.golang.org/p/vLQoKQRxPP
$ go test -bench=. -benchmem
testing: warning: no tests to run
PASS
BenchmarkFoo_1-4 5000000 271 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_100-4 2000000 640 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_1000-4 500000 3933 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_10000-4 50000 36555 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_100000-4 5000 362154 ns/op 69 B/op 2 allocs/op
BenchmarkFoo_1000000-4 500 3610440 ns/op 2063 B/op 2 allocs/op
BenchmarkFoo_10000000-4 30 36416743 ns/op 333465 B/op 2 allocs/op
ok github.com/bmatsuo/lmdb-go/tmp/cgo-allocs 12.931s
I guess the cgo pointer checks are doing a linear scan for pointers despite the fact that they are given a []byte.
edit: Holy hell! I didn't even notice that the allocated space increases at the end! What is going on?
That is indeed fascinating! Does DEBUG=cgocheck=0 change that?
Excellent question. The answer... NOT REALLY!! WTF?! (edit: yup, it does, once the correct variable, GODEBUG, is used).
GODEBUG=cgocheck=0 go test -bench=. -benchmem
testing: warning: no tests to run
PASS
BenchmarkFoo_1-4 5000000 250 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_100-4 5000000 259 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_1000-4 5000000 256 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_10000-4 5000000 256 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_100000-4 5000000 258 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_1000000-4 5000000 259 ns/op 48 B/op 2 allocs/op
BenchmarkFoo_10000000-4 5000000 254 ns/op 50 B/op 2 allocs/op
ok github.com/bmatsuo/lmdb-go/tmp/cgo-allocs 10.829s
I don't know why the allocated space went down. It doesn't actually seem correlated with the variable setting.
In fact, the name was wrong: the environment variable is GODEBUG, not DEBUG. Oops.
That makes the linear runtime go away, but the allocations are still there. I'll update my old comment.
See https://github.com/golang/go/blob/master/src/runtime/cgocall.go#L327 ?
// Complicating matters, taking the address of a slice or array
// element permits the C program to access all elements of the slice
// or array. In that case we will see a pointer to a single element,
// but we need to check the entire data structure.
I didn't completely realize this was a dumb runtime check which did not benefit at all from type information about the argument (presumably because it has already been converted to unsafe.Pointer).
I've also noted that this behavior seems to be restricted to `void*` arguments (edit: dependent on C argument types). If I change the C function signature to `int foo(char *x)`, and convert the unsafe.Pointer to type `*C.char`, then all allocations disappear and the runtime is ~constant (with or without GODEBUG).
$ go test -bench=. -benchmem
testing: warning: no tests to run
PASS
BenchmarkFoo_1-4 10000000 149 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_100-4 10000000 149 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_1000-4 10000000 149 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_10000-4 10000000 149 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_100000-4 10000000 149 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_1000000-4 10000000 150 ns/op 0 B/op 0 allocs/op
BenchmarkFoo_10000000-4 10000000 149 ns/op 1 B/op 0 allocs/op
ok github.com/bmatsuo/lmdb-go/tmp/cgo-allocs 11.555s
Weirdly enough the very last test says 1 byte is allocated per loop iteration. That is strange indeed. But probably not that important.
Yes, I think you are right. The check tries to be smart and bails out early if it knows the type cannot contain pointers, but that plan is foiled by the conversion to unsafe.Pointer. See https://github.com/golang/go/blob/master/src/runtime/cgocall.go#L466
Trying to dig up where the type coercion to unsafe.Pointer is happening, no luck :(
Fantastic detective work, @lmb. It makes me quite happy that this should be fixed before go1.7.
After noticing that the allocations were tied to `void*` arguments, I had a go at replacing all of them with `char*` (which wasn't hard, because I already had to proxy all the txn ops in lmdbgo.{c,h} due to cgo pointer checks and go1.6). The commit is 7d11641 (edit: force pushed to 863bf7d because I mistyped the issue number in the commit message).
The results look good. Though I'm going to look into that Go issue further, and probably try things out on tip/master before doing anything with that commit.
Here are the tentative numbers though. I always love seeing "-100%" :smile:
benchmark old ns/op new ns/op delta
BenchmarkEnv_ReaderList-4 51303 56360 +9.86%
BenchmarkTxn_Put-4 1773 1496 -15.62%
BenchmarkTxn_PutReserve-4 1701 1556 -8.52%
BenchmarkTxn_PutReserve_writemap-4 1420 1294 -8.87%
BenchmarkTxn_Put_writemap-4 1494 1234 -17.40%
BenchmarkTxn_Get_ro-4 1480 1331 -10.07%
BenchmarkTxn_Get_raw_ro-4 854 729 -14.64%
BenchmarkScan_ro-4 5595657 5582831 -0.23%
BenchmarkScan_raw_ro-4 1195792 1204928 +0.76%
BenchmarkCursor-4 692 700 +1.16%
BenchmarkCursor_Renew-4 177 178 +0.56%
BenchmarkTxn_Sub_commit-4 188845 188734 -0.06%
BenchmarkTxn_Sub_abort-4 193656 194200 +0.28%
BenchmarkTxn_abort-4 14458 14443 -0.10%
BenchmarkTxn_commit-4 14307 14229 -0.55%
BenchmarkTxn_ro-4 144698 148477 +2.61%
BenchmarkTxn_unmanaged_abort-4 14347 14632 +1.99%
BenchmarkTxn_unmanaged_commit-4 14625 14477 -1.01%
BenchmarkTxn_unmanaged_ro-4 145135 154876 +6.71%
BenchmarkTxn_renew-4 531 531 +0.00%
BenchmarkTxn_Put_append-4 682 396 -41.94%
BenchmarkTxn_Put_append_noflag-4 839 575 -31.47%
benchmark old allocs new allocs delta
BenchmarkEnv_ReaderList-4 113 113 +0.00%
BenchmarkTxn_Put-4 4 0 -100.00%
BenchmarkTxn_PutReserve-4 2 0 -100.00%
BenchmarkTxn_PutReserve_writemap-4 2 0 -100.00%
BenchmarkTxn_Put_writemap-4 4 0 -100.00%
BenchmarkTxn_Get_ro-4 3 1 -66.67%
BenchmarkTxn_Get_raw_ro-4 3 1 -66.67%
BenchmarkScan_ro-4 8151 8163 +0.15%
BenchmarkScan_raw_ro-4 5 5 +0.00%
BenchmarkCursor-4 3 3 +0.00%
BenchmarkCursor_Renew-4 0 0 +0.00%
BenchmarkTxn_Sub_commit-4 3 3 +0.00%
BenchmarkTxn_Sub_abort-4 3 3 +0.00%
BenchmarkTxn_abort-4 5 5 +0.00%
BenchmarkTxn_commit-4 5 5 +0.00%
BenchmarkTxn_ro-4 5 5 +0.00%
BenchmarkTxn_unmanaged_abort-4 5 5 +0.00%
BenchmarkTxn_unmanaged_commit-4 5 5 +0.00%
BenchmarkTxn_unmanaged_ro-4 5 5 +0.00%
BenchmarkTxn_renew-4 0 0 +0.00%
BenchmarkTxn_Put_append-4 5 1 -80.00%
BenchmarkTxn_Put_append_noflag-4 5 1 -80.00%
benchmark old bytes new bytes delta
BenchmarkEnv_ReaderList-4 7408 7408 +0.00%
BenchmarkTxn_Put-4 96 0 -100.00%
BenchmarkTxn_PutReserve-4 48 0 -100.00%
BenchmarkTxn_PutReserve_writemap-4 48 0 -100.00%
BenchmarkTxn_Put_writemap-4 96 0 -100.00%
BenchmarkTxn_Get_ro-4 1148 1099 -4.27%
BenchmarkTxn_Get_raw_ro-4 66 18 -72.73%
BenchmarkScan_ro-4 9127785 9130894 +0.03%
BenchmarkScan_raw_ro-4 72 72 +0.00%
BenchmarkCursor-4 33 33 +0.00%
BenchmarkCursor_Renew-4 0 0 +0.00%
BenchmarkTxn_Sub_commit-4 65 65 +0.00%
BenchmarkTxn_Sub_abort-4 65 65 +0.00%
BenchmarkTxn_abort-4 97 97 +0.00%
BenchmarkTxn_commit-4 97 97 +0.00%
BenchmarkTxn_ro-4 97 97 +0.00%
BenchmarkTxn_unmanaged_abort-4 97 97 +0.00%
BenchmarkTxn_unmanaged_commit-4 97 97 +0.00%
BenchmarkTxn_unmanaged_ro-4 97 97 +0.00%
BenchmarkTxn_renew-4 0 0 +0.00%
BenchmarkTxn_Put_append-4 104 8 -92.31%
BenchmarkTxn_Put_append_noflag-4 104 8 -92.31%
After testing on tip it seems that the allocations remain present with `void*` arguments (though, indeed, runtime is constant on that example benchmark). But the allocs mean I may end up merging that `char*` change. I'll keep investigating a bit longer. I might bring the allocations up on one of the mailing lists.
Nice :+1: Maybe this will make the benchmarks less volatile as well.
I wouldn't get my hopes too high about that :smile:
Even in those numbers I pasted there is some weirdness (e.g. ReaderList +10%?)
Benchmarks interact heavily with the rest of the system. My suspicion is that you need a real clean-room style environment to run the benchmarks if you want truly consistent times: something like busybox with minimal external processes, everything running in memory, and any kind of virtualization disabled. IDK though.
I'm tempted to make this part of the next release as well. But it will wait until I can get some more insight about whether those allocations would feasibly be eliminated somehow in the near future.
Yea. I think those allocations are going to be around for a while. I made an issue about it but it doesn't seem to be a high priority.
https://github.com/golang/go/issues/15048
I'm preparing a pull request.
@lmb pointed out some anomalous-looking numbers reported by `go test -bench=. -benchmem` in #61, specifically regarding the BenchmarkTxn_Put function. The 4 allocations reported are difficult to track down.
One should be happening in the benchmark function itself (edit: that is false; allocations in the benchmark function get amortized and do not have a significant effect). But the others are not easy to account for. In a normal (non-error) case Txn.Put does not appear to make allocations inside the Go runtime. @lmb ran the pprof tool using -alloc_objects and computed the allocation information shown above. Nothing seems to add up correctly here.