derekparker / trie

Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

defer impact on trie.Add() #21

Closed · jsign closed this 5 years ago

jsign commented 5 years ago

I want to share a similar exploration I made in another project regarding the impact of defer.

I avoided using defer in the Add method, with some interesting results. Since there was no benchmark for Add, I created one, and used it to compare the current implementation against the defer-free version.
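For context, here is a minimal sketch of the pattern being compared. The type and method bodies are illustrative stand-ins, not the library's actual code:

```go
package trie

import "sync"

// trieLike is an illustrative stand-in for a mutex-guarded trie.
type trieLike struct {
	mu   sync.Mutex
	size int
}

// addWithDefer is the current pattern: defer guarantees the unlock on
// every exit path, but adds per-call overhead.
func (t *trieLike) addWithDefer(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.size++ // stand-in for the real insertion work
}

// addWithoutDefer is the proposed pattern: since there is a single
// return, unlocking explicitly just before it is equally safe.
func (t *trieLike) addWithoutDefer(key string) {
	t.mu.Lock()
	t.size++ // stand-in for the real insertion work
	t.mu.Unlock()
}
```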

The end results are:

benchmark               old ns/op     new ns/op     delta
BenchmarkAdd/128B-8     55666075      49689240      -10.74%
BenchmarkAdd/256B-8     54631261      50188483      -8.13%
BenchmarkAdd/512B-8     57324361      50005511      -12.77%
BenchmarkAdd/1K-8       55605380      51304633      -7.73%
BenchmarkAdd/2K-8       55761200      50059447      -10.23%
BenchmarkAdd/4K-8       56268353      50719990      -9.86%
BenchmarkAdd/8K-8       55517450      50338050      -9.33%
BenchmarkAdd/16K-8      57047413      50159699      -12.07%
BenchmarkAdd/32K-8      56182766      50119649      -10.79%

benchmark               old allocs     new allocs     delta
BenchmarkAdd/128B-8     306915         306915         +0.00%
BenchmarkAdd/256B-8     306915         306915         +0.00%
BenchmarkAdd/512B-8     306915         306915         +0.00%
BenchmarkAdd/1K-8       306915         306915         +0.00%
BenchmarkAdd/2K-8       306915         306915         +0.00%
BenchmarkAdd/4K-8       306915         306915         +0.00%
BenchmarkAdd/8K-8       306915         306915         +0.00%
BenchmarkAdd/16K-8      306915         306915         +0.00%
BenchmarkAdd/32K-8      306915         306915         +0.00%

benchmark               old bytes     new bytes     delta
BenchmarkAdd/128B-8     14731932      14731929      -0.00%
BenchmarkAdd/256B-8     14731921      14731923      +0.00%
BenchmarkAdd/512B-8     14731922      14731920      -0.00%
BenchmarkAdd/1K-8       14731920      14731920      +0.00%
BenchmarkAdd/2K-8       14731921      14731920      -0.00%
BenchmarkAdd/4K-8       14731921      14731920      -0.00%
BenchmarkAdd/8K-8       14731921      14731920      -0.00%
BenchmarkAdd/16K-8      14731921      14731920      -0.00%
BenchmarkAdd/32K-8      14731920      14731920      +0.00%

This means that not using defer seems to have a considerable impact on the ns/op of Add.

What are those sub-benchmarks? They apply the same idea as in bitcask, to test whether the meta size has any influence on the results. Looking at the implementation, it shouldn't, and the results seem to reflect that. (So we can ignore the sub-benchmarks if you like; I kept them just in case.)
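The sub-benchmarks are roughly table-driven, with the meta payload size varied per run. A hypothetical reconstruction of their shape, assuming the library's trie.New and Add(key, meta) API (the word corpus here is a stand-in):

```go
package trie_test

import (
	"testing"

	"github.com/derekparker/trie"
)

// Stand-in corpus; the real benchmark presumably uses a larger word list.
var words = []string{"apple", "apricot", "banana", "band", "bandana"}

func BenchmarkAdd(b *testing.B) {
	sizes := []struct {
		name string
		n    int
	}{
		{"128B", 128}, {"256B", 256}, {"512B", 512},
		{"1K", 1 << 10}, {"2K", 2 << 10}, {"4K", 4 << 10},
		{"8K", 8 << 10}, {"16K", 16 << 10}, {"32K", 32 << 10},
	}
	for _, s := range sizes {
		meta := make([]byte, s.n) // payload whose size varies per sub-benchmark
		b.Run(s.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				tr := trie.New()
				for _, w := range words {
					tr.Add(w, meta)
				}
			}
		})
	}
}
```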

How to reproduce these results? (Not the most elegant setup, but enough to show the experiment.)

git clone --branch master_avoiddefer https://github.com/jsign/trie.git
rm *.txt ; go test -benchmem -benchtime=5s -bench="BenchmarkAdd\b" > before.txt && go test -benchmem -benchtime=5s -bench=BenchmarkAddWithoutDefer > after.txt && sed -i 's/WithoutDefer//g' after.txt && benchcmp before.txt after.txt

A small explanation: the sed call strips the WithoutDefer suffix so that benchcmp can match the benchmark names between the two files.

This idea may apply to other places where defer sync.Mutex.Unlock() is used.

In the case of bitcask the impact was close to 4%; here it seems to be greater. If I'm not missing something, the Add function itself is quite fast, so the defer overhead is comparatively large.

derekparker commented 5 years ago

This seems reasonable. Add only has a single return, so the value of using a defer is lost anyway.
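For contrast, defer still earns its keep when a function has more than one exit path, since every return must release the lock. A minimal illustration, not code from this library:

```go
package trie

import (
	"errors"
	"sync"
)

// store is an illustrative mutex-guarded map.
type store struct {
	mu sync.Mutex
	m  map[string]bool
}

// find has two exit paths; defer releases the lock on both, whereas
// explicit unlocking would need a call before each return.
func (s *store) find(key string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if key == "" {
		return false, errors.New("empty key")
	}
	return s.m[key], nil
}
```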

jsign commented 5 years ago

If you like, I could make a PR applying this idea wherever possible; we can look at the benchmark results there and decide.

derekparker commented 5 years ago

@jsign that sounds good to me. Thanks!