ztstewart commented 5 years ago

UUIDs are implemented as a fixed length array, giving great performance.

However, while profiling, I've found a few cases in the parsing code that can lead to heap allocations:

Passing values to fmt.Errorf, which accepts interface{}, which typically requires that parameters be boxed
Converting []byte to string or vice versa

We can avoid these by doing the following:

Copy values passed to fmt.* and friends, which will speed up the happy path
Using some unsafe trickery for []byte <--> string conversion

This PR implements both of these strategies. As a result, various parsing methods are faster.

Old benchmark results:

 % go test -benchmem -bench ./...
goos: linux
goarch: amd64
pkg: github.com/gofrs/uuid
BenchmarkFromString/canonical-8                 10000000               123 ns/op              48 B/op          1 allocs/op
BenchmarkFromString/urn-8                       10000000               133 ns/op              48 B/op          1 allocs/op
BenchmarkFromString/braced-8                    10000000               130 ns/op              48 B/op          1 allocs/op
BenchmarkGenerator/NewV1-8                       1000000              1092 ns/op               0 B/op          0 allocs/op
BenchmarkGenerator/NewV2-8                       1000000              1095 ns/op               0 B/op          0 allocs/op
BenchmarkGenerator/NewV3-8                       5000000               273 ns/op             144 B/op          4 allocs/op
BenchmarkGenerator/NewV4-8                       1000000              1488 ns/op              16 B/op          1 allocs/op
BenchmarkGenerator/NewV5-8                       5000000               317 ns/op             176 B/op          4 allocs/op
PASS
ok      github.com/gofrs/uuid   11.589s

New benchmark results:

 % go test -benchmem -bench ./...
goos: linux
goarch: amd64
pkg: github.com/gofrs/uuid
BenchmarkFromString/canonical-8                 20000000                87.9 ns/op             0 B/op          0 allocs/op
BenchmarkFromString/urn-8                       20000000                97.6 ns/op             0 B/op          0 allocs/op
BenchmarkFromString/braced-8                    20000000                92.2 ns/op             0 B/op          0 allocs/op
BenchmarkGenerator/NewV1-8                       1000000              1110 ns/op               0 B/op          0 allocs/op
BenchmarkGenerator/NewV2-8                       1000000              1100 ns/op               0 B/op          0 allocs/op
BenchmarkGenerator/NewV3-8                       5000000               275 ns/op             144 B/op          4 allocs/op
BenchmarkGenerator/NewV4-8                       1000000              1485 ns/op              16 B/op          1 allocs/op
BenchmarkGenerator/NewV5-8                       5000000               322 ns/op             176 B/op          4 allocs/op
PASS
ok      github.com/gofrs/uuid   13.213s

Diff:

% benchcmp old.txt new.txt
benchmark                           old ns/op     new ns/op     delta
BenchmarkFromString/canonical-8     128           88.6          -30.78%
BenchmarkFromString/urn-8           134           101           -24.63%
BenchmarkFromString/braced-8        131           94.9          -27.56%
BenchmarkGenerator/NewV1-8          1072          1096          +2.24%
BenchmarkGenerator/NewV2-8          1078          1111          +3.06%
BenchmarkGenerator/NewV3-8          271           274           +1.11%
BenchmarkGenerator/NewV4-8          1471          1506          +2.38%
BenchmarkGenerator/NewV5-8          318           317           -0.31%

benchmark                           old allocs     new allocs     delta
BenchmarkFromString/canonical-8     1              0              -100.00%
BenchmarkFromString/urn-8           1              0              -100.00%
BenchmarkFromString/braced-8        1              0              -100.00%
BenchmarkGenerator/NewV1-8          0              0              +0.00%
BenchmarkGenerator/NewV2-8          0              0              +0.00%
BenchmarkGenerator/NewV3-8          4              4              +0.00%
BenchmarkGenerator/NewV4-8          1              1              +0.00%
BenchmarkGenerator/NewV5-8          4              4              +0.00%

benchmark                           old bytes     new bytes     delta
BenchmarkFromString/canonical-8     48            0             -100.00%
BenchmarkFromString/urn-8           48            0             -100.00%
BenchmarkFromString/braced-8        48            0             -100.00%
BenchmarkGenerator/NewV1-8          0             0             +0.00%
BenchmarkGenerator/NewV2-8          0             0             +0.00%
BenchmarkGenerator/NewV3-8          144           144           +0.00%
BenchmarkGenerator/NewV4-8          16            16            +0.00%
BenchmarkGenerator/NewV5-8          176           176           +0.00%

codecov-io commented 5 years ago

Codecov Report

Merging #79 into master will increase coverage by 0.07%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   99.05%   99.12%   +0.07%     
==========================================
  Files           4        5       +1     
  Lines         318      344      +26     
==========================================
+ Hits          315      341      +26     
  Misses          2        2              
  Partials        1        1

Impacted Files	Coverage Δ
slice_conversion.go	`100% <100%> (ø)`
uuid.go	`100% <100%> (ø)`	:arrow_up:
codec.go	`100% <100%> (ø)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update abfe188...2b7aac2. Read the comment docs.

ztstewart commented 5 years ago

The CI build passes now; apologies for the spam.

As noted, this change improves the performance of FromString and similar methods by a decent margin. More importantly, it means that UUIDs can live on the stack when parsed from a string or bytes, saving garbage collection time.

A few things I encountered that tripped me up:

I had trivially changed fmt.Errorf to errors.New on a line that is uncovered in defaultHWAddrFunc. My attempt to modify the code to cover it made UUID generation slower, so I've reverted my original change.
On Go 1.9 and lower, golang.org/x/tools/cmd/cover does not work if there are subpackages. I wanted to put the slice logic in internal/unsafe which would be cleaner, but creating that package fails the build. If we're OK with dropping 1.9 and lower from CI or have a workaround I can move it around again.

Thanks for taking a look, Zach

ztstewart commented 5 years ago

@acln0 @theckman, I believe you two reviewed the last couple of PRs. I'd greatly appreciate it if you could take a look.

Thanks a bunch, Zach

dylan-bourque commented 5 years ago

My 2 cents ... resorting to unsafe (and introducing specifically called out UB and panics) for the sake of shaving a handful of nanoseconds off of generating formatted error messages is a net loss for a general purpose library. The new code is also significantly more complex than before and, in my opinion, requires some intimate knowledge of the internals of Go's memory management to make sense of why anyone would even care.

I would argue that a system where those <20 nanoseconds make a difference should probably be implementing their own UUID type.

ztstewart commented 5 years ago

HI @dylan-bourque, thanks for reading the PR, and I appreciate the feedback.

My 2 cents ... resorting to unsafe (and introducing specifically called out UB and panics) for the sake of shaving a handful of nanoseconds off of generating formatted error messages is a net loss for a general purpose library.

I would argue the opposite regarding generality: without these optimizations, the library becomes less generally useful. Reducing allocations in the NewFromString path to 0 means that a UUID can be stack allocated, which means that at scale you can keep the garbage collector happy. Wrapping it into the library itself means everyone who uses it benefits without any changes in the calling code.

As for panics: Panics can crop up inside the library if Go changes how it lays out strings (unlikely, given how proposals related to that are tagged with Go2) or if the library starts modifying user input in UnmarshalText. Modifying the input would be a bug, regardless of whether or not this PR is landed.

I'm happy to add tests for functions that return []byte to guarantee that modifying the output does not cause panics.

The new code is also significantly more complex than before and, in my opinion, requires some intimate knowledge of the internals of Go's memory management to make sense of why anyone would even care.

I think you raise a valid point: We want to keep the bar for contribution as low as possible. That means code that is easy to read and understand, and I agree that's what we should optimize for.

However, I think the changes here don't require much specialized knowledge outside of the unsafe code, though I've encountered variations on this trick in a few places (including the standard library).

My hope is that the clarifying comments which I have added (I would love your feedback on them!) can help ensure that future contributors understand the motivation for the changes I've made in error logging code. I personally feel that make and copy are not very complex, being built in functions, but if you have suggestions on how to make the changes less complex, I'd be very appreciative of your suggestions.

I would argue that a system where those <20 nanoseconds make a difference should probably be implementing their own UUID type.

That's certainly an option, and one I'm willing to pursue if we cannot come to an agreement on how to move forward. However, the work required to significantly optimize this library is fairly small and helps the entire Go community. I feel it would be a loss to the wider community to introduce yet another UUID library, and I'd love to work with you to avoid that possibility.

Thanks again for you feedback, Zach

dylan-bourque commented 5 years ago

@ztstewart

By "general purpose", I meant that I think the vast majority of this library's consumers (the general case) won't actually care about those few nanoseconds and/or whether or not those 16 bytes escape to the heap when a UUID parsing error occurs.

I would argue the opposite regarding generality: without these optimizations, the library becomes less generally useful

I also have concerns that these micro-optimizations could well turn out to be architecture/OS/platform/Go version/etc. dependent.

For sure, though, this is my opinion - and you know what they say about those :-) - and I'm definitely not any sort of "owner" over the project.

lunemec commented 4 years ago

@dylan-bourque while I agree that code should be as simple as possible, this code is not overly complex. And in addition it is used only in marshaling of the UUID to output. I would be OK with this being merged if it can be proven safe.

dylan-bourque commented 4 years ago

@lunemec I still think this change is, for lack of a better word, unnecessary. It adds more lines of warning comments than actual code and is optimizing for stack vs. heap allocations done inside of the standard library (which are subject to change in later Go releases). It also introduces the use of unsafe to what is a very simple library for the sake of those micro-optimizations.

I won't block this PR but I'm also not going to approve it.

lunemec commented 4 years ago

@ztstewart I guess since you put effort into optimizing that this is something important for your use-case?

@dylan-bourque maybe adding this as compile tag (only on-demand) would satisfy you, and it would allow users who want extra fast UUID parsing to use the unsafe method?

dylan-bourque commented 4 years ago

As I said, I won't block the change if others approve and are willing to accept it.

cameracker commented 3 years ago

Thanks for the contribution here. For the most part, this review has sat idle for a while due to some strong objections about increase of code complexity and use of unsafe methods for unclear performance benefit. Going to close this. If anybody has any objections, we can continue to talk about the changes here, and the PR can be re-opened.

gofrs / uuid

Modify codec to avoid heap allocations in the happy path #79

Codecov Report