gofrs / uuid

A UUID package for Go
MIT License
1.57k stars 110 forks source link

Drop getClockSequence mutex in favour of atomics #83

Closed alcore closed 3 years ago

alcore commented 4 years ago

This PR drops the mutex used by getClockSequence in favour of an atomic CAS and atomic sequence increment, primarily motivated by performance.

It introduces a (documented) edge case wherein under extreme contention a time progression can result in the first UUID in the new time interval not reusing the old clock sequence but instead ending up with an incremented sequence (as a safety to avoid duplicates). This changes the internal behaviour and may be not compliant with the spec, depending on the interpretation. The safety net could be simplified if the generator was allowed to increment the clockSequence along with each time progression (which I refrained from), not just within the current (unchanged) time interval.

The gains on go 1.14.1, Windows 10, i7 4770k @ 4.4GHz (windows/amd64) are as follows:

Sequential:

benchmark                              old ns/op     new ns/op     delta
BenchmarkGenerator/NewV1               36.4          28.0          -23.08%
BenchmarkGenerator/NewV2               45.1          34.8          -22.84%

Concurrent (8 threads):

benchmark                              old ns/op     new ns/op     delta
BenchmarkGenerator/NewV1Parallel-8     138           24.0          -82.61%
BenchmarkGenerator/NewV2Parallel-8     142           24.2          -82.96%

Since there is no concurrent benchmark in the test suite, I used the trivial...

b.Run("NewV1Parallel", func(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            NewV1()
        }
    })
})

The PR includes a new test case for the V1 generator (and V2 by extension) which simulates a concurrent race of G=1024 goroutines attempting to generate N=256 UUIDs each, to tickle the race detector. Afterwards it checks for duplicates (i.e. primarily to cover the documented edge case). The test is deliberately slow (about 3s on my platform) to give the CPU and Go's scheduler some chances to shuffle things around in the execution flow.

Sidenote: This is only actual test that utilizes the race detector, given that no other tests in the suite run concurrently to actually make use of it (while it's set for Travis builds). If there's interest, I may rectify this in a subsequent PR and race through all generators in a similar manner.

niaow commented 4 years ago

I am not personally in favor of merging this PR, as it makes the code a lot more complicated, and doesn't seem to have a huge impact. The lock is unlikely to be problematically contended unless the user code does nothing but generate UUIDs.

theckman commented 4 years ago

@alcore thank you for this contribution. I'm interested in merging this in, and want to understand the alignment risk before merging. I'm also aligned with Jaden on the readability concern.

Edit: I do wonder if you could speak more about the system or environment that saw benefit from this change.

alcore commented 4 years ago

@theckman I hope my comment in https://github.com/gofrs/uuid/pull/83#discussion_r419904289 clears up the memory alignment concern.

As for readability - so am I :-) I could try to reduce the verbiage of the comments a bit, but as for the code itself, I can't currently think of anything that would simplify it. The goto can be dropped for a for statement instead, the only difference then being style (I refrained since it introduces an additional level of indentation, but then again it might be easier to read).

The setup where this mattered is (was) precisely one which @jaddr2line mentioned as unlikely. It's 8 cores running various log and events aggregation within a data center (several milion / sec). The ID generation was deferred from individual services to the aggregator so a single ID generator could by default ensure no duplicates get generated without having to check for collisions further down the line (some objects would hit databases). In this case using one generator per thread of course alleviates the problem of contention, but then makes the generator's mutex obsolete and doesn't solve the issue of potential duplicates.

In the end I simply rolled my own, but I thought this library could benefit from this change, since the ~20% speedup is - IMHO - relevant not just to cases with such contention.

Edit: The code can be simplified, as I mentioned in the PR comment. Essentially if increasing the clock sequence is not gated behind the...

if timeNow <= timeHi {...

... condition but applies to every ID, then getting and setting the clock sequence becomes a single atomic.AddUint32(&g.clockSequence, 1), both clockSequence loads can get dropped and the CAS success check becomes obsolete as well, which in turn means the loop can get dropped. Since the UUID spec does not really handle clock drifts well, the CAS might then just as well be just a store.

But this then turns into a case of undefined spec, which causes issues like https://github.com/uuid-rs/uuid/issues/106#issuecomment-542522289. The author of the issue interprets the spec as stating that the sequence only gets increased under the conditions (time drift, partition change etc.) the spec mentions. Meanwhile I read the words...

the clock sequence is used to help avoid duplicates that could arise when (...) If (...) then the clock sequence has to be changed.

... as specifying when it has to be changed, but not that it can't be increased all the time. It doesn't really affect the safety, as the sequence will just wrap around anyways. But it would be a breaking change if anyone relies on the assumption that it only increases for IDs in the same time frame (I can't think of a use case for this assumption).

j-mie commented 3 years ago

@alcore Out of curiosity is there any reason why the logic has changed such that you no longer set the last generation time when the time has gone backwards? https://github.com/gofrs/uuid/pull/83/files#diff-0d0495ad16c6f603876bbf484c43b549f53d0b33d4cd74c908b0ee95a94369eaR238-R240 vs https://github.com/gofrs/uuid/blob/master/generator.go#L237-L242

cameracker commented 3 years ago

Thanks for the contribution here. For the most part, this review has sat idle for a while due to some strong objections about increase of code complexity and potential byte alignment concerns. Going to close this. If anybody has any objections, we can continue to talk about the changes here, and the PR can be re-opened.