LiosK / uuidv7

A JavaScript implementation of UUID version 7
Apache License 2.0
177 stars 3 forks source link

Question: why so large counter? (42bits) #13

Closed osexpert closed 2 months ago

osexpert commented 2 months ago

Hi. I was making my own and was thinking about a randomly seeded counter of 12, 20 or 26 bits, with +1 or a random increment. Even 12 bits with +1 would rarely rollover for me, only when the random seed is close to the max (a problem with any size random seeded counter). Then I saw this using surprisingly large 42bits counter with +1. My thinking is, that a large counter will have many bits that is unchanged when doing +1, and this will decrease the randomness /entropy? So my thinking is, have a counter that is fully "utilized" (changes as many bits as possible) compared to how many Uuids can be made in the same ms. But probably there is a reason here that I am not seeing, I am guessing it can have to do with supporting time going back? Thanks.

LiosK commented 2 months ago

Hi! That's a very good question. This question relates to unguessability and uniqueness properties.

As for unguessability, the 42-bit counter obviously sacrifices this property, because 96 bits of a UUID can be derived from the immediately preceding value. I didn't put priority on unguessability because it is hard by design of UUIDv7 to ensure unguessability given the limited bit space available.

As for uniqueness, let's compare a 42-bit randomly seeded counter and a 42-bit random number field. In summary, when you generate N IDs for each scenario:

  1. The expected number of duplicates is identical because you get N items in the same 42-bit space; but,
  2. The probability of having at least one duplicate is much lower with the randomly seeded counter.
  3. 1 and 2 jointly imply that if you get a duplicate with the randomly seeded counter, then you are very likely to get multiple duplicates.

With the simple random number field, the probability of having at least one duplicate is known as the birthday probability and can be calculated using the formula in the article.

With the randomly seeded counter, you will pick an initial counter value randomly for each batch of consecutive IDs, and as long as the initial values are sufficiently distant from each other, you will never get a duplicate. Such a probability can be calculated using the near-match variant also in the article.

Plug parameters in the formulae and you will find the conclusion in the above summary. You can read a similar discussion in this blog post, too.

For the general UUID use cases, I believe the probability of having at least one duplicate is very important property. That's the reason why I selected the 42-bit counter.

osexpert commented 2 months ago

Thanks. I see that a rust implementation also use the same/similar format as you, possibly based on your design: https://kodraus.github.io/rust/2024/06/24/uuid-v7-counters.html. How do you think this design compares to eg. Ulid's that uses all 80bits as counter? Better, worse, equal?

LiosK commented 2 months ago

I believe I improved ULID's 80-bit counter by ensuring a certain level of unguessability with the trailing 32-bit entropy. 32 bits won't be strong enough against offline brute-force attacks but might be practically sufficient against attacks over the network.

The counter length is smaller than ULID's, but 42 bits are already overkill for realistic use cases. Plus, the smart rollover handling using the timestamp field as a spare counter has made counter overflow a negligible event.

From the implementation standpoint, a 42-bit counter is much easier to implement because it fits in the 64-bit int/float range, than a 80-bit counter which needs BigInt arithmetics..

osexpert commented 2 months ago

I agree. I am currently, in my own, using a 26bit counter and now I do not wonder anymore if I should reduce it:-) Thank you again👍