dylanhart / ulid-rs

This is a Rust implementation of the ulid project
https://crates.io/crates/ulid
MIT License
387 stars 37 forks source link

Generator doesn't actually guarantee 80-bits of UIDs per millisecond? #80

Open vlovich opened 6 months ago

vlovich commented 6 months ago

I could be horribly misreading the code, but on every new millisecond it'll reseed the random state with new entropy. If the increment moves the entropy to RAND_MAX (i.e. 2^80), the generate function returns an error. My statistics is fuzzy, but I think that means on average you can actually onl increment ~2^40 times since the randomness will be close to the half-way point to RAND_MAX half the time leaving only 40 bits of increment. You would also expect that a non-infrequent amount of time you would have very few generations available if the random number generated is or is close to RAND_MAX.

Shouldn't Generator remember the starting entropy so that on hitting RAND_MAX it simply wraps and only returns an error if it were to increment back to the original random state?

dylanhart commented 6 months ago

This is more of a design problem with the original spec & reference implementation. The design basically assumes ~1 ulid per millisecond per process. You're totally correct that it can do better. I chose to implement this behavior in the same way as the reference implementation rather than to try and improve it myself.

Shouldn't Generator remember the starting entropy so that on hitting RAND_MAX it simply wraps and only returns an error if it were to increment back to the original random state?

No because then the function wouldn't be monotonically increasing which is the useful property being implemented here. If you don't care about this property then creating random ulids without a generator is valid as well. I think that overflowing into the next millisecond is what's needed to keep the property valid, but that adds incorrect timestamps.

However, feel free to implement the described behavior into your codebase if it solves your needs. Everything on Ulid should be public and writing a custom Generator to fit your needs should be easy.

vlovich commented 6 months ago

That's a good point. This is the part of the homepage that threw me:

If, in the extremely unlikely event that, you manage to generate more than 2^80 ULIDs within the same millisecond, or cause the random component to overflow with less, the generation will fail.

The "within the same millisecond" part reads non-sensically to me given these design properties. What's it trying to say?

dylanhart commented 6 months ago

See https://github.com/ulid/spec/issues/11 for discussion. I don't think that line is accurate.