RobThree / NUlid

.Net ULID implementation
MIT License

Monotonic Ulids with same Time #11

Closed cdhunt closed 5 years ago

cdhunt commented 5 years ago

The JavaScript reference implementation has an example where ULIDs generated in the same millisecond are monotonically increasing. Is there a way to support this functionality?

RobThree commented 5 years ago

I guess we could certainly add this functionality. However, I'm not sure I agree 100% with how it is currently implemented in the JS reference. Because the random part of a ULID is, well, random, you never know how 'soon' you'll run out of IDs 'within' that timestamp. Let's assume a very-worst-but-not-impossible case of a starting ULID of 01D0PZMEC0ZZZZZZZZZZZZZZZY. That would mean that in the same millisecond you can only produce 1 more 'monotonically increasing' ID (01D0PZMEC0ZZZZZZZZZZZZZZZZ) and then "run out of" IDs. Of course this is a 1 in 2^80 chance, but it does mean that for fast enough machines it is possible to "run out" of IDs if you simply produce enough IDs in the same millisecond and have a random part that's "high enough" to provide only a short 'range' of available, monotonic, IDs, like ZZZ...ZA0 for example.
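The scenario above can be made concrete with a small sketch (Python here purely for illustration; NUlid itself is C#). The ULID "random" part is an 80-bit integer, and a monotonic generator increments it for every ID created within the same millisecond, so the number of IDs still available is simply the distance to the 80-bit maximum:

```python
# Sketch of the overflow concern: the ULID random part is an 80-bit
# integer; monotonic generation increments it for each additional ID
# within the same millisecond. A starting value near the top of the
# range leaves very few increments before it overflows.

MAX_RANDOM = 2**80 - 1  # largest 80-bit random part ("ZZZ...Z" in Crockford base32)

def increments_left(random_part: int) -> int:
    """How many more monotonic IDs can be issued in this millisecond."""
    return MAX_RANDOM - random_part

# The worst-but-one case from the discussion (...ZZZY): room for exactly one more ID.
print(increments_left(MAX_RANDOM - 1))  # 1
# A fresh random start, by contrast, almost always leaves an astronomically large margin.
```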

A behaviour could be defined to 'wrap around' or overflow into 00000...1, for example, or to use the RNG to provide a new starting point for increasing IDs (but that would have the added 'danger' of starting a range in the already 'used' IDs, or just below the previous range, providing, again, ample IDs before an actual overlap would even occur, and it would mean keeping track of all this to prevent overlaps etc.). I think we first need to define and think through the desired behaviour before we implement anything, and that would mean discussing this with the original author(s) to see if we can agree upon a 'standard'.

Having said all that: I'm not sure if you insist on using a ULID, but if you don't, you could consider using IdGen (disclaimer: I'm (also) the author of that package). It provides monotonicity out of the box.

That said, it would certainly be possible and (not having thought this through all the way) my guess is that it wouldn't be very hard to implement (even without breaking changes). Actually, I wasn't even aware the reference implementation added monotonicity in the first place (meaning it was added after this .Net implementation was made), else I would probably have taken a look at it earlier and maybe even implemented it in this .Net implementation too. So thanks for making me aware of this. I'll sleep on it and consider the best next step from here, ok?

cdhunt commented 5 years ago

It's more of a hypothetical question for my use case. I doubt I'll ever produce two ids in one millisecond, but I like the idea of it being handled. I prefer the standard that has implementations across platforms and the reference implementation is good enough for me. Thanks for the well thought out response and creating the project.

RobThree commented 5 years ago

> I doubt I'll ever produce two ids in one millisecond, but I like the idea of it being handled.

To be clear: it is handled; it's just that the random part of the ID will differ "entirely" instead of being increased by 1. The only difference is that IDs produced within the same millisecond may not be in the same order after (re)sorting. But that only means uncertainty about the order within the same millisecond; as long as they're generated at different timestamps (with ms resolution, that is) the order is guaranteed by the time part.
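The sorting behaviour described above can be shown with plain lexicographic sorting (a Python sketch; the ULID-like strings below are made up for illustration, with a hyphen separating a timestamp prefix from a random suffix):

```python
# With a fresh random part per ID, IDs from the same millisecond sort by
# their random part, not by generation order; IDs from different
# milliseconds always sort correctly by the timestamp prefix.

ids_same_ms = ["01AT-9XQZ", "01AT-1KPD", "01AT-7MMB"]  # generated in this order
assert sorted(ids_same_ms) != ids_same_ms  # order within the millisecond is lost

ids_diff_ms = ["01AT-9XQZ", "01AU-1KPD", "01AV-7MMB"]  # one ID per millisecond
assert sorted(ids_diff_ms) == ids_diff_ms  # timestamp prefix guarantees the order
```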

> I prefer the standard that has implementations across platforms and the reference implementation is good enough for me.

To be fair, the reference implementation provides a monotonically increasing ID, it just wasn't in the spec at the time I implemented this .Net version. It was added to the spec after.

The case(s) I explained are a bit far-fetched, but not entirely impossible. On the other hand, there's also a chance of a collision when generating a fresh random part, instead of increasing the "random part" by one, when generating a ULID in the same millisecond. Either way there is a tiny, tiny, tiny chance of a collision or overflow or whatever other problem occurring. And I just hate that; it should be deterministic and reliable. Then again, when generating GUIDs/UUIDs there's also a chance of a collision (though for a 50/50 chance of a collision you'd need to generate roughly 2^64 IDs, instead of roughly 2^40 for a ULID, considering birthday paradoxes and whatnot, and depending on the version of the GUID, where I'm assuming the best-case V4).
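The birthday-paradox figures can be approximated with the standard bound: in a space of N = 2^bits values, a ~50% collision chance is reached after roughly sqrt(2 · ln 2 · N) draws. A quick Python sketch (for a ULID's 80 random bits this lands near 2^40, matching the figure above; UUIDv4 has 122 random bits, which this estimate puts near 2^61):

```python
import math

def draws_for_half_collision(bits: int) -> float:
    """Approximate draws needed for a 50% collision chance in a 2**bits space."""
    return math.sqrt(2 * math.log(2) * 2**bits)

# ULID: 80 random bits -> on the order of 2^40 IDs within one millisecond
print(math.log2(draws_for_half_collision(80)))   # ~40.2
# UUIDv4: 122 random bits -> on the order of 2^61 IDs
print(math.log2(draws_for_half_collision(122)))  # ~61.2
```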

andrzejmartyna commented 5 years ago

I agree that the idea of 'monotonically increasing' IDs should be further discussed. IMHO it should even be removed from the specification, as it is very hard to implement well. IMO it might be OK within a single thread, but what about keeping it across different threads, processes or nodes? It could be more of a 'local flavor' giving a false sense of comfort than something robust.

RobThree commented 5 years ago

I'll propose more discussion over at the spec so we can decide on what should be done. I, personally, also prefer just getting rid of the monotonicity.

I noticed there's already an issue on the matter.

RobThree commented 5 years ago

V1.4.0 introduces monotonic ULIDs. I'm still not very fond of the idea, but if I / we want to adhere to the spec then we'd better implement it. My implementation is explicitly opt-in (using a MonotonicRng), and the MonotonicRng implementation, by default, keeps the 10 most significant bits at zero when the random number is initially generated. This means we have enough room before we overflow (the random number 'space' is 10 bits less, but out of 80 that shouldn't be a problem; we still have 70 random bits left, which is still a lot). And when the value overflows, it overflows into the bits reserved explicitly for that purpose: we can overflow the 'random part counter' 2^10 = 1024 times before we hit the actual overflow. Keep in mind that the 'offset' of the random value only applies to the first 'round'; all other rounds have the full 70 bits available for counting. And last but not least: you can control the number of bits being reserved and set it to anything from 0 to 70 (where 0 gives the least 'headroom', or 0 'rounds', and higher values give you more 'rounds' but an increasingly higher chance of a collision when generating the random 'offset' (or 'seed')).

It is kind of hard to explain, I hope the code speaks for itself.
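As a rough illustration of the scheme (a Python sketch under my reading of the description above, not the actual NUlid C# API; the class and method names are invented):

```python
import secrets

# Sketch of the monotonic scheme: the first random part for a given
# millisecond is generated with its top `mask_bits` bits forced to zero,
# leaving headroom; later IDs in the same millisecond just increment it.

class MonotonicSketch:
    RANDOM_BITS = 80  # size of the ULID random part

    def __init__(self, mask_bits: int = 10):
        assert 0 <= mask_bits <= 70
        self.mask_bits = mask_bits
        self.last_ms = None
        self.value = 0

    def next_random_part(self, timestamp_ms: int) -> int:
        if timestamp_ms != self.last_ms:
            self.last_ms = timestamp_ms
            # New millisecond: fresh random value with the top bits zeroed.
            self.value = secrets.randbits(self.RANDOM_BITS - self.mask_bits)
        else:
            # Same millisecond: increment to stay monotonic.
            self.value += 1
            if self.value >= 2**self.RANDOM_BITS:
                raise OverflowError("random part exhausted for this millisecond")
        return self.value

rng = MonotonicSketch()
a = rng.next_random_part(1000)
b = rng.next_random_part(1000)
assert b == a + 1  # monotonic within the same millisecond
```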