Handling uniqueness ID across multiple instance of same service

mehdihadeli commented 1 year ago

Hi, thanks for this useful package. I run multiple instances of the same service as Kubernetes replicas. How can I handle uniqueness between instances? I cannot set a different generatorId on each instance, for example through Kubernetes environment variables. Do you have a solution for this in the microservices world?

RobThree commented 1 year ago

I cannot set different generatorId on each instance

Well, you're gonna have to 😅 Because that's exactly what it is meant for.

As documented here:

This library provides a basis for Id generation; it does not provide a service for handing out these Id's nor does it provide generator-id ('worker-id') coordination.

The generatorId doesn't have to be static, of course; you could consider any of these things:

Look into StatefulSets and try to use that
Build a simple 'coordinator'-service your project can ask for a (unique) generator-id on startup
Use some sort of a hash of the instance / hostname / pod name / whatever (but make sure it'll be unique across all instances!)
Pick a random number and YOLO it 😜

mehdihadeli commented 1 year ago

Thanks for your response

Look into StatefulSets and try to use that

I think this should work for the instances of a single service, but we might still get conflicts with the IDs of other microservices.

Build a simple 'coordinator'-service your project can ask for a (unique) generator-id on startup

Yes, this approach works completely, but I would need to maintain a separate service.

Use some sort of a hash of the instance / hostname / pod name / whatever (but make sure it'll be unique across all instances!)

How can I do this in C# and get an integer to use as the generatorId?

Pick a random number and YOLO it

Do you mean using the Random class?

What about using the host MAC address? For example, the NewID library uses this approach.

RobThree commented 1 year ago

Yes, this approach works completely, but I need to maintain separate service

Something / someone will have to coordinate the worker-id's (generatorId). If you can't use some Kubernetes-assigned value then it'll have to come from elsewhere. Where / how you do it is completely up to you - there is an infinite number of scenarios in which IdGen can be used and that's exactly why it leaves worker-id coordination up to the user. You use Kubernetes, someone else uses X or Y, etc. and each time the requirements will differ, as will the implementation.

The service I suggested doesn't have to do much more than hand out, say, an incremental id per worker. It either wraps around (assuming you're never gonna have more than 2^10 = 1024 workers, the limit in the default configuration) or provides a way to declare a worker-id 'disposed' in some way. All you need to do is track which ids are in use and which aren't, and that only matters on startup / shutdown of your (micro)service / application.
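To make the bookkeeping concrete, here is a minimal sketch of the allocation logic such a coordinator could wrap. The class name `WorkerIdAllocator` and its methods are hypothetical (they are not part of IdGen), and the state lives in memory only; a real coordinator would expose this over some API and persist which ids are in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of IdGen: tracks which worker-ids are in use.
public sealed class WorkerIdAllocator
{
    private readonly int _capacity;
    private readonly HashSet<int> _inUse = new HashSet<int>();
    private readonly object _lock = new object();
    private int _next;

    // generatorIdBits = 10 mirrors the default configuration (2^10 = 1024 ids).
    public WorkerIdAllocator(int generatorIdBits = 10)
    {
        if (generatorIdBits < 1 || generatorIdBits > 30)
            throw new ArgumentOutOfRangeException(nameof(generatorIdBits));
        _capacity = 1 << generatorIdBits;
    }

    // Hands out the next free id, wrapping around at capacity.
    public int Acquire()
    {
        lock (_lock)
        {
            for (int i = 0; i < _capacity; i++)
            {
                int candidate = (_next + i) % _capacity;
                if (_inUse.Add(candidate)) // Add returns false if already taken
                {
                    _next = (candidate + 1) % _capacity;
                    return candidate;
                }
            }
            throw new InvalidOperationException("All worker-ids are in use.");
        }
    }

    // Marks an id as 'disposed' so it can be handed out again.
    public bool Release(int workerId)
    {
        lock (_lock)
        {
            return _inUse.Remove(workerId);
        }
    }
}
```

A worker would call `Acquire()` on startup, pass the result as the generatorId, and call `Release(...)` on shutdown. Crashed workers never release their id, so a real service would probably also need leases or heartbeats.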

How can I do this in c# and get an integer for using in the generatorId?

Again, many roads lead to Rome. You might simply call .GetHashCode() on the hostname (which is probably not a good idea, since string hash codes are not guaranteed to be stable across processes or runtime versions), or use some bits from a SHA1 hash of the hostname, or maybe you can make the hostname incremental (like host1, host2, ...) and simply get the number from that string. You can then pass that number as generatorId. It could be based on the hostname, some bits from the primary NIC's MAC address, the Kubernetes replica index, pod-id, ... anything. Just make sure it's unique (enough) and future proof (enough) so you don't get collisions.
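As an illustration of the "incremental hostname" option, the sketch below extracts the trailing number from a name such as `host12` or `orders-3`. Pods in a Kubernetes StatefulSet are named `<statefulset-name>-<ordinal>`, so this pattern fits them as well. The class and method names are hypothetical, and the 10-bit limit is an assumption taken from the default configuration.

```csharp
using System;

// Hypothetical helper, not part of IdGen.
public static class GeneratorIdFromHostname
{
    // Parses the trailing decimal digits of a name such as "orders-3" or "host12".
    // Returns false if there is no numeric suffix or if the number does not fit
    // in the given number of generator-id bits.
    public static bool TryParse(string hostname, int generatorIdBits, out int generatorId)
    {
        generatorId = 0;
        if (string.IsNullOrEmpty(hostname)) return false;
        if (generatorIdBits < 1 || generatorIdBits > 30) return false;

        // Walk backwards over the trailing ASCII digits.
        int start = hostname.Length;
        while (start > 0 && hostname[start - 1] >= '0' && hostname[start - 1] <= '9')
            start--;
        if (start == hostname.Length) return false; // no trailing digits

        // int.TryParse also rejects suffixes that overflow an int.
        if (!int.TryParse(hostname.Substring(start), out int ordinal)) return false;
        if (ordinal >= (1 << generatorIdBits)) return false; // would not fit

        generatorId = ordinal;
        return true;
    }
}
```

On startup you could feed it `Environment.MachineName` and pass the result to `new IdGenerator(generatorId)`, failing fast when parsing does not succeed. Note that the ordinal is only unique within one StatefulSet, not across different services.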

Do you mean using random class?

I was joking 😉 The odds of a collision are much too high.

What about using host MAC address? For example, NewID library uses this approach.

You could use some bits from the MAC address, yes. But note that a MAC address has 48 bits (which are supposed to be unique, but aren't *) and the generator-id part is, by default, only 10 bits. You can reserve more bits of course, but I don't recommend using all 48. And the fewer bits you use (say 10), the higher the chance of a collision.

* - Cloned VM's, cheap network cards, mistakes by manufacturers, spoofed mac-addresses, it's all fun and games until it's not. MAC addresses aren't as unique as people think. But as long as you have good / total control then, yes, it could be used.
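To show what "some bits from the MAC address" loses, here is a small sketch that keeps only the lowest bits of a 48-bit address. The helper name `MacBits.LowBits` is hypothetical, and taking the lowest bits is just one possible choice.

```csharp
using System;

// Hypothetical helper, not part of IdGen.
public static class MacBits
{
    // Keeps only the lowest `bits` bits of a 48-bit (6-byte) MAC address.
    public static int LowBits(byte[] mac, int bits)
    {
        if (mac == null || mac.Length != 6)
            throw new ArgumentException("Expected a 6-byte MAC address.", nameof(mac));
        if (bits < 1 || bits > 30)
            throw new ArgumentOutOfRangeException(nameof(bits));

        long value = 0;
        foreach (byte b in mac)
            value = (value << 8) | b; // big-endian: first byte is most significant

        return (int)(value & ((1L << bits) - 1));
    }
}
```

For example, the addresses 00:1A:2B:3C:4D:5E and 00:1A:2B:3C:51:5E differ, yet both map to 350 when only 10 bits are kept. The bytes themselves can be read with `NetworkInterface.GetPhysicalAddress().GetAddressBytes()` from `System.Net.NetworkInformation`.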

NewID has the 'luxury' of having a total of 128 bits available, in which case reserving 48 (plus a few more) bits for the generatorId is a lot easier. IdGen produces more compact Id's (64 bits, half the size) but the tradeoff is, indeed, that fewer bits are available for each of the parts that make up an IdGen-id.

About this I think, it should work for each service instances, but maybe we have some conflicts with other microservices Ids

I think I'd first give this option a shot; it looks the most reasonably manageable and useable to me.

mehdihadeli commented 1 year ago

Thanks for all your explanations; I will check the options for my app :) Could we also use something like Guid.NewGuid().GetHashCode() (or Guid.NewGuid().GetDeterministicHashCode(), using Andrew Lock's approach for a deterministic GetHashCode)? I think that should also give a unique generatorId across all instances and all microservices.

RobThree commented 1 year ago

Could we also use something like this Guid.NewGuid().GetHashCode() (or Guid.NewGuid().GetDeterministicHashCode()

This works when you can use all 128 bits of entropy; however: the worker-id is (by default) only 10 bits, 1024 possibilities. The chance of getting a duplicate worker-id is therefore 1 in 1024 for the second worker and this goes up rapidly for each new worker (see birthday paradox). The chance of a collision is much too high.
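The birthday-paradox effect can be quantified with a short calculation. The sketch below assumes every worker picks its id independently and uniformly at random from 2^bits possibilities; the class and method names are made up for illustration.

```csharp
using System;

// Illustrative only: birthday-paradox estimate for randomly chosen worker-ids.
public static class BirthdayBound
{
    // Probability that at least two of `workers` independently and uniformly
    // chosen ids are equal, given 2^bits possible ids.
    public static double CollisionProbability(int workers, int bits)
    {
        double space = Math.Pow(2, bits);
        if (workers > space) return 1.0; // pigeonhole: a collision is certain

        double allDistinct = 1.0;
        for (int k = 1; k < workers; k++)
            allDistinct *= 1.0 - k / space; // worker k+1 must avoid k taken ids

        return 1.0 - allDistinct;
    }
}
```

With 10 bits this gives 1 in 1024 for two workers, roughly 4% for ten workers and about 50% at around 38 workers. With 16 bits the 50% mark is reached at around 300 workers.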

You can get a hashcode of anything (be it a Guid or just a simple string); you're still stuck with 10 bits you can use for a worker-id (maybe a few more if you adjust the structure of the ID a little). But at that point you may just as well generate a random 10-bit number. The chance of a collision, be it with a random number or a hashcode of some value, is just too high for such a (relatively) small number of bits.
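For illustration only, this is roughly what squeezing a hash into the available bits looks like. It uses SHA-1 over a name purely as an example input, and the helper name is hypothetical. It is stable across processes, but it cannot escape the collision problem described above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative only: a stable hash truncated to the generator-id width.
public static class HashedGeneratorId
{
    public static int FromName(string name, int bits = 10)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        if (bits < 1 || bits > 30) throw new ArgumentOutOfRangeException(nameof(bits));

        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(name));

            // Read the first four bytes as an unsigned big-endian number...
            uint head = ((uint)hash[0] << 24) | ((uint)hash[1] << 16)
                      | ((uint)hash[2] << 8) | hash[3];

            // ...and throw away everything except the lowest `bits` bits.
            return (int)(head & ((1u << bits) - 1));
        }
    }
}
```

However good the hash is, only 1024 distinct outputs exist at 10 bits, so any 1025 names are guaranteed to contain a duplicate, and in practice duplicates show up far sooner.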

I would strongly recommend you do not rely on randomness or hashcodes but rather work towards an (across the board) deterministic way of assigning worker-id's, be it by assigning them an incremental number from, say, the Kubernetes instance id or by implementing a coordinator service of some sort that keeps track of which worker-id's are in use or available.

A hashcode may be deterministic, but it still can't guarantee uniqueness, especially if you need to discard some bits of the hashcode because you can only use (by default) 10 bits of it for the worker-id. Actually, even a GUID can't guarantee uniqueness, but because of the immense space (128 bits of 'randomness'; 122 bits actually, because 6 bits are reserved for the version and variant) the chance of a collision is astronomically small (there are 2¹²² ≈ 5.3 x 10³⁶ = 5316911983139663491615228241121378304 possible Guids) and therefore negligible. However, we only have (about) 10 bits available (you could crank that up a little), so the chance of a collision is much, much higher (2¹⁰ = 1024; crank it up to, say, 16 bits and you still 'only' have 2¹⁶ = 65536 possible generator IDs). You may get away with it for a while, but collisions are pretty much guaranteed to happen pretty quickly. And then things will snowball and spiral VERY quickly.

Again: I urge you not to rely solely on a hashcode, though it may be part of an algorithm to determine a final worker-id for any given worker. That's why I wrote: "make sure it'll be unique across all instances!"

mehdihadeli commented 1 year ago

Thanks for your complete answer; I will skip using hashcodes.

RobThree / IdGen

Handling uniqueness ID across multiple instance of same service #49