RobThree / IdGen

Twitter Snowflake-alike ID generator for .Net
MIT License
1.18k stars 147 forks source link

generators clock running fast #19

Closed hero-joel closed 4 years ago

hero-joel commented 4 years ago

Hi,

I have an issue with conflicting ID's being generated which i cannot wrap my head around,

I don't do any fancy configuration, just the default mask with a generator id. I do however run it inside an Orleans Grain that has the same key as the generator id meaning it should be impossible to have 2 different instances/activation of it. Its also being run in a docker container.

i use the default DefaultTimeSource

generator gets created as such: this.RoundIdGenerator = new IdGenerator(integrationId) ... Task.FromResult(RoundIdGenerator.CreateId()

On the same generator i got this the following sequences 2 days apart:

678847413238120448 678847436310986752
678847441495146496 ... 678847431995047936 678847436310986752 Collision 678847526589186048

the system was rebooted in between which would have triggered a new generator and assume that also resets the drift where the generators clock runs

What is the best way to solve this? how do i avoid the generators stopwatch to run to fast?

RobThree commented 4 years ago

I do however run it inside an Orleans Grain that has the same key as the generator id meaning it should be impossible to have 2 different instances/activation of it.

That's a conclusion (assumption?) I cannot currently confirm; I'm not familiar with Orleans Grain. All I can do is assume you're correct on this matter.

Its also being run in a docker container.

That shouldn't affect it AFAIK.

On the same generator i got this the following sequences 2 days apart: the system was rebooted in between which would have triggered a new generator and assume that also resets the drift where the generators clock runs

That would mean the stopwatch would have drifted ahead at least two days, then the system rebooted and then the collision occured. I'm not saying it's impossible but I do find it hard to believe.

According to the ID;s you provided they were generated at:

678847413238120448                2020-02-17T06:17:15.6910000Z
678847436310986752                2020-02-17T06:17:21.1920000Z
678847441495146496                2020-02-17T06:17:22.4280000Z
...
678847431995047936                2020-02-17T06:17:20.1630000Z
678847436310986752 Collision      2020-02-17T06:17:21.1920000Z
678847526589186048                2020-02-17T06:17:42.7160000Z

What version are you using?

What is the best way to solve this?

I'm not sure.... If you're using the default mask config then you'll have 10 bits available for the Generator Id. I don't know how many generators you actually use but since you have 1024 available you could 'cycle' through some of them on each reboot; it's a dirty hack that shouldn't be required but it could help.

how do i avoid the generators stopwatch to run to fast?

Quite honestly, I think that's a question you'll need to ask @Microsoft...

If you can provide a MCVE I could take a deeper look...

hero-joel commented 4 years ago

Thank you for your response.

That's a conclusion (assumption?) I cannot currently confirm; I'm not familiar with Orleans Grain. All I can do is assume you're correct on this matter.

Sorry that was more of a statement, i am certain im not creating multiple generators.

That would mean the stopwatch would have drifted ahead at least two days, then the system rebooted and then the collision occured. I'm not saying it's impossible but I do find it hard to believe.

Yeah, it's tough one to believe but i can see the drift grows fast, did measurement and can see the following drift after 3 hours already : DateTime.UtcNow : 14:48:08.389 Generated Id: 14:48:17.151 that's almost 10 seconds.

What version are you using?

I upgraded to 2.1 and i can confirm it occurs there as well.

Cycling the generator Id's on reboot would be a last option and something id like to refrain from. I am using every default for now.

Would it be an idea to implement ITimeSource based on DateTime.UtcNow rather than a stopwatch?

RobThree commented 4 years ago

Would it be an idea to implement ITimeSource based on DateTime.UtcNow rather than a stopwatch?

No, because that's what we used to have and opens op a whole other can of worms ;) (For example: when the clock is adjusted by NTP or the user...)

I've been running my own 'drift test' for about half an hour now and I'm at 00:00:00.0010021 delta. I will post the code shortly so you can run it on your system and we can compare.

RobThree commented 4 years ago

Try this:

Create a new console application, reference the IdGen NuGet package and paste this code:

using IdGen;
using System;
using System.Threading;

public class Program
{
    private static readonly IdGenerator _idgen = new IdGenerator(0);

    private static void Main(string[] args)
    {
        var start = DateTimeOffset.UtcNow;
        var total = TimeSpan.Zero;
        var measurements = 0;

        while (true)
        {
            measurements++;
            var time = DateTimeOffset.UtcNow;
            var timefromidgen = _idgen.FromId(_idgen.CreateId()).DateTimeOffset;
            var delta = time - timefromidgen;
            total += delta;
            Console.WriteLine($"Time: {time}, running: {DateTime.UtcNow - start}, delta: {delta}, avg: {total / measurements}");
            Thread.Sleep(1000 * 10);
        }
    }
}

And run it for a while. Then let me know the outcome.

RobThree commented 4 years ago
....
Time: 3-3-2020 16:15:10, running: 01:12:50.4225929, delta: 00.0013092, avg: 00.0010835
Time: 3-3-2020 16:15:20, running: 01:13:00.4233282, delta: 00.0010409, avg: 00.0010834
Time: 3-3-2020 16:15:30, running: 01:13:10.4238357, delta: 00.0015513, avg: 00.0010845
Time: 3-3-2020 16:15:40, running: 01:13:20.4249001, delta: 00.0006160, avg: 00.0010834
...
Time: 4-3-2020 08:21:12, running: 17:18:52.5714447, delta: -01.6508408, avg: -00.7855209
Time: 4-3-2020 08:21:22, running: 17:19:02.5718108, delta: -01.6504738, avg: -00.7856597
Time: 4-3-2020 08:21:32, running: 17:19:12.5727303, delta: -01.6505557, avg: -00.7857984
...
Time: 4-3-2020 13:32:48, running: 22:30:28.3592984, delta: -01.6509860, avg: -00.9850525
Time: 4-3-2020 13:32:58, running: 22:30:38.3603360, delta: -01.6509482, avg: -00.9851347
Time: 4-3-2020 13:33:08, running: 22:30:48.3613813, delta: -01.6509042, avg: -00.9852169
...
hero-joel commented 4 years ago

Hi Rob,

Again, thanks for taking your time on this. I have implemented your suggested debugging, and currently running it on the stage environment. will let you know once i start seeing the drift there too.

i had have no luck recreating a drift on my local machine.

hero-joel commented 4 years ago

Hi Rob, After banging my head on this for a full day i can now recreate it locally, but not outside of Orleans, as i see it there is a weird behaviour in IdGen i can't explain.

I create the Generator instance inside a singleton that is a grain in the orleans framework.

this.RoundIdGenerator = new IdGenerator(1);

I generate a few id's and it looks good.

[09:55:18.388 DBG 685045596611940352] Generator 1 ,Time: 05/03/2020 08:55:18, Time from Id: 05/03/2020 08:55:23 +00:00, running: 00:00:00.0020839, delta: -00:00:05.3062169, avg: -00:00:05.3062169

interesting it shows a 5 sec delta from the start, but the delta is not increasing until the singleton is destroyed (grain deactivated). Now it should be ok that it gets destroyed because once i need a new id again this singleton will be recreated, and i will construct a new generator to make sure it creates a new instance.

this.RoundIdGenerator = new IdGenerator(1);

however, now when i start getting Id's the the delta will have grown:

[09:55:37.664 DBG 685045596611940352] Generator 1 ,Time: 05/03/2020 08:55:37, Time from Id: 05/03/2020 08:56:02 +00:00, running: 00:00:00.0006764, delta: -00:00:24.5822786, avg: -00:00:24.5822786

it seems that a new timesource is not created when run "new IdGenerator(1);" again instead it reuses the old timesource somehow.

hero-joel commented 4 years ago
private static  IdGenerator _idgen;

private static void Main(string[] args)
{
    var start = DateTimeOffset.UtcNow;
    var total = TimeSpan.Zero;
    var measurements = 0;
    _idgen = new IdGenerator(1);

    while (true)
    {
        measurements++;
        var time = DateTimeOffset.UtcNow;
        var timefromidgen = _idgen.FromId(_idgen.CreateId()).DateTimeOffset;
        var delta = time - timefromidgen;
        total += delta;
        Console.WriteLine($"Time: {time}, running: {DateTime.UtcNow - start}, delta: {delta}, avg: {total / measurements}");
        Thread.Sleep(1000 * 10);
        _idgen = new IdGenerator(1);
    }
}

This test recreates the weird behaviour.

RobThree commented 4 years ago

it seems that a new timesource is not created when run "new IdGenerator(1);" again instead it reuses the old timesource somehow.

It isn't and that's by design.

But somehow it appears that this Orleans Grain stops time when it gets destroyed... You could implement your own ITimeSource, it's not that hard, but does have some pitfalls. You could try something like:

public class ClockTimeSource : ITimeSource
{
    private long lasttimestamp = 0;

    public DateTimeOffset Epoch { get; }
    public TimeSpan TickDuration => TimeSpan.FromMilliseconds(1);

    public ClockTimeSource()
        : this(IdGenerator.DefaultEpoch) { }

    public ClockTimeSource(DateTime epoch) => Epoch = epoch;

    public long GetTicks()
    {
        var timestamp = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
        // Make sure time only goes forward
        if (timestamp > lasttimestamp)
            lasttimestamp = timestamp;
        return lasttimestamp;
    }
}

(No warranties, I just whipped this up from the top of my head). You should be able to create a singleton from this ITimeSource and pass it to each IdGenerator you create. It may not be the most performant and it may have subtle issues, but you could try it...

This test recreates the weird behaviour.

Define "weird behaviour"?


My test, by the way, is still running:

...
Time: 5-3-2020 09:48:12, running: 1.18:45:52.4978648, delta: -03.1684215, avg: -01.6441190
Time: 5-3-2020 09:48:22, running: 1.18:46:02.4983287, delta: -03.1679574, avg: -01.6442180
Time: 5-3-2020 09:48:32, running: 1.18:46:12.4986343, delta: -03.1686522, avg: -01.6443170
...
hero-joel commented 4 years ago

Sorry, i didn't include the output from the "test". as you can see the delta grows every time i create a new instance of the generator?

05/03/2020 09:34:13 +00:00, running: 00:00:03.3175749, delta: -00.0093570, avg: -00.0093570 05/03/2020 09:34:23 +00:00, running: 00:00:13.3353828, delta: -13.3264775, avg: -06.6679172 05/03/2020 09:34:33 +00:00, running: 00:00:23.3366473, delta: -23.3282133, avg: -12.2213493 05/03/2020 09:34:43 +00:00, running: 00:00:33.3390627, delta: -33.3298018, avg: -17.4984624

Why is that intended? (not questioning you, just want to understand).

RobThree commented 4 years ago

as you can see the delta grows every time i create a new instance of the generator?

That's a pre-V2.2 issue. See #20 and upgrade to 2.2.

Please note that IdGenerators are, normally, intended to be kept around as long as possible; preferably during the lifetime of the application.

hero-joel commented 4 years ago

2.2 fixes it indead, noted on the generator lifetime. Thank you very much!

RobThree commented 4 years ago

Reading over this issue again it seems that you may only create a single IdGenerator every time this "Grain" is 'active' or 'woken up' or something but somehow it keeps state (evident from this < 2.2 bug).

What had me confused is that you stated "i am certain im not creating multiple generators.". Technically, for this "Grain" you may not be, but apparently these "Grains" (I really don't know what the hell I'm talking about 🤣 ) are activated/deactivated or awake/put to sleep or something and each time that happens you do create a new IdGenerator.

Since you talked about 'drift' (quite a lot, 10 minutes over 3 hours) but not as much as as later demonstrated) I didn't think it was related to #20 and had the appearance of the internal stopwatch actually running fast (maybe it still does?). But I guess what happens is that this "Grain" is only inactive/asleep for very short periods which causes a 'gradual drift' of small amounts of time that, eventually, resulted in about 10 minutes over 3 hours.

At least, that's the best explanation I can give for what seemed to be going on here... 😜 I hope your problem is fixed for good now.