bwmarrin / snowflake

A simple to use Go (golang) package to generate or parse Twitter snowflake IDs
BSD 2-Clause "Simplified" License

Data skew in low-concurrency scenarios #44

Open robertsong9972 opened 2 years ago

robertsong9972 commented 2 years ago

Data skew in low-concurrency scenarios

According to the source code, the step is reset to 0 whenever the timestamp advances. This causes a problem when we use the ID as the sharding key in a sharding strategy. The affected code:

    if now == n.time {
        n.step = (n.step + 1) & n.stepMask

        if n.step == 0 {
            for now <= n.time {
                now = time.Since(n.epoch).Nanoseconds() / 1000000
            }
        }
    } else {
        n.step = 0
    }
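The reset branch quoted above can be seen in isolation with a small sketch (this is not the library's actual code; the `gen` type and the fake millisecond clock are illustrative only). Under low QPS, every call lands in a fresh millisecond, so the step component of every ID is 0:

```go
package main

import "fmt"

// stepMask matches the default snowflake layout of 12 step bits.
const stepMask = 0xFFF

// gen is a toy stand-in for the library's Node, tracking only the
// fields relevant to the quoted branch.
type gen struct {
	time int64 // last timestamp used, in ms
	step int64
}

// next mimics the quoted logic: same millisecond -> increment step,
// new millisecond -> reset step to 0.
func (g *gen) next(nowMs int64) int64 {
	if nowMs == g.time {
		g.step = (g.step + 1) & stepMask
	} else {
		g.step = 0
	}
	g.time = nowMs
	return g.step
}

func main() {
	g := &gen{}
	// One call per millisecond: the step is always reset to 0.
	for ms := int64(1); ms <= 5; ms++ {
		fmt.Println(g.next(ms)) // prints 0 every time
	}
}
```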

Consider the following circumstance:

Scenario

  1. Tables order_tab_00000000-order_tab_00001000;
  2. An API for users to create new orders: create_order;
  3. The logic above is used to generate IDs, with QPS less than 10;

Impact

  4. After taking the modulus, all of the generated IDs map to results less than 10, which means most of the records are stored in order_tab_00000000-order_tab_00000009, causing data skew;

Improve

  5. Make the step round by round, and do not set it to 0;

Code that may be referenced:

    n.mu.Lock()
    now := time.Since(n.epoch).Nanoseconds() / 1000000
    n.step = (n.step + 1) & n.stepMask
    if now == n.time && n.step == 0 {
        for now <= n.time {
            now = time.Since(n.epoch).Nanoseconds() / 1000000
        }
    }
    n.time = now
    r := ID((now)<<n.timeShift |
        (n.node << n.nodeShift) |
        (n.step),
    )
    n.mu.Unlock()
    return r
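To make the reported skew concrete, here is a hedged sketch (illustrative only; the `id` helper, the node value of 1, and the shard count of 1000 are assumptions based on the scenario above) comparing how many distinct shards `id % 1000` can reach with the current reset-to-zero behavior versus a wrap-around step:

```go
package main

import "fmt"

// Bit layout matching the library defaults: 12 step bits, 10 node bits.
const (
	stepBits  = 12
	nodeBits  = 10
	stepMask  = (1 << stepBits) - 1
	nodeShift = stepBits
	timeShift = stepBits + nodeBits
)

// id assembles a snowflake-style ID from its three components.
func id(ms, node, step int64) int64 {
	return ms<<timeShift | node<<nodeShift | step
}

func main() {
	const shards = 1000
	seen := map[int64]bool{}
	// Current behavior at low QPS: one ID per millisecond, step always 0.
	for ms := int64(0); ms < 10000; ms++ {
		seen[id(ms, 1, 0)%shards] = true
	}
	// Only multiples of 8 are reachable (125 of 1000 shards).
	fmt.Println("distinct shards, step always 0:", len(seen))

	seen = map[int64]bool{}
	// Proposed behavior: the step keeps counting across milliseconds.
	for ms := int64(0); ms < 10000; ms++ {
		seen[id(ms, 1, ms&stepMask)%shards] = true
	}
	fmt.Println("distinct shards, wrap-around step:", len(seen))
}
```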
patchthecode commented 1 year ago

does this mean this lib is prone to errors?

bwmarrin commented 1 year ago

@robertsong9972

Thanks for reporting an issue :)

However, I'm struggling to follow what you're trying to explain, how the current method is causing a problem, and why it should be changed :( I'm not sure if there's any more explanation you can provide, or if there's someone else familiar with the topic you're discussing who could help clarify the issue.

Are you saying the library can generate non-unique IDs under low concurrency? Or are you saying that the way the IDs sort would be a problem? What does "Make the step round by round" mean?

"After taking the modulus, all of the generated IDs map to results less than 10, which means most of the records are stored in order_tab_00000000-order_tab_00000009, causing data skew;"

You're doing something to truncate the numbers or reduce them? None of the IDs would be less than 10 normally.

I'm also not really clear on the importance of the "order_tab_00000000-order_tab_00000009" tables.

I'm just all around confused, sorry. :(

bwmarrin commented 1 year ago

@patchthecode - I'm not sure :(. Though I've not heard any other reports of the library not generating properly unique IDs. If you run into any issues or if you can help explain the above concern I'd be happy to look into it.

bwmarrin commented 1 year ago

I think I'm putting together what's being asked here, maybe.

@robertsong9972

Are you talking about data sharding via MongoDB (or maybe some other DB)?

By "order_tab_00000000-order_tab_00001000" are you saying you have 1001 order tables?

Are you taking the modulus of the snowflake ID and 1001 (the number of tables you have) to try and get a random number between 0 and 1000?

If you're taking the ID mod 1001 I think it should work, but if you happen to be using 1000 I could see how that would cause skew: with 12 step bits, the low 12 bits of a snowflake are all zero under low load, so every ID is a multiple of 4096. Since 4096 and 1000 share a common factor of 8, id % 1000 can only ever land on a multiple of 8, leaving most shards empty. 4096 and 1001 share no common factor, so mod 1001 shouldn't have that issue. Another option would be to change the step bits to a smaller value.
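The divisibility argument can be checked numerically. A minimal sketch, assuming the default 12 step bits (so every low-load ID is a multiple of 2^12 = 4096) and the 1000-vs-1001 shard counts discussed above:

```go
package main

import "fmt"

func main() {
	mod1000 := map[int64]bool{}
	mod1001 := map[int64]bool{}
	for k := int64(0); k < 100000; k++ {
		id := k * 4096 // any ID whose low 12 (step) bits are zero
		mod1000[id%1000] = true
		mod1001[id%1001] = true
	}
	// gcd(4096, 1000) = 8, so only multiples of 8 are reachable: 125 shards.
	fmt.Println("reachable shards with % 1000:", len(mod1000))
	// gcd(4096, 1001) = 1, so every residue is reachable: all 1001 shards.
	fmt.Println("reachable shards with % 1001:", len(mod1001))
}
```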

By "round by round" do you mean just allowing the step count to continue counting up, and only resetting it to zero once it overflows (hits the max number it could be)? If we did that, the library's performance under heavy load would be negatively impacted, and since there are other ways to solve the problem I'm assuming you're working on, I'd rather not do that.

Finally, would just using a rand(1000) method to get a random number from 0 to 1000 to tell the DB what shard to shove the data into work?
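That last suggestion could be sketched as follows (the `shardFor` helper is hypothetical; the shard count of 1001 matches the 0..1000 table range in the scenario):

```go
package main

import (
	"fmt"
	"math/rand"
)

// shardFor picks a shard independently of the snowflake ID, using a
// uniform random draw over the 1001 tables in the scenario.
func shardFor() int {
	return rand.Intn(1001) // uniform in [0, 1000]
}

func main() {
	// Assign a few hypothetical orders to shards.
	for i := 1; i <= 10; i++ {
		fmt.Printf("order %d -> shard %d\n", i, shardFor())
	}
}
```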

If I'm interpreting your issue wrong - please let me know! Hopefully this helps some, I know it's a late attempt to help.