geckoboard / pgulid

Universally Unique Lexicographically Sortable Identifier (ULID) for PostgreSQL
339 stars 20 forks source link

Support monotonicity #2

Open mattbishop opened 4 years ago

mattbishop commented 4 years ago

Here is the link in the spec: https://github.com/ulid/spec#monotonicity

Without monotonic ULIDs one cannot truly sort by this key in a high-append rate situation.

leocassarani commented 4 years ago

Hi @mattbishop, thanks for opening this issue.

Do you have any ideas for how this might be implemented? Pull requests are very welcome.

mattbishop commented 4 years ago

I'll think about it. The key problem is that the function needs to keep the last-generated ULID so it can check the timestamp; if it is the same millisecond, then it will use the monotonic algorithm with that ULID.

mattbishop commented 4 years ago

What do you think about treating ULID the same way Postgres treats SEQUENCE columns? My probably-naive understanding is that when it needs a new value, it looks up the current sequence for the table in the SEQUENCE system table. It increments that value and updates it. Then uses that value in the INSERT. So, for ULID, consider:

ULID_TABLE

TABLE COLUMN ULID
customers uid 01BX5ZZKBKACTAV9WEVGEMMVRZ
orders order_id 01ARZ3NDEKTSV4RRFFQ69G5FAV
products pid 01BXAVRG61YJ5YSBRM51702F6M
  1. When a ULID is required for a column, the function first looks up the last ULID, if any, for that column.
  2. The function then compares the timestamp value of the ULID with the current timestamp.
  3. If the current time is older or the same as the stored ULID timestamp, then it needs to follow the monotonic rule and increment the random portion of the ULID as per https://github.com/ulid/spec#monotonicity 3a. If the current timestamp is greater than the stored time, or no ULID has been stored for that column, then a new ULID is generated with that timestamp.
  4. This new ULID value is UPDATEd back to the row. (Row-level locking a good idea?)
  5. ULID is returned.

This approach takes wobbly clocks into account by comparing the timestamp values and using the later one in step 3. If the clock has been set back, then the monotonic algorithm is used to increment the ULID without disrupting the timestamp ordering. Eventually the clock will catch up.

MarkMurphy commented 3 years ago

I don't think we need to do a lookup by table and column I think it can be much simpler than that. We can probably translate the reference implementation from javascript into postgres. We'd just have to figure out where to keep the lastTime and lastRandom

MarkMurphy commented 3 years ago

We'd just have to figure out where to keep the lastTime and lastRandom

Actually on that note maybe we could just use a sequence.

CREATE SEQUENCE global_ulid_sequence;

CREATE FUNCTION ulid() RETURNS text AS $$
DECLARE  
    seq_id bigint;
    -- ...

BEGIN  
    SELECT nextval('global_ulid_sequence') % 1024 INTO seq_id;

    -- ...

Inspired by https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

paynecodes commented 8 months ago

@MarkMurphy Did you ever solve this problem fully? I see the example here, just not sure how it can be used.

pboling commented 8 months ago

I'm also wondering...