Return UUID as Base 58 - Githubissues

jdgamble555 commented 1 year ago

It would be cool to have an option to search and return uuids in base58. It would work like this under the hood:

encode

CREATE OR REPLACE FUNCTION get_all_posts_with_base58_uuid() RETURNS SETOF record AS $$
  SELECT *, encode(uuid_column, 'base58') as base58_uuid
  FROM posts;
$$ LANGUAGE SQL;

decode

CREATE OR REPLACE FUNCTION find_post_by_base58_uuid(
  base58_uuid text
) RETURNS SETOF posts AS $$
  SELECT *
  FROM posts
  WHERE uuid_column = decode(base58_uuid, 'base58');
$$ LANGUAGE SQL;

And maybe::

HEAD /bigtable?limit=25 HTTP/1.1
Prefer: uuid=base58

Default would obviously be base16... other options probably wouldn't be useful.

This would return all UUIDs in base58, or maybe allow you to select the fields you want like:

Prefer: posts.id=base58

J

steve-chavez commented 1 year ago

Ah, seems like a particular use case of https://github.com/PostgREST/postgrest/pull/2523.

wolfgangwalther commented 1 year ago

Closing as duplicate of PostgREST/postgrest#2523.

steve-chavez commented 1 year ago

This is the perfect example for https://github.com/PostgREST/postgrest/pull/2523. Putting it here to use it for the reference doc.

steve-chavez commented 1 year ago

Default would obviously be base16... other options probably wouldn't be useful. Prefer: uuid=base58

It seems possible to change the decoding format according to the header. So Prefer: uuid=base58 or Prefer: uuid=base16 can be specified if needed.

create domain uuid_custom_base as uuid;

CREATE CAST (uuid_custom_base AS json) WITH FUNCTION json(uuid_custom_base) AS IMPLICIT;

-- pseudocode
create or replace function json(uuid_custom_base) returns json as $$
  select
    case
      when (current_setting('request.headers')::json)->'prefer' = 'uuid=base16' then
      to_json(encode($1, 'base16'))
    else
      to_json(encode($1, 'base58'))
    end
$$ language sql;

aljungberg commented 1 year ago

It seems possible to change the decoding format according to the header. So Prefer: uuid=base58 or Prefer: uuid=base16 can be specified if needed.

Yes, modifying the output format based on a header seems feasible. You could implement a dynamic data rep function which draws configurations from the request. (This idea could theoretically extend to a data rep that garners settings from an "API configuration table," but I suspect that would be rather awful to work with in practice, akin to a function in programming that varies its behaviour contingent on a global variable. Error prone and surprising.)

In practice, while different APIs might vary on using base16 or base58, I have to believe a singular API which can’t decide one way or another on how to output ids would be a rare beast indeed.

It's sometimes desirable to be precise in what you produce but forgiving in what you accept. In this case you wouldn't need a header to do so. To accept both formats, simply count the characters — base16 UUIDs are longer than base58 ones. So with static data rep functions you could accept various input forms and maintain consistent canonical output without needing specific settings or headers at all.

jdgamble555 commented 1 year ago

For my use case I did something like this:

CREATE TABLE posts (
    id uuid NOT NULL DEFAULT uuid_generate_v4(),
    short_id text NOT NULL GENERATED ALWAYS AS (uuid_to_base58(id)) STORED,
    ...
);

I could not get SQL to handle the memory of the conversion, so I used the V8 extension to directly translate my JS. Let me know if this could be simplified with pure SQL.

CREATE EXTENSION IF NOT EXISTS plv8;

CREATE OR REPLACE FUNCTION uuid_to_base58(uuid uuid)
  RETURNS text AS $$
    const base58Alphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
    let bigintValue = BigInt('0x' + uuid.toString().replace(/-/g, ''));
    let base58 = '';
    while (bigintValue > BigInt(0)) {
        const remainder = bigintValue % BigInt(base58Alphabet.length);
        bigintValue /= BigInt(base58Alphabet.length);
        base58 = base58Alphabet.charAt(Number(remainder)) + base58;
    }
    return base58;
$$ LANGUAGE plv8 IMMUTABLE;

But obviously if I didn't need an extra column and I could just translate it directly with PostgREST, this would be ideal and save time.

This is important for me because I want my url to be simpler than my database UUID.

J

steve-chavez commented 1 year ago

but I suspect that would be rather awful to work with in practice, akin to a function in programming that varies its behaviour contingent on a global variable. Error prone and surprising.

@aljungberg Agree. We can now move forward with https://github.com/PostgREST/postgrest/pull/2839, let me know if you have any feedback there.

aljungberg commented 1 year ago

But obviously if I didn't need an extra column and I could just translate it directly with PostgREST, this would be ideal and save time.

Yep, not needing an extra column just to format a field is one of the founding reasons for the data reps feature we're working on now.

I could not get SQL to handle the memory of the conversion, so I used the V8 extension to directly translate my JS. Let me know if this could be simplified with pure SQL.

You could adapt this function we use at Screenly for formatting ULIDs (base32) stored as UUIDs behind the scenes. But it's a bit of a monster!

CREATE OR REPLACE FUNCTION public.ulid_str(input uuid) RETURNS text
LANGUAGE plpgsql
AS $$
DECLARE
    -- Crockford's Base32
    encoding   BYTEA = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
    output     TEXT = '';
    ulid       BYTEA;
BEGIN
    ulid = decode(replace(input::text, '-', ''), 'hex');

    -- Encode the timestamp
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 0) & 224) >> 5));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 0) & 31)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 1) & 248) >> 3));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 1) & 7) << 2) | ((GET_BYTE(ulid, 2) & 192) >> 6)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 2) & 62) >> 1));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 2) & 1) << 4) | ((GET_BYTE(ulid, 3) & 240) >> 4)));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 3) & 15) << 1) | ((GET_BYTE(ulid, 4) & 128) >> 7)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 4) & 124) >> 2));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 4) & 3) << 3) | ((GET_BYTE(ulid, 5) & 224) >> 5)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 5) & 31)));

    -- Encode the entropy
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 6) & 248) >> 3));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 6) & 7) << 2) | ((GET_BYTE(ulid, 7) & 192) >> 6)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 7) & 62) >> 1));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 7) & 1) << 4) | ((GET_BYTE(ulid, 8) & 240) >> 4)));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 8) & 15) << 1) | ((GET_BYTE(ulid, 9) & 128) >> 7)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 9) & 124) >> 2));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 9) & 3) << 3) | ((GET_BYTE(ulid, 10) & 224) >> 5)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 10) & 31)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 11) & 248) >> 3));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 11) & 7) << 2) | ((GET_BYTE(ulid, 12) & 192) >> 6)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 12) & 62) >> 1));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 12) & 1) << 4) | ((GET_BYTE(ulid, 13) & 240) >> 4)));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 13) & 15) << 1) | ((GET_BYTE(ulid, 14) & 128) >> 7)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 14) & 124) >> 2));
    output = output || CHR(GET_BYTE(encoding, ((GET_BYTE(ulid, 14) & 3) << 3) | ((GET_BYTE(ulid, 15) & 224) >> 5)));
    output = output || CHR(GET_BYTE(encoding, (GET_BYTE(ulid, 15) & 31)));

    RETURN output;
END
$$ IMMUTABLE;

jdgamble555 commented 1 year ago

If you got that for base58, let me know! My URLs already use that version.

Thanks!

J

wolfgangwalther commented 7 months ago

Data Representations are documented now including examples for how to do this with base64: https://postgrest.org/en/v12/references/api/domain_representations.html#domain-response-format

Whether to do that with base58 or something else is a pure PostgreSQL question, not related to PostgREST.

PostgREST / postgrest-docs

Return UUID as Base 58 #652