gajus / slonik

A Node.js PostgreSQL client with runtime and build time type safety, and composable SQL.

What is the expected behavior of serializing BigInt to JSON? #515

Open gajus opened 1 year ago

gajus commented 1 year ago

given this test:

test.only('serializes BigInt to JSON', async (t) => {
  const pool = await createPool(t.context.dsn, {
    PgPool,
  });

  const result = await pool.oneFirst(sql.unsafe`
    SELECT json_object_agg('foo', ${BigInt(9_007_199_254_740_999n)}::bigint)
  `);

  t.is(result, /* ??? */);

  await pool.end();
});

What is the expected value of the result?

I would think it should be {foo: BigInt(9_007_199_254_740_999n)}, but I am not certain if that is even possible, nor would that be valid JSON.

I am leaning towards throwing an error.
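
For what it's worth, the language itself already takes the throwing route: JSON.stringify has no built-in serialization for BigInt.

JSON.stringify({ foo: 9_007_199_254_740_999n });
// TypeError: Do not know how to serialize a BigInt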

Probably related to https://github.com/gajus/slonik/issues/513

AndrewJo commented 1 year ago

> I would think it should be {foo: BigInt(9_007_199_254_740_999n)}, but I am not certain if that is even possible, nor would that be valid JSON.

The above representation is a spec violation according to IETF RFC8259: The JavaScript Object Notation (JSON) Data Interchange Format.


I think you should handle it similarly to how Date or RegExp is commonly serialized in JSON. The most common cases I've come across are serializing BigInt into a string or into a plain number.

Serialization into strings

// POJO
{
  // Serialize into a string: the integer with n at the end
  commentId: 9_007_199_254_740_999n,

  // Serialize into ISO8601 format
  createdAt: new Date("2023-10-05T12:34:56"),

  // Serialize into a regular expression string that can be passed back into the RegExp constructor
  regEx: /abc[dD]/
}

// JSON
{
  "commentId": "9007199254740999n",
  "createdAt": "2023-10-05T12:34:56",
  "regEx": "abc[dD]"
}
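
For illustration, a minimal sketch of that string round trip using JSON.stringify's replacer and JSON.parse's reviver, assuming the trailing-n convention shown above (names are illustrative):

const replacer = (key, value) =>
  typeof value === 'bigint' ? `${value.toString()}n` : value;

const reviver = (key, value) =>
  typeof value === 'string' && /^-?\d+n$/.test(value)
    ? BigInt(value.slice(0, -1))
    : value;

JSON.stringify({ commentId: 9_007_199_254_740_999n }, replacer);
// '{"commentId":"9007199254740999n"}'

JSON.parse('{"commentId":"9007199254740999n"}', reviver);
// { commentId: 9007199254740999n }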

Pros:

Cons:


Serialization into plain number

// POJO
{
  // Serialize into an integer
  commentId: 9_007_199_254_740_999n,

  // Serialize into UNIX epoch in seconds
  createdAt: new Date("2023-10-05T12:34:56")
}

// JSON
{
  "commentId": 9007199254740999,
  "createdAt": 1696509296,
}
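
The catch with the plain-number form is silent precision loss once the value passes Number.MAX_SAFE_INTEGER (9,007,199,254,740,991):

Number(9_007_199_254_740_999n);
// 9007199254741000 (already off by one)

JSON.parse('{"commentId": 9007199254740999}').commentId;
// 9007199254741000 (the consumer never sees the original id)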

Pros:

Cons:

gajus commented 1 year ago

Thank you.

What do you think about just throwing an error if an attempt is made to serialize BigInt into JSON? I.e., require that the user cast the value to X before serialization, e.g.

await pool.oneFirst(sql.unsafe`
  SELECT json_object_agg('foo', ${BigInt(9_007_199_254_740_999n)}::bigint::text)
`);

The above is perfectly fine.

My thinking is that, this way, the handling of the value does not depend on Slonik and is less likely to produce unexpected results.

The error message could include suggestions of how to cast the value.
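
A hypothetical sketch of what that check could look like; ensureJsonSerializable and the error message are illustrative only, not existing Slonik API:

const ensureJsonSerializable = (value: unknown) => {
  if (typeof value === 'bigint') {
    throw new TypeError(
      'BigInt cannot be serialized to JSON. Cast the value explicitly, e.g. ::text or ::numeric.',
    );
  }

  return value;
};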

AndrewJo commented 12 months ago

I think throwing an error may be a valid behavior, because BigInt introduces a footgun that will take many devs by surprise if they're not keenly aware of the implementation details and differences between JavaScript's BigInt and PostgreSQL's bigint. That being said, it should be well documented, and Slonik should provide an escape hatch for advanced developers who know what they're getting themselves into (i.e. it should be opt-in functionality) instead of blocking them from doing their work.

Just to highlight how error-prone handling BigInt can be: I need to remind myself that the BigInt type in JavaScript is very different from the bigint data type in PostgreSQL, even though they have pretty much identical type names.

In PostgreSQL, bigint is just a type alias for int8, i.e. a signed 8-byte/64-bit integer, which has an upper bound and isn't arbitrarily large. In this case, it's 2^63 - 1 = 9,223,372,036,854,775,807.

However, BigInt in JavaScript theoretically has no upper bound (practically, Chrome defines a limit of one billion bits, so we can infer that Node.js' V8 engine has the same limitation). You can make a number arbitrarily large, far past the upper bound of a 64-bit signed integer: BigInt does not prevent you from creating a value that's many orders of magnitude larger, such as 170,141,183,460,469,231,731,687,303,715,884,105,727 (2^127 - 1), an integer with 39 digits.

Reading a 64-bit integer out of PostgreSQL as BigInt is not a big deal, but you can very easily overflow PostgreSQL's bigint data type by binding a BigInt value that's larger than 2^63 - 1.
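
For a sense of scale, pure BigInt arithmetic, no database involved:

// PostgreSQL int8/bigint upper bound
const maxInt8 = 2n ** 63n - 1n; // 9223372036854775807n

// JavaScript BigInt keeps going; this is the 39-digit value mentioned above
const maxInt128 = 2n ** 127n - 1n; // 170141183460469231731687303715884105727n

// Anything above maxInt8 overflows a bigint column if bound directly
maxInt128 > maxInt8; // true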

Therefore, the most analogous data type in PostgreSQL is not actually bigint, but rather the numeric type. numeric is one of the "user-specified precision" types: it lets you store up to 131,072 digits before the decimal point and up to 16,383 digits after it. For most intents and purposes, that is an "unlimited upper bound". (The number of atoms in the observable universe is about 10^82, i.e. 83 digits before the decimal point.)


Basic rule of thumb should be:

  1. Use BigInt when selecting existing data out of a PostgreSQL table with a bigint column and there's no value binding anywhere:

    const schema = z.object({ id: z.bigint(), name: z.string() });
    const foo = await pool.many(sql.type(schema)`SELECT id, name FROM foo`);
  2. Binding BigInt values should be done with caution:

    1. If binding into a query that targets a bigint/int8 data type, some input validation is needed to make sure the value will always be less than or equal to 2^63 - 1 (e.g. z.bigint().lte(9_223_372_036_854_775_807n); a fuller sketch follows this list).
    2. Otherwise, type cast to ::numeric so that you don't run into integer overflows that are pretty much only detectable at runtime, AFTER you've deployed your code to production.

      For example, given this query:

      const biggerThan64bit = 9223372036854775808n;
      const schema = z.object({ biggerThan64bit: z.bigint() });
      const num = await pool.oneFirst(sql.type(schema)`SELECT ${biggerThan64bit}::bigint AS "biggerThan64bit"`);

      You'll get a query error like this:

      Query 1 ERROR: ERROR:  bigint out of range

      This query is perfectly valid:

      const biggerThan64bit = 9223372036854775808n;
      const schema = z.object({ biggerThan64bit: z.bigint() });
      const num = await pool.oneFirst(sql.type(schema)`SELECT ${biggerThan64bit}::numeric AS "biggerThan64bit"`);
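
As referenced in 2.i, a minimal sketch of that bound check, assuming a zod version with BigInt bounds (.gte/.lte); the int8 schema name is illustrative:

import { z } from 'zod';

// int8 range: -(2^63) .. 2^63 - 1
const int8 = z
  .bigint()
  .gte(-9_223_372_036_854_775_808n)
  .lte(9_223_372_036_854_775_807n);

int8.parse(9_007_199_254_740_999n);     // ok
int8.parse(9_223_372_036_854_775_808n); // throws a ZodError before the query ever runs
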
AndrewJo commented 12 months ago

In the raw psql client, PostgreSQL seems to take option 2 (serialization into a plain number) for JSON:

SELECT json_build_object('biggerThan64bit', '9223372036854775809'::numeric) AS big_json;
-- Returns {"biggerThan64bit": 9223372036854775809}

So at least on the PostgreSQL side that seems to be treated as valid JSON, fwiw, regardless of whether this causes issues for the node-postgres driver.
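
Even though that payload is spec-valid JSON, a plain JSON.parse on the JavaScript side still cannot hold the value, which is presumably where it becomes a problem for the node-postgres driver:

JSON.parse('{"biggerThan64bit": 9223372036854775809}').biggerThan64bit;
// 9223372036854775808 (rounded to the nearest representable double)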

Executing in TablePlus:

(screenshot of the query result)

gajus commented 12 months ago

Very interesting. Learned a lot! Thank you

AndrewJo commented 12 months ago

You're welcome!

PS: I also forgot to mention that numeric/decimal trades off performance for precision quite drastically, so it should be used sparingly and reserved for business use cases that require absolute precision (e.g. banking, financial transactions, etc.).

For most use cases, 53 bits of integer precision should be enough. Even if you use them for primary key id fields, it gives you over 9 quadrillion unique rows before you have to migrate to something else like GUIDs (in which case, either your business is a unicorn or your engineering team is doing something very wrong). For storing datetimes as UNIX epochs, even at millisecond granularity, a 53-bit integer can describe dates 232,000 years into the future (I highly doubt humanity, let alone software written in JavaScript, will survive that long).

All this is to say: engineers should think before they start introducing BigInt everywhere in the codebase, because it should be a deliberate, educated decision made after learning about the trade-offs.