gajus / slonik

A Node.js PostgreSQL client with runtime and build time type safety, and composable SQL.

feat: add @slonik/lazy-dataloader #596

Open gajus opened 5 months ago

gajus commented 5 months ago

Lazy DataLoader

Connection pool wrapper with seamless query batching.

Usage

import { createLazyDataLoader } from '@slonik/lazy-dataloader';
import {
  createPool,
  sql,
} from 'slonik';
import { z } from 'zod';

const pool = await createPool('postgres://');

const lazyDataLoader = createLazyDataLoader(pool);

const results = await Promise.all([
  lazyDataLoader.oneFirst(
    sql.type(
      z.object({
        id: z.number(),
        name: z.string(),
      })
    )`
      SELECT id, name
      FROM person
      WHERE id = ${1}
    `
  ),
  lazyDataLoader.oneFirst(
    sql.type(
      z.object({
        id: z.number(),
        name: z.string(),
        website: z.string().nullable(),
      })
    )`
      SELECT id, name, website
      FROM company
      WHERE id = ${2}
    `
  ),
]);

console.log(results);

In this example, both queries are issued in the same tick, so they are combined into a single round trip to the database.

How it works

Using the same idea as DataLoader, LazyDataLoader batches all queries that are executed in the same tick. It does this by turning every query into a sub-query of a single combined statement. Example:

SELECT
  (
    SELECT json_agg(row_to_json(t))
    FROM (
      SELECT id, name
      FROM person
      WHERE id = 1
    ) t
  ) query_1,
  (
    SELECT json_agg(row_to_json(t))
    FROM (
      SELECT id, name, website
      FROM company
      WHERE id = 2
    ) t
  ) query_2

Use cases

This is an experimental approach to mitigating the N+1 problem that is common in GraphQL APIs.

The same problem can be solved more efficiently by using a DataLoader directly and hand-crafting the queries. That approach is more flexible and efficient, but requires more work: in our example, it would mean writing two separate loaders and invoking them explicitly. This library is a middle ground that can be used in some cases to reduce the impact of the N+1 problem by reducing the number of round trips to the database.
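For contrast, the batch function of a hand-crafted loader might look like the following sketch. `fetchRows` is a hypothetical stand-in for a single slonik query (e.g. one using `WHERE id = ANY(...)`); this code is not part of the PR.

```typescript
type Person = {
  id: number;
  name: string;
};

// A DataLoader-style batch function: collapse N lookups into one
// round trip, then restore key order, which DataLoader requires.
const batchLoadPersons = async (
  ids: readonly number[],
  fetchRows: (ids: readonly number[]) => Promise<readonly Person[]>,
): Promise<readonly (Person | null)[]> => {
  // One round trip for the whole batch, e.g.
  // SELECT id, name FROM person WHERE id = ANY($1)
  const rows = await fetchRows(ids);

  const rowsById = new Map<number, Person>();

  for (const row of rows) {
    rowsById.set(row.id, row);
  }

  // Results must be returned in the same order as the keys,
  // with null for keys that matched no row.
  return ids.map((id) => rowsById.get(id) ?? null);
};
```

Note that this is more work than the lazy approach: each entity type needs its own loader and its own batched query.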

Considerations

I have two primary concerns with this approach:

  1. Queries batched this way will never be as efficient as hand-crafted data loaders.
  2. It makes monitoring individual query performance near impossible.

Regarding the first point, it is conceptually the difference between:

SELECT id, name FROM person WHERE id IN (1, 2, ...) -- one query for all N+1 ids

and a union equivalent to:

SELECT id, name FROM person WHERE id = 1
SELECT id, name FROM person WHERE id = 2
-- ... one sub-query per id, N+1 in total

The latter is still better than doing a round trip for every query, but the former would be far more efficient.

Regarding the second point, because every batch is a unique combination of queries, it is difficult to get query-level performance insights from the monitoring tools that we currently rely on.

changeset-bot[bot] commented 5 months ago

⚠️ No Changeset found

Latest commit: c1239224b2ad99b8c35b10d64e81abbfca5d6ae8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets. When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.
