fabian-hiller / valibot

The modular and type safe schema library for validating structural data 🤖
https://valibot.dev
MIT License

Minimum Viable Schema Protocol #679

Open jamiebuilds opened 1 week ago

jamiebuilds commented 1 week ago

TL;DR

I want to propose a "minimum viable protocol" that lets libraries accept schemas as part of their API without depending directly on Valibot (or another library such as Zod).

// Protocol:
export const parseSymbol: unique symbol = Symbol.for("https://github.com/fabian-hiller/symbol-parse")
/** @throws {unknown} If the input is rejected. */
export type Parse<T> = (input: unknown) => T
export type Schema<T> = { [parseSymbol]: Parse<T> }

// Example library:
export async function fetchJson<T>(endpoint: string, schema: Schema<T>): Promise<T> {
  let response = await fetch(endpoint)
  let json = await response.json()
  try {
    // Call `parse()` on some unknown data and get the schema's output type
    return schema[parseSymbol](json)
  } catch (error: unknown) {
    // Catch whatever error was thrown by the schema
    throw new Error("Unexpected data", { cause: error })
  }
}

Background

I'm building a library that accepts Valibot schemas as part of its API.

import { sql } from "lib"
import * as v from "valibot"

function getUserAge(userId: string): number {
  return sql`select age from user where id = ${userId}`
    .pluck(v.number())
    .one()
}

For the purposes of this library, I don't really need to do much of anything with Valibot itself. I'm not creating my own schemas and I just want to give users nice typings and convenient APIs when using their own schemas.

To make this work, I just need to be able to call Valibot's parse(schema, input) method.

Problem

It would be nice if I didn't have to pull in Valibot as a dependency just for its most basic functionality. It would then mean I also have to keep it up to date for major versions.

It would also be nice if I could just say that I accept several different schema libraries.

Solution

If Valibot could commit to supporting (across major versions of Valibot) a well-specified, minimum viable protocol similar to that of Promises .then, Iterators [Symbol.iterator], or Observables [Symbol.observable], then library authors could depend just on the protocol and not need to pull in Valibot as a dependency.

It would also make these libraries generic across any other schema library that wanted to implement the protocol.

There could be a package similar to https://github.com/benlesh/symbol-observable that just exposes a shared symbol:

export const parseSymbol: unique symbol = Symbol.for("https://github.com/fabian-hiller/symbol-parse")
/** @throws {unknown} If the input is rejected. */
export type Parse<T> = (input: unknown) => T
export type Schema<T> = { [parseSymbol]: Parse<T> }

Then in libraries they can just make use of it in their types:

import { parseSymbol, Schema } from "symbol-parse"

function parse<T>(schema: Schema<T>, input: unknown): T {
  return schema[parseSymbol](input)
}
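
As a minimal sketch of both sides of the protocol, assuming nothing beyond the types proposed above: `numberSchema` below is a hypothetical toy schema factory (it is not part of Valibot or any published package), and `parse` is the consumer that only knows about the shared symbol.

```typescript
// Shared protocol, mirroring the proposal above.
// A `unique symbol` type requires a `const` declaration in TypeScript.
const parseSymbol = Symbol.for("https://github.com/fabian-hiller/symbol-parse")
type Parse<T> = (input: unknown) => T
type Schema<T> = { [parseSymbol]: Parse<T> }

// Hypothetical toy schema factory, for illustration only.
function numberSchema(): Schema<number> {
  return {
    [parseSymbol](input: unknown): number {
      if (typeof input !== "number" || Number.isNaN(input)) {
        throw new TypeError("Expected a number")
      }
      return input
    },
  }
}

// Consumer side: depends only on the protocol, not on a schema library.
function parse<T>(schema: Schema<T>, input: unknown): T {
  return schema[parseSymbol](input)
}
```

Because `Symbol.for` reads from the global symbol registry, two independently bundled copies of such a package would still resolve to the same symbol, which is what makes a shared key workable across libraries.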

Why "minimum viable" protocol?

Asking lots of developers to coordinate around a shared protocol can be like herding cats. A larger footprint becomes harder to come to consensus on and asks more of everyone who interacts with it.

For this purpose I suggest:

fabian-hiller commented 1 week ago

In general, I welcome such ideas. However, the main problem for Valibot will be that following a protocol will unnecessarily increase the bundle size for all users not using that protocol.

As a workaround for now, Valibot provides the parser and safeParser methods, each of which returns a function. If your users wrap their schemas in parser or safeParser, you can run them directly without adding Valibot as a dependency. Another workaround that many form libraries use is adapters and resolvers.

import * as v from 'valibot';

const parseNumber = v.parser(v.number());

const number1 = parseNumber(123); // returns 123
const number2 = parseNumber('123'); // throws error

I suggest you also have a look at TypeSchema, Standard Schema and this discussion.
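
On the consuming side, the parser workaround could look roughly like the sketch below. It assumes the library only needs "a function that returns typed output or throws". `Parser`, `DecodeError`, and `decodeWith` are hypothetical names, and `parseNumber` is a hand-written stand-in for `v.parser(v.number())` so that the sketch has no dependencies.

```typescript
// Any function that returns typed output or throws can act as a parser.
type Parser<T> = (input: unknown) => T

// Hypothetical error type that keeps the original failure for debugging.
class DecodeError extends Error {
  reason: unknown
  constructor(message: string, reason: unknown) {
    super(message)
    this.reason = reason
  }
}

// Hypothetical library function: it never imports a schema library itself.
function decodeWith<T>(raw: string, parser: Parser<T>): T {
  const json: unknown = JSON.parse(raw)
  try {
    return parser(json)
  } catch (error: unknown) {
    throw new DecodeError("Unexpected data", error)
  }
}

// Stand-in for `v.parser(v.number())`, written by hand for this sketch.
const parseNumber: Parser<number> = (input) => {
  if (typeof input !== "number" || Number.isNaN(input)) {
    throw new TypeError("Expected a number")
  }
  return input
}
```

A user would then call `decodeWith(body, v.parser(schema))`, and the return type is inferred from the parser they pass in.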

jamiebuilds commented 5 days ago

I would be surprised if this had much of a real-world impact. It's not like Valibot gets used without parse()/safeParse() (which are called by parser()/safeParser()) anyway, and those already include the code for global config and such. If you abstracted it away to a function call in all of the schema factory functions, it wouldn't be much more code than the parse functions that every user has to include anyway.

export function string() {
  return defineSchema({ ... })
}
// or
export let string = defineSchema(() => {
  // ...
})

You honestly might want to do something like that with these factories anyways because you could introduce some caching which could improve the performance of Valibot a lot.

fabian-hiller commented 5 days ago

It makes a difference because parse needs to import ValiError and if people only use safeParse this code will never be used. It is true that the real world impact may be small, but it still feels wrong to me because it goes against the philosophy of our API design and implementation. If all the other libraries follow such a specification, Valibot will probably follow too, but Valibot is the wrong library to start such an initiative.

> You honestly might want to do something like that with these factories anyways because you could introduce some caching which could improve the performance of Valibot a lot.

Great idea! I will investigate this as part of #572.

jamiebuilds commented 4 days ago

If such a proposal instead was an equivalent of safeParse() and specified a return value of:

type Issue = { path?: PropertyKey[], message: string }
type Result<T> = 
  | { ok: true, result: T, issues: void }
  | { ok: false, result: void, issues: Issue[] }

export const parseSymbol: unique symbol = Symbol.for("parse")
/** Returns a result object instead of throwing when the input is rejected. */
export type Parse<T> = (input: unknown) => Result<T>
export type Schema<T> = { [parseSymbol]: Parse<T> }

Would that be more acceptable?
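
For illustration, a consumer of this result-returning variant might look like the sketch below. The types mirror the ones above. `numberSchema` and `describeOutcome` are hypothetical names used only here.

```typescript
type Issue = { path?: PropertyKey[]; message: string }
type Result<T> =
  | { ok: true; result: T; issues: void }
  | { ok: false; result: void; issues: Issue[] }

const parseSymbol = Symbol.for("parse")
type Parse<T> = (input: unknown) => Result<T>
type Schema<T> = { [parseSymbol]: Parse<T> }

// Hypothetical toy schema that reports issues instead of throwing.
const numberSchema: Schema<number> = {
  [parseSymbol](input: unknown): Result<number> {
    if (typeof input === "number" && !Number.isNaN(input)) {
      return { ok: true, result: input, issues: undefined }
    }
    return { ok: false, result: undefined, issues: [{ message: "Expected a number" }] }
  },
}

// Consumer: branches on `ok`, so no try/catch and no error class is needed.
function describeOutcome<T>(schema: Schema<T>, input: unknown): string {
  const outcome = schema[parseSymbol](input)
  if (outcome.ok) {
    return `ok: ${String(outcome.result)}`
  }
  return outcome.issues.map((issue) => issue.message).join("; ")
}
```

The `ok` flag acts as the discriminant, so TypeScript narrows `result` and `issues` in each branch without any casts.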

fabian-hiller commented 4 days ago

Yes, but supporting this format will probably add additional code besides safeParse, which will increase the size of the bundle even more. I support your idea, and we should discuss it as a community, but Valibot will probably adopt it later than other libraries due to our focus on bundle size and modularity.

jamiebuilds commented 4 days ago

Yeah, I'm discussing it with you now to understand what version of this protocol you'd accept. It's possible that it could include things that reduce your implementation even further:

Together this would mean the implementation of this spec is no more than:

let decorate = schema => {
  schema[parseSymbol] = input => {
    return schema._run(input, getGlobalConfig())
  }
}

// or minified (maybe code-golf-able further in context)

let d=s=>s[p]=i=>s._run(i,c())

Or going even further, if you wanted to be able to just replace _run with a symbol (or whatever the spec wants to use for its property):

That would make this the full change in Valibot:

  export function number(
    message?: ErrorMessage<NumberIssue>
  ): NumberSchema<ErrorMessage<NumberIssue> | undefined> {
    return {
      kind: 'schema',
      type: 'number',
      reference: number,
      expects: 'number',
      async: false,
      message,
-     _run(dataset, config) {
+     [parseSymbol](dataset, config = getConfig()) {
        if (typeof dataset.value === 'number' && !isNaN(dataset.value)) {
          dataset.typed = true;
        } else {
          _addIssue(this, 'type', dataset, config);
        }
        return dataset as Dataset<number, NumberIssue>;
      },
    };
  }

Minified, this is a negligible number of bytes apart from getConfig(), which always has to be used in Valibot since it's included in all of the parse methods.

jamiebuilds commented 4 days ago

Wrote a proposal over here: https://github.com/standard-schema/standard-schema/issues/3

fabian-hiller commented 4 days ago

Thank you! The [parseSymbol](dataset, config = getConfig()) ... change looks much better for Valibot. One thing to note is that dataset contains more info than the raw input. So I am not sure if this will work the same way with other schema libraries.
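
To make the dataset concern concrete, here is a rough sketch under simplified assumptions. The `Dataset` and `InternalSchema` shapes below are hypothetical and only loosely modeled on the snippet quoted earlier in this thread. They are not Valibot's real types. The point is that a caller holding raw input needs an adapter such as `parseRaw` to wrap and unwrap the dataset.

```typescript
// Hypothetical, simplified shapes; not Valibot's actual types.
type Issue = { message: string }
type Dataset = { typed: boolean; value: unknown; issues?: Issue[] }

const parseSymbol = Symbol.for("parse")
type InternalSchema = { [parseSymbol](dataset: Dataset): Dataset }

// Toy schema whose symbol method takes and returns a dataset, not raw input.
const numberSchema: InternalSchema = {
  [parseSymbol](dataset: Dataset): Dataset {
    if (typeof dataset.value === "number" && !Number.isNaN(dataset.value)) {
      dataset.typed = true
    } else {
      dataset.issues = [...(dataset.issues ?? []), { message: "Expected a number" }]
    }
    return dataset
  },
}

// Adapter a generic consumer would need: wrap raw input, then unwrap the dataset.
function parseRaw(schema: InternalSchema, input: unknown): unknown {
  const dataset = schema[parseSymbol]({ typed: false, value: input })
  if (dataset.issues && dataset.issues.length > 0) {
    throw new Error(dataset.issues.map((issue) => issue.message).join("; "))
  }
  return dataset.value
}
```

A cross-library protocol would have to either standardize this wrapper shape or have each library expose a raw-input entry point alongside its internal one.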

> Wrote a proposal over here: https://github.com/standard-schema/standard-schema/issues/3

Thanks! I will have a look at it.