fabian-hiller / valibot

The modular and type safe schema library for validating structural data πŸ€–
https://valibot.dev
MIT License
6.39k stars 205 forks source link

Support Metadata like description #373

Closed fabian-hiller closed 2 months ago

fabian-hiller commented 10 months ago

Discussed in https://github.com/fabian-hiller/valibot/discussions/368

Originally posted by **xcfox** January 14, 2024 Valibot is a super cool Schema Builder! I am attracted by Valibot's modular design. I plan to use Valibot as an Entity Schema Builder for [MikroORM](https://github.com/mikro-orm/mikro-orm). So far, I am using `TypeScript`'s `class` and `Decorators` to define MikroORM's [Entity](https://mikro-orm.io/docs/defining-entities) like this: ```typescript @Entity() export class BaseEntity { @PrimaryKey() id!: number @Property() createdAt = new Date() @Property({ onUpdate: () => new Date() }) updatedAt = new Date() } interface WithName { name: string } @Entity() export class User extends BaseEntity implements WithName { @Property({ unique: true, comment: "user's name" }) name!: string @index() @Property({ nullable: true }) email: string @Property({ type: 'text' }) password!: string } ``` Because TypeScript's `class` doesn't support multiple inheritance, I have to define the `name` field repeatedly. Maybe I can use Valibot and custom pipelines to solve this problem: ```typescript import { intersect, Output, nullable, date, string, object } from 'valibot' import { primaryKey, property, index, unique, description } from './custom' const baseEntity = object({ id: string([primaryKey()]), createdAt: date([property()]), updatedAt: date([property()]), }) const withName = object({ name: string([property(), index(), description("user's name")]), }) export const userEntity = intersect([ object({ email: nullable(string([property(), unique()])), password: string([property()]), }), baseEntity, withName, ]) export interface IUser extends Output {} ``` Here I use many custom pipelines: `primaryKey`, `property`, `index`, `unique`, `description`. However, the current version of Valibot's pipeline only supports `Validation` and `Transformation`. I don't think these custom pipes belong to either `Validation` or `Transformation`. Once `Metadata` is supported, Valibot has the ability to shine in all scenarios where a Schema needs to be defined, such as [graphql](https://typegraphql.com/docs/types-and-fields.html), [typegoose](https://typegoose.github.io/typegoose/docs/guides/quick-start-guide), [typeorm](https://typeorm.io/entities#what-is-entity) and not just for data validation like `zod`. Valibot's modularity, combined with `Metadata`, makes for a very powerful Schema Builder! All projects using TypeScript can use Valibot to define Schemas, and the development experience will be far superior to that of `class` and `Decorators`.
fabian-hiller commented 10 months ago

I have investigated this use case. As far as I understand my code, the current implementation of Valibot does not limit this functionality. This feature can already be added by third party libraries by providing pipeline actions like index and description. Here is an example that implements index and description:

import * as v from 'valibot';

/**
 * Index metadata type.
 */
type IndexMetadata<TInput> = {
  async: false;
  _parse(input: TInput): v.PipeActionResult<TInput>;
  type: 'index';
};

/**
 * Creates a pipeline metadata action to indicate that the field should be indexed.
 *
 * @returns A metadata action.
 */
function index<TInput>(): IndexMetadata<TInput> {
  return {
    type: 'index',
    async: false,
    _parse(input) {
      return v.actionOutput(input);
    },
  };
}

/**
 * Description metadata type.
 */
type DescriptionMetadata<TInput> = {
  async: false;
  _parse(input: TInput): v.PipeActionResult<TInput>;
  type: 'description';
  description: string;
};

/**
 * Creates a pipeline metadata action that adds a description the field.
 *
 * @param description The description to be added.
 *
 * @returns A metadata action.
 */
function description<TInput>(description: string): DescriptionMetadata<TInput> {
  return {
    type: 'description',
    description,
    async: false,
    _parse(input) {
      return v.actionOutput(input);
    },
  };
}

// Create user entity schema
const UserEntitySchema = v.object({
  name: v.string([
    index(),
    description('The name of the user.'),
    v.maxLength(30),
  ]),
  // ...
});

// Read index metadata
const shouldBeIndexed = !!UserEntitySchema.entries.name.pipe?.find(
  (action) => 'type' in action && action.type === 'index'
);

The more important thing for us as a community is to decide if this functionality belongs in the scope of Valibot, and if the library should get a first-class metadata API. If we decide to do so, we should discuss if we want to add this to our current pipeline feature as shown above, if we want to rename and reimplement the pipeline feature so that it is not necessary to include async and _parse for metadata objects, or if we want to implement this completely different, for example with a metadata method that adds the metadata directly to the schema object apart from the validation and transformation actions.

I look forward to hearing what you think.

xcfox commented 10 months ago

@fabian-hiller You're absolutely right.

Over the past few days, I've built the valibot-mikro library for building mikro entities with the help of valibot. What I'm doing now is disguising metadata like property(), index() as BaseValidation.

/**
 * Mikro property meta.
 */
export type PropertyMeta<Entity = any, TInput = any> = BaseValidation<TInput> & {
    /**
     * The validation type.
     */
    type: "mikro_property"

    /**
     * The property meta.
     */
    meta?: Partial<EntitySchemaProperty<TInput, Entity>>
}

This works fine, but it feels a little twisted. As you say, this contains unnecessary async and _parse.

I strongly agree with rename and reimplement the pipeline feature to support pure metadata. In addition, actually, in valibot-mikro, I've implemented some methods similar to the metadata method, that is, manyToMany, manyToOne, which are actually relation types specific to MikroORM.

In my practice, I realized that there should be no need for an abstract metadata method method, but many more explicit metadata methods, such as property, index, columnName. By putting the metadata in the pipeline, we can clearly identify the type of the field, while the secondary metadata is in the secondary position:

const User = object({
  name: number([property(), index(), columnName('a_strange_column_name')])
})

But by using metadata as a wrapper, the code starts to get confused, metadata() is not a type, but it is in the place of the type:

const User = object({
  name: metadata(number(), [property(), index(), columnName('a_strange_column_name')])
})

Such an approach may could be even more messy:

const User = object({
  name: columnName(property(index(number())), 'a_strange_column_name')
})

As a community, could we also consider building in some generic metadata like description() so that when writing the schema, we don't have to write generic meta information for every 3rd party library. I see that the describe() method exists in zod.

xcfox commented 10 months ago

I also noticed that not every schema can accept pipe arguments, such as recursive(), enum_(), picklist().
I understand that these schemas include validation by themselves and may not need an additional pipe, but when I use valibot as a schema builder, I do need to provide a way for all schemas to pass in a metadata.

Do you use a plan to complement the pipe parameter for all schemas?
Or can we add additional meta attributes to the schema? For example, for BaseSchema:

/**
 * Base schema type.
 */
export type BaseSchema<TInput = any, TOutput = TInput> = {
  /**
   * store the metadata.
   */
  meta?: BaseMetadata[];
  /**
   * Whether it's async.
   */
  async: false;
  /**
   * Parses unknown input based on its schema.
   *
   * @param input The input to be parsed.
   * @param info The parse info.
   *
   * @returns The parse result.
   *
   * @internal
   */
  _parse(input: unknown, info?: ParseInfo): SchemaResult<TOutput>;
  /**
   * Input and output type.
   *
   * @internal
   */
  _types?: { input: TInput; output: TOutput };
};

The advantage of this is that the metadate can be completely ignored when running validation for better performance.

fabian-hiller commented 10 months ago

I like the idea of putting the metadata under its own key. However, this would require that we put the metadata in a different array to separate from the pipe, or we would have to filter and process the pipeline when creating the schema to extract the metadata. Also, I think it would be nice to access the metadata directly via .metadata?.description. Instead of writing .metadata?.find((item) => item.type === 'description').

const User = object({
  // Note: This would require us to always specify the pipeline, even if it is empty
  name: string([], [index(), description('Lorem ipsum')]),
  age: number([minValue(10)], [index(), description('Lorem ipsum')]),
})

Another idea is to provide the metadata directly as an object instead of an array of functions. I suspect this will also result in the smallest bundle size.

const User = object({
  name: string([], { index: true, description: 'Lorem ipsum' }),
  age: number([minValue(10)], { index: true, description: 'Lorem ipsum' }),
})
xcfox commented 10 months ago

I totally agree with the use of key-value pairs to store metadata. For the best development experience, I think we should put the metadata in the same array as the validators and transformations:

const User = object({
  name: string([index(), description('Lorem ipsum')]),
  age: number([minValue(10), index(), description('Lorem ipsum')]),
})

To do this we need to do some extra work when creating the schema, which fortunately is fairly simple.

Also, I noticed that ErrorMessage should actually be treated as metadata, so that we don't have to leave a separate pass position for message, and for all schemas we can use rest parameters to pass the pipeline so that we don't have to explicitly build arrays:

const User = object({
  name: string(message("The name is illegal"), index(), description('Lorem ipsum')),
  age: number(minValue(10), index(), description('Lorem ipsum')),
})

This will result in a huge break change. Another idea is to use the current location of message to accept metadata, which should accept a string or metadata object for compatibility with older code:

const User = object({
  name: string({ description:"Lorem ipsum", index: true }),
  age: number({ message:"too young", description:"Lorem ipsum", index: true },[minValue(10)]),
})

Taking into account compatibility, development experience and packaging size, I think this is a very good solution. We also need to expose the Metadata interface externally for code hinting and type checking:

// valibot
export interface Metadata {
  description?: string
  message?: string
}

Users need to add additional global type declarations when installing third-party libraries:

// env.d.ts
import { Metadata } from 'valibot'
import { MikroMetadata } from 'valibot-mikro'
declare module 'valibot' {
  export interface Metadata extends MikroMetadata {}
}
fabian-hiller commented 10 months ago

Thanks again for the details! I will look into this and get back to you with the results.

fabian-hiller commented 10 months ago

For the best development experience, I think we should put the metadata in the same array as the validators and transformations

I agree that the DX is very nice this way. The downside for me is the internal implementation because it requires us to filter and process the pipeline on schema creation. This would increase the bundle size and slow down the startup performance. Since I expect that the metadata feature will only be used by a small fraction of users, I am not sure if I want to go this way.

Also, I noticed that ErrorMessage should actually be treated as metadata

I see your point. On the other hand, ErrorMessage is used as the first optional argument of every schema and validation functions and validation functions have no pipeline. The current approach makes it consistent across the library.

and for all schemas we can use rest parameters to pass the pipeline so that we don't have to explicitly build arrays

When I first implemented Valibot in July 2023, I considered this approach. In the end I decided against it because it makes it much harder to distinguish the pipe arguments and limits the API for complex schema functions like object that have other optional arguments. Also, the formatting with Prettier looks a lot more chaotic this way.

Another idea is to use the current location of message to accept metadata

This might work, and I will consider it. It might make it harder to handle and process custom error messages, which is an important feature of the library. Also, I am not sure if message should be part of the metadata.

We also need to expose the Metadata interface externally for code hinting and type checking

Great idea! I agree!


Although I have argued against your suggestions, this comment is not a final decision, and I welcome your feedback. In the end, my goal is to work with the community to create a great schema library.

If we decide to add the metadata object as the last optional argument of each schema function, I see several benefits.

const User = object({
  name: string({ index: true, description: 'Lorem ipsum' }),
  age: number([minValue(10)], { index: true, description: 'Lorem ipsum' }),
})
xcfox commented 10 months ago

I strongly agree with adding the metadata object as the last optional argument of each schema function.
This approach is not much different on DX compared to using the current location of message to accept metadata.
The only problem is that this leads to more verbose signatures for schema functions, such as for object() functions:

Click to show code ```TypeScript /** * Creates an object schema. * * @param entries The object entries. * @param metadata The schema metadata. * * @returns An object schema. */ export function object( entries: TEntries, metadata?: Metadata ): ObjectSchema; /** * Creates an object schema. * * @param entries The object entries. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns An object schema. */ export function object( entries: TEntries, pipe?: Pipe>, metadata?: Metadata ): ObjectSchema; /** * Creates an object schema. * * @param entries The object entries. * @param message The error message. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns An object schema. */ export function object( entries: TEntries, message?: ErrorMessage, pipe?: Pipe>, metadata?: Metadata ): ObjectSchema; /** * Creates an object schema. * * @param entries The object entries. * @param rest The object rest. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns An object schema. */ export function object< TEntries extends ObjectEntries, TRest extends BaseSchema | undefined >( entries: TEntries, rest: TRest, pipe?: Pipe>, metadata?: Metadata ): ObjectSchema; /** * Creates an object schema. * * @param entries The object entries. * @param rest The object rest. * @param message The error message. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns An object schema. */ export function object< TEntries extends ObjectEntries, TRest extends BaseSchema | undefined >( entries: TEntries, rest: TRest, message?: ErrorMessage, pipe?: Pipe>, metadata?: Metadata ): ObjectSchema; export function object< TEntries extends ObjectEntries, TRest extends BaseSchema | undefined = undefined >( entries: TEntries, arg2?: Metadata | Pipe> | ErrorMessage | TRest, arg3?: Metadata | Pipe> | ErrorMessage, arg4?: Metadata | Pipe>, arg5?: Metadata ): ObjectSchema ```

This looks like it will come with more maintenance costs, other than that the solution is perfect!

fabian-hiller commented 10 months ago

Which metadata properties should Valibot ship by default? Can you create a list for me?

xcfox commented 10 months ago

We should refer to existing popular standards such as annotations for json-schema, parameter for openapi, GraphQL or ts-doc. I researched the above specification and came up with some generic metadata fields:

/**
 * Schema metadata type.
 */
export interface SchemaMetadata<T = any> {
  /**
   * The name of the schema.
   */
  name?: string;
  /**
   * A brief description of the schema.
   */
  description?: string;
  /**
   *  The instance value of the schema should not be used and the schema may be removed in the future.
   */
  deprecated?: boolean;

  /**
   * The `examples` is a place to provide an array of examples that validate against the schema.
   */
  examples?: T[];
}

Of course, we can also strictly follow the json-schema specification.

Then we will have the following build-in metadata fields:


/**
 * Schema metadata type.
 */
export interface SchemaMetadata<T = any> {
  /**
   * The title of the schema.
   */
  title?: string;
  /**
   * A brief description of the schema.
   */
  description?: string;

  /**
   * The `examples` is a place to provide an array of examples that validate against the schema.
   */
  examples?: T[];

  /**
   * The `readOnly` keyword indicates that the value of the instance is managed exclusively by the owning authority, and attempts by an application to modify the value of this property are expected to be ignored or rejected by that owning authority.
   */
  readOnly?: boolean;

  /**
   * The `writeOnly` keyword indicates that the value is never present when the instance is retrieved from the owning authority.
   */
  writeOnly?: boolean;

  /**
   *  The instance value of the schema should not be used and the schema may be removed in the future.
   */
  deprecated?: boolean;
}
fabian-hiller commented 10 months ago

What about SQL properties like index, unique and primary key?

xcfox commented 10 months ago

I don't think SQL properties should be included in the valibot package itself. This is because when dealing with a specific business, the situation is complex and varied. In the TypeScript world, each ORM has its own unique way of defining schema:

https://sequelize.readthedocs.io/ https://orm.drizzle.team/docs/sql-schema-declaration https://typeorm.io/entities#what-is-entity https://typegoose.github.io/typegoose/docs/guides/quick-start-guide/#quick-overview-of-typegoose

It is quite difficult to adapt valibot to such a variety of situations or to follow ORM package updates.

In order for valibot to be a generalized schema builder, I think it needs to be used to create various community packages such as valibot-drizzle, valibot-typeorm.
And valibot itself should remain modular and extensible so that developers can easily add unique metadata:

// env.d.ts
import { SchemaMetadata } from 'valibot'
import { MikroMetadata } from 'valibot-mikro'
declare module 'valibot' {
  export interface SchemaMetadata extends MikroMetadata {}
}
xcfox commented 10 months ago

I'm trying to implement this feature, and the current solution is to use metadata as the last parameter of the schema function. But now I'm facing a tricky problem. TypeScript doesn't seem to hit the overload correctly.

Here is the array() declaration ```TypeScript /** * Creates a array schema. * * @param item The item schema. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns A array schema. */ export function array( item: TItem, pipe?: Pipe[]>, metadata?: SchemaMetadata> ): ArraySchema; /** * Creates a array schema. * * @param item The item schema. * @param message The error message. * @param pipe A validation and transformation pipe. * @param metadata The schema metadata. * * @returns A array schema. */ export function array( item: TItem, message?: ErrorMessage, pipe?: Pipe[]>, metadata?: SchemaMetadata> ): ArraySchema; /** * Creates a array schema. * * @param item The item schema. * @param metadata The schema metadata. * * @returns A array schema. */ export function array( item: TItem, metadata?: SchemaMetadata> ): ArraySchema; export function array( item: TItem, arg2?: SchemaMetadata> | Pipe[]> | ErrorMessage, arg3?: SchemaMetadata> | Pipe[]>, arg4?: SchemaMetadata> ): ArraySchema ```
const schema2 = array(number(), 'Error', [length(1), includes(123)]);

For the above code, TypeScript, gives the following error message:

No overload matches this call.
  Overload 1 of 3, '(item: NumberSchema<number>, pipe?: Pipe<number[]> | undefined, metadata?: SchemaMetadata<number> | undefined): ArraySchema<NumberSchema<number>, number[]>', gave the following error.
    Argument of type 'string' is not assignable to parameter of type '(BaseValidation<number[]> | BaseTransformation<number[]>)[]'.ts(2769)

But the following code works fine:

const schema2 = array(number(), 'Error', [length(1), includes(123)], {});
const schema3 = array(number(), 'Error', [length(1), includes(123)], undefined);

If you have a spare moment, check out my code implementation here

To avoid the above, I think we should use the current ErrorMessage location for the metadata. This location should accept both ErrorMessage and SchemaMetadata, so that break change can be avoided.
We also reserve a message field in metadata for cases where both ErrorMessage and SchemaMetadata are needed. We can either extract the message in defaultArgs(), or we can just make the message part of the metadata, either way it's easy.

This approach also makes our schema function much simpler:

Here we only need two overloads ```TypeScript /** * Creates a array schema. * * @param item The item schema. * @param pipe A validation and transformation pipe. * * @returns A array schema. */ export function array( item: TItem, pipe?: Pipe[]> ): ArraySchema; /** * Creates a array schema. * * @param item The item schema. * @param metadata The schema metadata or error message. * @param pipe A validation and transformation pipe. * * @returns A array schema. */ export function array( item: TItem, metadata?: ErrorMessage | SchemaMetadata>, pipe?: Pipe[]> ): ArraySchema; export function array( item: TItem, arg2?: SchemaMetadata> | ErrorMessage | Pipe[]>, arg3?: Pipe[]> ): ArraySchema ```
const schema2 = array(number(), 'Error', [length(1), includes(123)]);
const schema3 = array(number(), { message: 'Error', description: "Lorem ipsum" }, [length(1), includes(123)]);

Let's compare the scenarios in detail: that is, placing metadata at the last parameter and placing metadata at the ErrorMessage position:

point at the last parameter at the ErrorMessage position
no breaking changes ⚠️ ts may miss the expected function overload βœ…
easily identified in defaultArgs βœ… βœ…
can be added to the schema object without processing βœ… ⚠️ Requires additional operations on the message
easily accessible as key-value pairs via Schema.metadata βœ… βœ…
Small packing size βœ… βœ…
code complexity ⚠️ complexity function overload βœ… similar to the current
performance βœ… slightly more complex defaultArgs βœ… slightly more complex defaultArgs
fabian-hiller commented 10 months ago

We should refer to existing popular standards such as annotations for ...

I would not add properties like examples or deprecated. What use do you see for them? For example, JS Doc can be used as a comment to add such informations.

I don't think SQL properties should be included in the valibot package itself.

Weren't the SQL properties the main reason for this feature? Are the properties not standardized? I thought that a Valibot schema could then be sufficient to generate SQL commands.

In general: Where and how do you plan to use this metadata feature?

βœ… slightly more complex defaultArgs (at the ErrorMessage position)

How would you implement defaultArgs in this case?

xcfox commented 10 months ago

Weren't the SQL properties the main reason for this feature? Are the properties not standardized?

Yes, in fact, there is no standard. In addition, valibot schema does not contain sufficient information to generate table-building statements SQL. Let's look at a few examples:

  1. In valibot, there is a number() schema,

    • In MySQL, there are INT, SMALLINT, BIGINT, TINYINT, FLOAT, DOUBLE, DECIMAL types.

    • In PostgreSQL, there are INT, SMALLINT, BIGINT, DECIMAL, DOUBLE PRECISION, NUMERIC, MONEY types.

  2. When we face the same SQL statement to build a table, there are many differences in the API design of different orm:

    CREATE TABLE "Post" (
     "id" SERIAL,
     "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
     "updatedAt" TIMESTAMP(3) NOT NULL,
     "title" VARCHAR(255) NOT NULL,
     "content" TEXT,
     "published" BOOLEAN NOT NULL DEFAULT false,
     "authorId" INTEGER NOT NULL,
     PRIMARY KEY ("id")
    );
    TypeORM Declaration Code ```TypeScript import { Entity, PrimaryGeneratedColumn, Column, CreateDateColumn, UpdateDateColumn, } from 'typeorm'; @Entity('Post') export class Post { @PrimaryGeneratedColumn() id: number; @CreateDateColumn({ type: 'timestamp', default: () => 'CURRENT_TIMESTAMP(3)' }) createdAt: Date; @UpdateDateColumn({ type: 'timestamp' }) updatedAt: Date; @Column({ type: 'varchar', length: 255 }) title: string; @Column({ type: 'text', nullable: true }) content: string; @Column({ type: 'boolean', default: false }) published: boolean; @Column({ type: 'int' }) authorId: number; } ```
    MikroORM Declaration Code ```TypeScript import { Entity, PrimaryKey, Property, SerializedPrimaryKey, } from '@mikro-orm/core'; @Entity() export class Post { @PrimaryKey() id: number; @Property({ type: Date, onCreate: () => new Date() }) createdAt: Date; @Property({ type: Date, onUpdate: () => new Date() }) updatedAt: Date; @Property() title: string; @Property({ nullable: true }) content: string; @Property({ default: false }) published: boolean; @Property() authorId: number; } ```

In general: Where and how do you plan to use this metadata feature?

I want to use it as the alternative of reflect-metadata

There are lots of problems when declaring schema in TypeScript's class with reflect-metadata:

Specifically, I have two use cases:

The whole community has been trying to move away from reflect-metadata for a few years now.
In the SQL scenario, emerging ORMs are starting to use their own schema syntax:

This is especially true in the world of GraphQL: pothos, nexus, gqtx, grats.

As you can see, there are so many schema builders in the community. The problem is that the current schema builder ecosystem is separate and not as universal as class and reflect-metadata.

I hope that valibot will become the universal schema builder in the TypeScript world. Once I declared mikro's entity with valibot, no extra work was needed to get json-schema, openapi graphql's ObjectType, Parameters for command line applications.

xcfox commented 10 months ago

How would you implement defaultArgs in this case?

/**
 * Returns message and pipe from dynamic arguments.
 *
 * @param arg1 First argument.
 * @param arg2 Second argument.
 *
 * @returns The default arguments.
 */
export function defaultArgs<TPipe extends Pipe<any> | PipeAsync<any>>(
  arg1: ErrorMessage | SchemaMetadata | TPipe | undefined,
  arg2: TPipe | undefined
): [ErrorMessage | undefined, TPipe | undefined, SchemaMetadata | undefined] {
  if (Array.isArray(arg1)) return [undefined, arg1, undefined];

  if (typeof arg1 === 'string' || typeof arg1 === 'function')
    return [arg1, arg2, undefined];

  return [arg1?.message, arg2, arg1];
}

Compare this to the implementation when I use metadata as the last parameter:

/**
 * Returns message and pipe from dynamic arguments.
 *
 * @param args The arguments.
 *
 * @returns The default arguments.
 */
export function defaultArgs<TPipe extends Pipe<any> | PipeAsync<any>>(
  ...args: (SchemaMetadata | TPipe | ErrorMessage | undefined)[]
): [ErrorMessage | undefined, TPipe | undefined, SchemaMetadata | undefined] {
  let message: ErrorMessage | undefined;
  let pipe: TPipe | undefined;
  let metadata: SchemaMetadata | undefined;

  for (const arg of args) {
    if (typeof arg === 'string' || typeof arg === 'function') {
      message = arg;
    } else if (Array.isArray(arg)) {
      pipe = arg;
    } else if (typeof arg === 'object') {
      metadata = arg;
    }
  }

  return [message, pipe, metadata];
}
fabian-hiller commented 10 months ago

Thank you very much! I see a problem. The object and tuple schema functions accept a schema as the first optional argument. Since a schema is represented as an object, we can not distinguish it from the metadata object.

fabian-hiller commented 9 months ago

Hi, I may change the implementation of object and tuple. I plan to remove the rest argument and provide strictObject and objectWithRest besides object. Same for tuple. strictObject and objectWithRest simply reuse object internally. This change has two benefits. On the one hand, the implementation is more modular, which reduces the bundle size when using only object without the rest argument. On the other hand, it reduces the complexity of the implementation and the type overload definitions.

If we also remove the pipe argument and introduce the pipe function, we completely eliminate function overload definitions for schema functions. If we do this, we can consider either passing the metadata in the form of an object as an argument to schema functions (which would be the easiest), or providing metadata actions for the pipe function. However, this would require the metadata actions to be executed first, which degrades the performance of the initialization and increases the bundle size.

I would therefore prefer option 1. Nevertheless, we should also take a look at the DX.

fabian-hiller commented 9 months ago

I could imagine that the first optional argument is always the message and the second optional argument is metadata.

// With message and metadata
const Schema = pipe(
  string('Text ...', { description: 'Text ...' }),
  minLength(1)
);

// With just metadata
const Schema = pipe(
  string({ description: 'Text ...' }),
  minLength(1)
);
xcfox commented 9 months ago

I prefer to use metadata as argument, it makes valibot more clean when constructing the schema:

// With the new `pipe` function
const UserSchema = pipe(
  object({
    id: pipe(string(), primaryKey(), columnName(user_id)),
    name: pipe(string(), unique()),
    bio: pipe(string(), description('Text ...')),
  }),
  table('users')
)

// With metadata argument
const UserSchema = object(
  {
    id: string({ primaryKey: true, columnName: 'user_id' }),
    name: string({ unique: true }),
    bio: string({ description: 'Text ...' }),
  },
  { tableName: 'users' }
)

Another important point is that when using valibot as a pure schema builder, the ORM itself contains solid validation, and it is not very useful to validate data on top of the ORM.
That said, when I use the medatada feature, I almost no longer need the pipeline to validate the data. Separating the pipeline from medatada seems perfect to me. This actually provides better DX, I don't have to declare pipe over and over again.

The only extra work for the developer is to declare the extra metadata fields while installing valibot, as yup and trpc do:

// env.d.ts
import { MikroSchemaMetadata } from 'valibot-mikro'
import 'valibot'

declare module 'valibot' {
  export interface SchemaMetadata extends MikroSchemaMetadata {}
}
fabian-hiller commented 9 months ago

Thank you very much for your feedback! I will get back to you as soon as I have made the proposed changes. Then we can discuss the details of the metadata feature implementation together.

xcfox commented 7 months ago

I've found that having TypeScript accurately infer the metadata type is very helpful in checking the correctness of the program.

const Cat = pipe(
  object({
    id: pipe(string(), primaryKey(), columnName('user_id')),
    name: pipe(string(), unique()),
    loveFish: pipe(boolean(), description('Does the cat love fish?')),
  }),
  name('Cat')
);

expectTypeOf<typeof Cat['name']>().toEqualTypeOf<string>()
const CatEntity = toMikroEntity(Cat); // It should pass

const Dog = object({
  id: string(),
  name: string(),
  loveFish: boolean(),
})

const DogEntity = toMikroEntity(Dog); // It should fail with a type error because Dog does not have a name

This does not seem to be achievable using metadata arguments. So now I think pipe is a better design.

Another idea is to add the with method to the schema:

export interface BaseSchema {
  // ...
   with<T>(modifier: (schema: this) => T): T;
  // ...
}

function _with<T>(this: BaseSchema, modifier: (schema: this) => T): T {
  return modifier(this);
}
const Cat = object({
  id: string().with(primaryKey()).with(columnName('user_id')),
  name: string().with(unique()),
  loveFish: boolean().with(description('Does the cat love fish?')),
}).with(name('Cat'));

This is more readable than pipe, but adds a little bit of bundle size.

fabian-hiller commented 7 months ago

I don't understand what toMikroEntity is doing.

This does not seem to be achievable using metadata arguments.

Can you explain this in more detail?

xcfox commented 7 months ago

I have a detailed implementation of toMikroEntity at valibot-mikro

When EntitySchema is missing a name, toEntitySchema throws an error. If TypeScript could hint at this missing name error, then we could notice the error much earlier, rather than reporting it at runtime.

In order for TypeScript to do this hint, we need to carry more detailed types on the valibot schema, For example, carrying a name:

function name(
  value: string
): <T extends object>(x: T) => T & { name: string } {
  return (x) => {
    x.name = value
    return x
  }
}

// with pipe
const Cat = pipe(
  object({
    id: pipe(string(), primaryKey(), columnName('user_id')),
    name: pipe(string(), unique()),
    loveFish: pipe(boolean(), description('Does the cat love fish?')),
  }),
  name('Cat')
);

expectTypeOf<typeof Cat['name']>().toEqualTypeOf<string>()

Using pipe, we can modify the type of schema, in this example, we've added the extra name field to Cat. And I think it's harder to achieve this with metadata argument.

// with metadata argument
const Cat = object(
  {
    id: string(),
    name: string(),
    loveFish: boolean({ description: 'Does the cat love fish?' }),
  },
  { name: Cat }
)

// Is this possible?
expectTypeOf<(typeof Cat)['name']>().toEqualTypeOf<string>()
expectTypeOf<
  (typeof Cat)['entries']['loveFish']['description']
>().toEqualTypeOf<string>()
fabian-hiller commented 7 months ago

Thank you for the details! I am quite busy with my studies at the moment. I will probably get back to you next week.

fabian-hiller commented 6 months ago

This is still on my list. I will get back to you as soon as I have the time.

xcfox commented 5 months ago

There is another reason why I think the pipe scheme is more appropriate: integer, cuid2, uuid are declared in pipe, and in TypeScript/JavaScript they are validations, but in GraphQL, PostgreSQL they also represent types.

When I use valibot to just declare the schema without performing validation, I still can't avoid using pipe.

Let's try to define a simple table:

// With pipe
const CatSchema = pipe(
  object({
    id: pipe(string(), uuid(), primaryKey()),
    name: pipe(string(), index()),
    age: pipe(number(), integer()),
  }),
  table('Cat')
)

// With metadata argument
const Schema = object(
  {
    id: pipe(string({ primaryKey: true }), uuid()),
    name: string({ index: true }),
    age: pipe(number(), integer()),
  },
  { table: 'Cat' }
)

The pipe solution is neater and easier to read than the metadata argument.

fabian-hiller commented 5 months ago

Thank you for that example. I will take it into account when I review everything.

fabian-hiller commented 5 months ago

As mentioned in this blog post, I am currently investigating the implementation of a function and promise schema. Feel free to join the discussion in #243. After that, I will focus on the metadata feature.

fabian-hiller commented 5 months ago

I expect to be working on this feature next week.

fabian-hiller commented 4 months ago

Currently I see 3 options to implement the metadata feature.


Option 1: metadata method

The metadata method would behave like any other method. It takes a schema as its first argument and returns a copy of it with a metadata object property (the second argument) that can take any values. Using generics, this can be made completely typesafe when accessing schema.metadata.

Implementation

Click to show code ```ts import type { BaseIssue, BaseSchema, BaseSchemaAsync } from 'valibot'; /** * Schema with metadata type. */ export type SchemaWithMetadata< TSchema extends | BaseSchema> | BaseSchemaAsync>, TMetadata extends Record, > = TSchema & { /** * The metadata. */ readonly metadata: TMetadata; }; /** * Adds metadata to a schema. * * @param schema The schema to add metadata to. * @param metadata The metadata to add. * * @returns The schema with metadata. */ export function metadata< const TSchema extends | BaseSchema> | BaseSchemaAsync>, const TMetadata extends Record, >(schema: TSchema, metadata: TMetadata): TSchema { return { ...schema, metadata }; } ```

Usage

import * as v from 'valibot';

// Create schema
const UserSchema = v.metadata(
  v.object({
    id: v.metadata(v.pipe(v.string(), v.uuid()), { index: true }),
    name: v.pipe(v.string(), v.nonEmpty(), v.maxLength(32)),
    age: v.pipe(v.number(), v.integer(), v.minValue(0), v.maxValue(100)),
  }),
  { table: 'users' }
);

// Access metadata
const userMetadata = UserSchema.metadata; // { table: 'users' }
const idMetadata = UserSchema.entries.id.metadata; // { index: true }

Advantages

Disadvantages


Option 2: metadata argument

Currently, almost any schema accepts a message as the first optional argument. We could extend the implementation to allow metadata as the second optional argument.

Implementation

Click to show code
Example implementation for `string` schema: ```ts import { _addIssue, type BaseIssue, type BaseSchema, type Dataset, type ErrorMessage, } from 'valibot'; /** * String issue type. */ export interface StringIssue extends BaseIssue { /** * The issue kind. */ readonly kind: 'schema'; /** * The issue type. */ readonly type: 'string'; /** * The expected property. */ readonly expected: 'string'; } /** * String schema type. */ export interface StringSchema< TMessage extends ErrorMessage | undefined, TMetadata extends Record | undefined, > extends BaseSchema { /** * The schema type. */ readonly type: 'string'; /** * The schema reference. */ readonly reference: typeof string; /** * The expected property. */ readonly expects: 'string'; /** * The error message. */ readonly message: TMessage; /** * The schema metadata. */ readonly metadata: TMetadata; } /** * Creates a string schema. * * @returns A string schema. */ export function string(): StringSchema; /** * Creates a string schema. * * @param message The error message. * * @returns A string schema. */ export function string< const TMessage extends ErrorMessage | undefined, >(message: TMessage): StringSchema; /** * Creates a string schema. * * @param metadata The schema metadata. * * @returns A string schema. */ export function string< const TMetadata extends Record | undefined, >(metadata: TMetadata): StringSchema; /** * Creates a string schema. * * @param message The error message. * @param metadata The schema metadata. * * @returns A string schema. */ export function string< const TMessage extends ErrorMessage | undefined, const TMetadata extends Record | undefined, >(message: TMessage, metadata: TMetadata): StringSchema; export function string( arg1?: ErrorMessage | Record, arg2?: Record ): StringSchema< ErrorMessage | undefined, Record | undefined > { const [message, metadata] = typeof arg1 === 'object' ? [undefined, arg1] : [arg1, arg2]; return { kind: 'schema', type: 'string', reference: string, expects: 'string', async: false, message, metadata, _run(dataset, config) { if (typeof dataset.value === 'string') { dataset.typed = true; } else { _addIssue(this, 'type', dataset, config); } return dataset as Dataset; }, }; } ```

Usage

import * as v from 'valibot';

// Create schema
const UserSchema = v.object(
  {
    id: v.pipe(v.string('My custom error message', { index: true }), v.uuid()),
    name: v.pipe(v.string(), v.nonEmpty(), v.maxLength(32)),
    age: v.pipe(v.number(), v.integer(), v.minValue(0), v.maxValue(100)),
  },
  { table: 'users' }
);

// Access metadata
const userMetadata = UserSchema.metadata; // { table: 'users' }
const idMetadata = UserSchema.entries.id.metadata; // { index: true }

Advantages

Disadvantages


Option 3: Metadata actions

The metadata could also be added via actions in our new pipe method. Here the implementation is not so clear as there are many possibilities. We could add a metadata action for each property like description, table and index or a single one called metadata that behaves similar to option 1. Another question is whether we should further process the metadata actions to make them more accessible, which results in a larger bundle size.

The advantages and disadvantages vary greatly depending on which implementation is chosen. For the following evaluation, I will assume that each metadata property gets its own action and that metadata actions are processed in some additional form.

Usage

import * as v from 'valibot';

// Create schema
const UserSchema = v.pipe(
  v.object({
    id: v.pipe(v.string(), v.uuid(), v.index()),
    name: v.pipe(v.string(), v.nonEmpty(), v.maxLength(32)),
    age: v.pipe(v.number(), v.integer(), v.minValue(0), v.maxValue(100)),
  }),
  v.table('users')
);

Advantages

Disadvantages


I look forward to your feedback. Please mention any additional advantages or disadvantages you have in mind. I will update my lists accordingly.

devcaeg commented 4 months ago

As a fan of MikroOrm and using it in several projects, I would love to be able to define schemas using Valibot, since I currently have to define schemas in MikroOrm and in turn in Valibot.

I believe that the "metadata" should be an "action" that is added to the "pipe". It could be something like the "custom" action but allowing to add "metadata".

fabian-hiller commented 4 months ago

I believe that the "metadata" should be an "action" that is added to the "pipe". It could be something like the "custom" action but allowing to add "metadata".

Why do you like this solution the most?

sandros94 commented 4 months ago

NGL I do enjoy the first approach, it feels like a pipe but exlusively for metadatas. I'm not fully convinced by the second approach, since it could theoretically limit future implementations that we currently don't know yet. The third approach, while being the simplest to use, it feels too limiting and easily become even more complex to maintian (for the devs that will use it) once they need to integrate non-common properties and potentially ditching valibot's metadata entirely and switching back to a INSERT_YOUR_NAME_HERE -> valibot approach (like the mikroorm described).

fabian-hiller commented 4 months ago

Thank you very much! I agree with your feedback. Since option 1 is just an addition to the library and does not affect any other code, we could add it with minimal effort, and if we decide in the long run that a different solution works better, we can deprecate it without any major drawbacks. However, I will wait a week to get more feedback before making a decision.

shayneo commented 4 months ago

Been following along as we've been awaiting metadata support in valibot.

Personally a fan of option 1. It matches my mental model for other valibot methods.

Option 2 feels a bit less intuitive, since you are passing a raw object in as an argument, there is no hints in the code at what that argument actually does.

I also like Option 3 --> though restricting to only common metadata properties seems like a non-starter, unless you added a more generic action... something like:

import * as v from 'valibot';

// Create schema
const UserSchema = v.pipe(
  v.object({
    id: v.pipe(v.string(), v.uuid(), v.metadata({ index: true })),
    name: v.pipe(v.string(), v.nonEmpty(), v.maxLength(32)),
    age: v.pipe(v.number(), v.integer(), v.minValue(0), v.maxValue(100)),
  }),
  v.metadata({ table: 'users' })
);

Hope that helps! Thanks for the great work on this lib.

xcfox commented 4 months ago

I don't favor using key-value pairs to store metadata: In my backend app, I use valibot to define both GraphQL Objects and Mikro Entities, each with their own rich metadata. There are metadata fields with the same name but different types:

import * as v from 'valibot'
import { asField } from '@valibot/graphql'
import { asProperty } from '@valibot/mikro'
import { GraphJSONType } from 'graphql-scalar'
import { JsonType } from '@mikro-orm/core'

export const User = v.object({
  data: v.pipe(
    v.record(v.string(), v.any()),
    asField({ type: GraphJSONType }),
    asProperty({ type: JsonType })
  ),
})

If we use key-value pairs, library users need to declare additional interfaces to get type hints:

// env.d.ts
import { Metadata } from 'valibot'
import { PropertyOptions, EntityMetadata } from '@mikro-orm/core'
import { GraphQLFieldConfig, GraphQLObjectConfig } from 'graphql'
declare module 'valibot' {
  export interface Metadata
    extends PropertyOptions,
      GraphQLFieldConfig,
      GraphQLObjectConfig,
      EntityMetadata {}
}

If we use Metadata actions, each action will come with its own type declaration, and we avoid using the same name for different fields.

sandros94 commented 4 months ago

I also like Option 3 --> though restricting to only common metadata properties seems like a non-starter, unless you added a more generic action... something like:

import * as v from 'valibot';

// Create schema
const UserSchema = v.pipe(
  v.object({
    id: v.pipe(v.string(), v.uuid(), v.metadata({ index: true })),
    name: v.pipe(v.string(), v.nonEmpty(), v.maxLength(32)),
    age: v.pipe(v.number(), v.integer(), v.minValue(0), v.maxValue(100)),
  }),
  v.metadata({ table: 'users' })
);

Also agree for the case of option 3. But still don't fully understand the drawbacks of this implementation, so I might still prefer option 1

Edit:

If we use Metadata actions, each action will come with its own type declaration, and we avoid using the same name for different fields.

True, I didn't think of it

xcfox commented 4 months ago

Less powerful as we can only support common metadata properties by default
Other properties needs to be provided by third party libraries as needed

I don't think that's the downside, I think that's the goal.

As Class Reflect Metadata, does. Each library defines and consumes its own metadata without interfering with the other. An example would be TypeGraphQL + MikroORM

In addition, universally used metadata like descriptions should be provided by the valibot library. Then we don't have to repeat the same thing in different sufferings

fabian-hiller commented 4 months ago

Thanks for all your feedback! @xcfox if we choose the pipe option, can we implement it so that we just ignore the metadata actions in pipe and do nothing else? That way the impact on bundle size would be minimal.

xcfox commented 4 months ago

if we choose the pipe option, can we implement it so that we just ignore the metadata actions in pipe and do nothing else? That way the impact on bundle size would be minimal.

It's fine. We also need to provide a getMetadata() function which help third party libraries to collect Metadata from pipe.

fabian-hiller commented 4 months ago

It's fine. We also need to provide a getMetadata() function which help third party libraries to collect Metadata from pipe.

This would only work if we follow the extraProperties pattern you implemented in #655, but since that does not fit well with the rest of our code, I am not sure I want to do that.

If external libraries take care of implementing specific metadata actions, they could also take care of capturing them with their own code. What do you think?

xcfox commented 4 months ago

If external libraries take care of implementing specific metadata actions, they could also take care of capturing them with their own code. What do you think?

Maybe you're right. Actually collecting metadata from the pipeline takes less than a few lines of code, but it does require the library author to learn about valibot

fabian-hiller commented 4 months ago

It seems to me that we will go with option 1, the metadata method, or option 3 by adding specific metadata actions to our pipe method. I will wait about a week for more feedback and then share my current thoughts.

If you are new to this issue, take a look at this comment.

fabian-hiller commented 4 months ago

I have created a draft PR #747 for option 3 (metadata actions). Feel free to review it.

fabian-hiller commented 4 months ago

I plan to merge #747 today, but I still welcome general feedback.