Consider migrating from Type to Interface

Saeris commented 12 months ago

Hey, opening this as a follow-up to #211 as I'm noticing some type errors that have come up as a result of changes to the base types that were made at the end of that PR. Here's a small example:

We have a root type most Validation functions extend from, BaseValidation:

type BaseValidation<TInput = any> = {
    async: false;
    _parse(input: TInput): PipeResult<TInput>;
};

And in the codebase these are "extended" via type intersections like so:

type EmojiValidation<TInput extends string> = BaseValidation<TInput> & {
    type: 'emoji';
    requirement: RegExp;
};

But this introduces a problem... because all Validation functions have a type and a requirement, but BaseValidation doesn 't include those! This causes what should be a good way to build a type guard for Validation functions to fail:

This is a case where we should be using interface instead of type, because here we're adhering to a common set of properties that each individual validation function narrows for its specific signature. This also applies to BaseSchema too, which suffers from this same problem.

So instead of the above, we should do this instead:

interface BaseValidation<TInput = any> {
    type: string;
    async: false;
    message: string;
    requirement: unknown;
    _parse(input: TInput): PipeResult<TInput>;
};

interface EmojiValidation<TInput extends string> extends BaseValidation<TInput> {
    type: 'emoji'; // Narrows the base property to a specific string
    requirement: RegExp; // Narrows the unknown property value to a specific type
};

And with that we can validate that all Validations share a common set of properties! (Also, we get a nice side benefit of Typescript running slightly faster too!)

Saeris commented 12 months ago

I can open a PR to resolve this later this afternoon maybe. Should be a relatively quick fix.

fabian-hiller commented 12 months ago

My comment from issue #258:

There is a reason why I did not include type: string in BaseSchema, BaseValidation and BaseTransform. If a function accepts both BaseSchema and a specific schema like ObjectSchema, it is not possible to distinguish between the schemas using .type if BaseSchema contains type: string. Without type: string, however, 'type' in schema can be used to check whether .type is included. The result is that all future checks behave like a discriminated union. This can be seen in the getDefaults function.

To make a long story short. My current recommendation is not to use BaseSchema, but to specifically use the schema types that your function supports. This ensures maximum type safety and type casting can be completely avoided. I look forward to your feedback.

Saeris commented 12 months ago

I'll need to investigate this further when I have more time. Will be busy for the rest of today so I can probably get back to this tomorrow. I'm confident that there is a path forward for #259 and the solution I am proposing here.

fabian-hiller commented 9 months ago

Adding type: string; to BaseSchema and BaseValidation adds more problems than it solves, in my opinion. The current implementation provides complete type safety when schemas are processed by other functions. It is possible to define exactly which schemas are supported. However, if we add type: string; to the base type, this implementation breaks and type safety is lost. Here is an example:

import * as v from 'valibot';

type Schema =
  | v.BaseSchema
  | v.StringSchema
  | v.ObjectSchema<v.ObjectEntries>
  | v.ArraySchema<v.BaseSchema>;

function schemaToJson(schema: Schema) {
  if ('type' in schema) {
    switch (schema.type) {
      // String
      case 'string':
        return { type: 'string' };

      // Object
      case 'object':
        return {
          type: 'object',
          entries: Object.fromEntries(
            Object.entries(schema.entries).map(([key, value]) => [
              key,
              schemaToJson(value),
            ])
          ),
        };

      // Array
      case 'array':
        return {
          type: 'array',
          item: schemaToJson(schema.item),
        };
    }
  }

  // Error
  throw new Error('Not implemented');
}

Saeris commented 9 months ago

Adding type: string; to BaseSchema and BaseValidation adds more problems than it solves, in my opinion. The current implementation provides complete type safety when schemas are processed by other functions. It is possible to define exactly which schemas are supported. However, if we add type: string; to the base type, this implementation breaks and type safety is lost.

Your example highlights a limitation of switch in TypeScript, which does not narrow types in the way you would expect. In general, I would suggest that use of switch be avoided altogether and use either if statements to conditionally early-return or use an object to lookup results based on property keys instead.

TypeScript 5.3 added better narrowing support to switch (true), which is basically the same as a series of if statements: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-5-3.html#switch-true-narrowing

Since your example has an unrelated typescript error (TS can't infer the return type of the function given as-is), I'll have to follow-up when I have some more time to investigate.

Could you provide me with some other examples where this appears problematic to you? I'm not yet seeing where this "adds more problems than it solves".

fabian-hiller commented 9 months ago

I am not sure what exactly you mean with switch. The same problem applies to if, for example in our getDefaults method. If BaseSchema contains type: string; TypeScript can never narrow it down to a specific schema type by checking the .type property.

Can you provide me with your full code example from above so that I can investigate it?

Saeris commented 8 months ago

Hey just wanted to give a quick status update: I've been rather busy this week and I haven't spent more than an hour investigating a path forward yet. I'll circle back when I can.

Saeris commented 8 months ago

With the schemaToJson example first of all I just want to point out that in either implementation (types or interface), there is a type error for the inferred return type because the function is recursive:

'schemaToJson' implicitly has return type 'any' because it does not have a return type annotation and is referenced directly or indirectly in one of its return expressions.

That is because of the inner types ObjectEntries and for array items BaseSchema, which creates an un-inferrable recursive type. I was trying initially to solve for this in case it had anything to do with using interfaces instead, but it turns out it's specific to this example code.

I should probably walk what I was saying about switch back a bit. It used to be that if you had switch (true) and had a conditional in each case statement, then the subsequent cases wouldn't benefit from type narrowing. That doesn't really apply to what we're talking about here since we're switching on a property value rather than the truthiness of each case.

So then, what really marks the difference between type and interface for that code sample? It's actually the if ("type" in schema") condition which is dependent on whether BaseSchema has a type property or not.

In my PR, since I have type: string in BaseSchema, that means that if statement doesn't do any type narrowing before we enter the switch block. But what is important to consider is that the example code always enters the if block because every schema passed to this function does have a type property with a string value. That means there is useless code here that only really serves an authoring-time purpose.

At present, I don't believe there are any schemas in Valibot that do not have a type property.

So that being the case, I don't think having type; string in BaseSchema in reality is creating more problems, rather it is accurately modeling at the type level the reality of runtime code. The "problem" then is at authoring time, where we want to narrow a type so that TypeScript stops complaining to us.

In the PR I created, I added some utilities for type narrowing: isSchema, isObjectSchema, and isTupleSchema. Last week I came about a simpler method than creating a utility for each schema type, and here it is:

export const isOfType = <U extends { type: string }, const T extends string>(
  val: U,
  type: T
): val is Extract<U, { type: T }> => val?.type === type;

Here's how it is used in the context of your example:

function schemaToJson<S extends Schema>(schema: S) {
    if (isOfType(schema, "string")) {
      // schema: StringSchema
      return { type: 'string' };
    }
    if (isOfType(schema, 'object')) {
      // schema: ObjectSchema<ObjectEntries>
      return {
        type: 'object',
        entries: Object.fromEntries(
          Object.entries(schema.entries).map(([key, value]) => [
            key,
            schemaToJson(value),
          ])
        ),
      };
    }
    if (isOfType(schema, 'array')) {
      // schema: ArraySchema<BaseSchema>
      return {
        type: 'array',
        item: schemaToJson(schema.item),
      };
    }

  throw new Error('Not implemented');
}

This utility can be used with any object that has a type: string property (so schemas, transformations, or validations) and will correctly narrow the input val to the desired type. We get both the runtime type guard we need as well as the authoring time narrowing all in one tiny function. Pretty handy!

Now I want to throw it back to you again @fabian-hiller, are there other examples you can think of that are problematic?

I'll be busy for the next several days FYI. Don't think there's any need to rush this change if you have concerns still.

fabian-hiller commented 8 months ago

Please take a look at this code example. Adding type to BaseSchema in line 2 breaks the schemaToJson code. This is what I'm referring to. But what you say is true. Adding a util function like isOfType solves this problem as we can see in this code example. I will consider adding type to BaseSchema along with such a util. Do you have any ideas for alternative names for the util?

Saeris commented 8 months ago

Maybe just hasType? isOfType was just the name I found in some stackoverflow answer, so I'm not committed to it. Same with the argument names.

I only really think the type signature on the function is critical here, as we want { type: string } on the val generic so as to match to any Schema, Transformation or Validation.

Then it's important to note that when using this function, narrowing only works if the type of the provided val argument by the end user is a union of types. When I update #259 this will be a bit clearer, as I had to update the types on getDefaults/getFallbacks a bit. I started resolving for merge conflicts yesterday but won't have time to finish that up until the weekend.

A usage comment with maybe a minimal example might be a good idea to communicate how this function works to library authors. It really only needs to emphasize the above, that it only narrows when val is a union of types. That's the main difference between it as a generic utility and the old schema-specifc functions I had written (isObjectSchema and isTupleSchema).

fabian-hiller commented 8 months ago

Thank you very much! I will review #259 and #451 again over the weekend or early next week.

fabian-hiller / valibot

Consider migrating from Type to Interface #257