colinhacks / zod

TypeScript-first schema validation with static type inference
https://zod.dev
MIT License
32.65k stars 1.13k forks

z.union() inside of z.array repeats the same issue multiple times #3638

Open ElGatoLoco opened 1 month ago

ElGatoLoco commented 1 month ago

Let's say I have a schema like this:

import { z } from 'zod';

const questionsSchema = z.array(
  z.union([
    z.object({
      code: z.string(),
      value: z.enum(['Yes', 'No']),
    }),
    z.object({
      code: z.string(),
      value: z.enum(['Yes', 'No']),
      meta: z.object({
        something: z.number(),
      }),
    }),
    z.object({
      code: z.string(),
      value: z.enum(['Yes', 'No']),
      meta: z.object({
        somethingElse: z.boolean(),
      }),
    }),
  ]),
);

And the following array which I try to validate:

const questions = [
  {
    code: 'Q1',
    value: 'No',
    meta: {
      something: 123,
    },
  },
  {
    code: 'Q2',
    value: 'Yes',
    meta: {
      somethingElse: false,
    },
  },
  { code: 'Q3', value: 'No' },
];

With no errors, as in the example above, validation passes. But if I break one of the values, e.g. 'No' -> 'Nooo' for Q1, I'd expect to get a single issue. Instead, the same issue is repeated once per union member, and the last entry also reports a missing meta.somethingElse, as if Q1 were meant to match the third shape.

[
  {
    "code": "invalid_union",
    "unionErrors": [
      {
        "issues": [
          {
            "received": "Nooo",
            "code": "invalid_enum_value",
            "options": ["Yes", "No"],
            "path": [0, "value"],
            "message": "Invalid enum value. Expected 'Yes' | 'No', received 'Nooo'"
          }
        ],
        "name": "ZodError"
      },
      {
        "issues": [
          {
            "received": "Nooo",
            "code": "invalid_enum_value",
            "options": ["Yes", "No"],
            "path": [0, "value"],
            "message": "Invalid enum value. Expected 'Yes' | 'No', received 'Nooo'"
          }
        ],
        "name": "ZodError"
      },
      {
        "issues": [
          {
            "received": "Nooo",
            "code": "invalid_enum_value",
            "options": ["Yes", "No"],
            "path": [0, "value"],
            "message": "Invalid enum value. Expected 'Yes' | 'No', received 'Nooo'"
          },
          {
            "code": "invalid_type",
            "expected": "boolean",
            "received": "undefined",
            "path": [0, "meta", "somethingElse"],
            "message": "Required"
          }
        ],
        "name": "ZodError"
      }
    ],
    "path": [0],
    "message": "Invalid input"
  }
]
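
For reference, the repeats can be collapsed after parsing. The sketch below is a hypothetical helper (flattenUnionIssues is not part of zod) that works on plain objects shaped like the output above. The Issue interface is a trimmed-down assumption, not zod's full issue type.

```typescript
// Minimal issue shape, modelled on the JSON output above (an assumption,
// not zod's complete ZodIssue type).
interface Issue {
  code: string;
  path: (string | number)[];
  message: string;
  unionErrors?: { issues: Issue[] }[];
}

// Recursively unwrap invalid_union issues and keep one copy of each
// distinct (path, code, message) triple.
function flattenUnionIssues(issues: Issue[]): Issue[] {
  const seen = new Set<string>();
  const result: Issue[] = [];

  const visit = (issue: Issue): void => {
    if (issue.code === 'invalid_union' && issue.unionErrors) {
      for (const branch of issue.unionErrors) {
        branch.issues.forEach(visit);
      }
      return;
    }
    const key = JSON.stringify([issue.path, issue.code, issue.message]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(issue);
    }
  };

  issues.forEach(visit);
  return result;
}

// Trimmed-down copy of the error reported above.
const enumIssue: Issue = {
  code: 'invalid_enum_value',
  path: [0, 'value'],
  message: "Invalid enum value. Expected 'Yes' | 'No', received 'Nooo'",
};
const requiredIssue: Issue = {
  code: 'invalid_type',
  path: [0, 'meta', 'somethingElse'],
  message: 'Required',
};
const reported: Issue[] = [
  {
    code: 'invalid_union',
    path: [0],
    message: 'Invalid input',
    unionErrors: [
      { issues: [enumIssue] },
      { issues: [enumIssue] },
      { issues: [enumIssue, requiredIssue] },
    ],
  },
];

const flattened = flattenUnionIssues(reported);
console.log(flattened.map((i) => i.path.join('.') + ': ' + i.message));
```

This removes the duplicated enum issue but keeps the "Required" entry from the third branch, so it only treats the repetition, not the question of which branch was intended.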

samchungy commented 1 month ago

How would the schema know which shape it is meant to validate against? It shows three errors because the input fails every member of the union. You need to use a discriminated union.
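
As a rough illustration of that suggestion, the sketch below adds a hypothetical kind field to every item and switches to z.discriminatedUnion. The kind field and its literal values are assumptions made for the example; the original data has no such field.

```typescript
import { z } from 'zod';

// Fields shared by all three shapes.
const base = { code: z.string(), value: z.enum(['Yes', 'No']) };

// Assumption: each item carries a `kind` discriminator (not in the original data).
const questionsSchema = z.array(
  z.discriminatedUnion('kind', [
    z.object({ kind: z.literal('plain'), ...base }),
    z.object({
      kind: z.literal('something'),
      ...base,
      meta: z.object({ something: z.number() }),
    }),
    z.object({
      kind: z.literal('somethingElse'),
      ...base,
      meta: z.object({ somethingElse: z.boolean() }),
    }),
  ]),
);

// Same data as in the question, with Q1 broken ('Nooo') and `kind` added.
const result = questionsSchema.safeParse([
  { kind: 'something', code: 'Q1', value: 'Nooo', meta: { something: 123 } },
  { kind: 'somethingElse', code: 'Q2', value: 'Yes', meta: { somethingElse: false } },
  { kind: 'plain', code: 'Q3', value: 'No' },
]);

const issues = result.success ? [] : result.error.issues;
console.log(issues.map((issue) => issue.path));
```

With the discriminator in place, only the branch selected by kind is checked, so the bad value for Q1 should surface as a single issue at path [0, 'value'] rather than once per union member.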

ElGatoLoco commented 1 month ago

I might be missing something, but isn't it possible to figure this out by looking at the shape of the object?

In my example, Q1 has a something key under its meta property, which matches only the second shape passed to z.union.

samchungy commented 1 month ago

How would we definitively know that this specific object is the one you were attempting to match?

What if we had:

{
  code: 'Q5',
  value: 'No',
  meta: {
    something: 123,
    somethingElse: 'bla',
  },
}

Which one would we match? You need to use a discriminated union.

ElGatoLoco commented 1 month ago

If you ask me, I would match none of them for the payload you provided, because it's ambiguous. I understand that working out which shape to match might be less performant, but logically I think it is more sensible than matching against everything and showing an array of errors.

For my particular use case I could use a discriminated union, but since I'm building the validation logic dynamically, the downside is that I'd have to preprocess the data I get from a third-party CSV file and figure out the keys, which I wouldn't have to provide to a regular union.

samchungy commented 1 month ago

How would you determine what to match? What would you prioritise? Even in the case where you would say none of them match, what would you return?

Which takes greater precedence: a missing key in an object, or a correct key with an incorrect type?

There are far too many permutations.
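
To make the precedence question concrete, here is one possible policy, sketched as a hypothetical helper (closestBranch is not a zod API): report only the branch with the fewest issues, and break ties by taking the earliest branch. The types are trimmed-down assumptions modelled on the unionErrors output earlier in the thread.

```typescript
// Minimal shapes modelled on the unionErrors output (assumptions, not zod types).
interface Issue {
  code: string;
  path: (string | number)[];
  message: string;
}
interface BranchError {
  issues: Issue[];
}

// Hypothetical policy: the branch with the fewest issues wins;
// ties go to the branch that appears first in the union.
function closestBranch(branches: BranchError[]): BranchError | undefined {
  let best: BranchError | undefined;
  for (const branch of branches) {
    if (best === undefined || branch.issues.length < best.issues.length) {
      best = branch;
    }
  }
  return best;
}

const enumIssue: Issue = {
  code: 'invalid_enum_value',
  path: [0, 'value'],
  message: "Invalid enum value. Expected 'Yes' | 'No', received 'Nooo'",
};
const requiredIssue: Issue = {
  code: 'invalid_type',
  path: [0, 'meta', 'somethingElse'],
  message: 'Required',
};

// The three branches from the reported error, in union order.
const branches: BranchError[] = [
  { issues: [enumIssue] },
  { issues: [enumIssue] },
  { issues: [enumIssue, requiredIssue] },
];

const chosen = closestBranch(branches);
console.log(chosen === branches[0], chosen?.issues.length);
```

Here the first two branches tie on one issue each, and the tie-break picks the shape without meta even though Q1 supplied meta.something. A different weighting would pick a different branch, which is the ambiguity being described.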