Open colinhacks opened 6 months ago
LGTM! πͺ
All Zod schemas will implement a .getLiterals(): Set
method that returns the set of known literal values that will pass validation. For something like z.string(), this will be undefined (there are infinitely many valid inputs). For something like a ZodEnum or ZodLiteral, this will be a finite set of values.
Wondering if .getLiterals()
should be optional (it's only relevant for types that should return something meaningful and not undefined
), or inherited from base (returning undefined
) and only overriden when it matters.
Am I missing something?
P.S. abort early! π
I think this makes a lot of sense! ArkType goes pretty far down the "let's create a runtime type system and handle discrimination robustly" rabbit hole, but I think using heuristics like checking only literals at the root of the object can get you most of the benefit and save a ton of complexity.
The other category of disjoints besides literals that might be worth considering is typeof
, e.g. if a string
, number
and boolean
are at the same path. This often occurs incidentally, so it's a nice way to discriminate more unions without users having to think about it.
If it's of any use, this is the algorithm AT uses to determine the "best" discriminator (or sequence of discriminators, if it requires multiple steps) for a union:
Happy to discuss in more detail any time if it would be useful! Definitely one of my favorite subjects if you don't mind hearing me nerd out a bit over it π
Am I understanding this right? Before:
z.discriminatedUnion("kind", [
z.object({ kind: z.literal("number"), amount: z.number() }),
z.object({ kind: z.literal("boolean"), value: z.boolean() })
])
After:
z.union([
z.object({ kind: z.literal("number"), amount: z.number() }),
z.object({ kind: z.literal("boolean"), value: z.boolean() })
])
Optional:
z.union([
z.object({ kind: z.literal("number"), amount: z.number() }),
z.object({ kind: z.literal("boolean"), value: z.boolean() })
], {
discriminator: "kind"
})
The "after" schema will initialize a bit slower(?) than it would today, but parse much faster than it would today and up to around as fast as the "before" schema does today? If so, I like it a lot, discriminatedUnion
always felt a bit redundant to write out by comparison and the parse speed boost for union
schemas that happen to be discriminated would be great.
Wondering if .getLiterals() should be optional
Yep this would be defined on ZodType
with a default implementation, as you say. It would probably also be protected
...not sure this is an internal we should expose. It's true that this method would apply to very few schema types.
If it's of any use, this is the algorithm AT uses to determine the "best" discriminator (or sequence of discriminators, if it requires multiple steps) for a union:
Thanks David I'll look into this! I haven't thought through how discriminator selection would work, so this is super useful.
The other category of disjoints besides literals that might be worth considering is typeof
I was thinking about this. Maybe a special symbol (z.STRING
) to indicate that any string would pass.
The "after" schema will initialize a bit slower(?) than it would today, but parse much faster than it would today and up to around as fast as the "before" schema does today?
Precisely π―
LGTM and thanks for the breakdown of this neat pre-computation strategy π !
I have a few clarifications for better understanding:
When you create a z.union(), Zod uses getLiterals() to extract a set of "fast checks" for each union element in the form {[k: string]: Set
How would the fast-checks generalize for unions where the "discriminator" is a non-primitive e.g an object?
Here's an example building on top of @jrysana example but nesting kind
one level deeper in an info
object:
z.union([
z.object({ info: z.object({ kind: z.literal("number") }), amount: z.number() }),
z.object({ info: z.object({ kind: z.literal("boolean") }), value: z.boolean() })
])
This question is based on my understanding of k
in {[k: string]: Set<Primitive>}
as referring to a field of an object in the discriminated union - I may be misunderstanding.
Side note: this schema may not be the best, but it seems like something that could exist out in the wild.
Lastly if this has already been thought through or it'd be figured out at a later time, I'm happy to wait and see what it would look like! Maybe this is what you are referring to as discriminator selection by:
Thanks David I'll look into this! I haven't thought through how discriminator selection would work, so this is super useful.
The other category of disjoints besides literals that might be worth considering is typeof
I was thinking about this. Maybe a special symbol (z.STRING) to indicate that any string would pass.
This seems like a nice idea to use special symbols (like z.STRING
), in order to avoid conflicts with (string) literals as typeof
would evaluate to.
Also, PS. I am glad that z.union
is going to be preferred over z.switch
as code generation is much easier with the former!
Edit: Ok now I see it will solve my problem automatically. Thanks again for your great work, looking forward to the implementation.
Nice to see this is going to be addressed. Thank you for your hard work and ideas.
One thing that I'd like to see being introduced to union
is precise error messages.
Currently when using union
for discriminated unions (as it's required for recursively defined data), the parser tries each branch thus gives a big error message:
const u = z.union([
z.object({ type: z.literal("a"), value: z.number() }),
z.object({ type: z.literal("b"), value: z.string() }),
]);
console.log(JSON.stringify(u.safeParse({ type: "a", value: "b" })));
z.union
which is hard to digress and turns into wall of text when shown to the users.
OTOH since discriminatedUnion
has an explicit discriminator assigned, it can precisely give the actual branch where the error happens:
const du = z.discriminatedUnion("type", [
z.object({ type: z.literal("a"), value: z.number() }),
z.object({ type: z.literal("b"), value: z.string() }),
]);
console.log(JSON.stringify(du.safeParse({ type: "a", value: "b" })));
z.discriminatedUnion
Will the "discriminator hint" solve this problem? Thank you.
One thing that I'd like to see being introduced to
union
is precise error messages.
I second that. I use discriminatedUnion
quite extensively and thatβs not because of its performance advantages, the most important reason is the precise error messages.
So far I really like the proposal, but I hope it addresses also the error messages, otherwise it will be unusable for me.
Forgive my ignorance, but would this allow for nested discriminated unions unlike z.discriminatedUnion
?
I am using this discriminatedUnion
but I require that all objects have multiple shared keys but by only 1 shared key you can know the difference between the unions.
example:
const myUnion = z.discriminatedUnion("status", [
z.object({ status: z.literal("success"), data: z.string(), timestamp: z.date() }),
z.object({ status: z.literal("failed"), error: z.instanceof(Error), timestamp: z.date() }),
]);
myUnion.parse({ status: "success", data: "yippie ki yay" });
This is maybe a dumb example with the timestamp
but that's what I would need to do if I want multiple shared keys. I thought I could maybe use a merge or extend, except discriminated unions do not allow this.
This can actually be done with .and()
New problem:
The status
value must be dynamically allowed to. This throws an error now because the value is of course not found in the union (other than success
or failed
). I would like that it will not throw an error and is parsed with the default properties that are added in the schema
Why not instead of getLiterals()
just make a more simple fastCheck(val): true | false | "unavailable"
(or an enum
) method instead?
By default it would return "unavailable"
but for literal you could return val === this.value
.
And it would allow an easy implementation on objects too with something like:
fastCheck(val) {
let res: true | false | "unavailable" = "unavailable";
for (const attr of this.attributes) {
const fastCheck = attr.fastCheck(val[attr]);
if (fastCheck === "unavailable") continue;
else if (!fastCheck) return false;
res = true;
}
return res;
}
Then to collect all possibilities in an union
you could just do something like:
const possibleSchemas = allSchemas.filter(s => s.fastCheck(val) !== false);
Maybe there is not much value to "unavailable"
and it could even be merged with true
.
I made my schema like this now:
export const store_order_item_schema = z.object({
session_token: z.string().length(128),
product_type: z.string(),
product_name: z.string().max(200),
value: z.string().max(200),
period: z.number().positive().default(1),
quantity: z.number().positive().default(1),
}).and(z.discriminatedUnion("product_type", [
z.object({
product_type: z.literal('domain_extension'),
extension: z.string().refine((v) => !v.startsWith('.'), 'Domain extension must not start with a dot'),
status: z.enum(['free', 'active']),
transfer_code: z.string().optional(),
//! you can not order multiple domains with the same name
quantity: z.literal(1),
period: z.literal(12).or(z.literal(24)).or(z.literal(36)).or(z.literal(48)).or(z.literal(60))
}),
z.object({
product_type: z.literal('pax8_license'),
term: z.enum(['Annual', 'Monthly']),
license_id: z.string().uuid(),
commitment_term_id: z.string().uuid(),
}),
z.object({
product_type: z.literal('hosting'),
hostname: z.string()
}),
z.object({product_type: z.literal('other')})
]))
The only thing what is bothering me is that I cannot use other values in this schema
I would like to have the props session_token
, product_type
, product_name
, value
, period
, quantity
defaulty be added to any product_type I set.
How about multiple discriminators that cause nesting issue like https://github.com/colinhacks/zod/issues/1884?
Would it be possible to accept an array of discriminators? for example:
z.union([ ... ], {
discriminator: ["someKey1", "someKey2"]
})
Could getLiterals()
also be used to fix intersection parsing as well? E.g., in #2195
I started working on a function that would split the input object into two objects: one to pass to the ZodObject
parser with just the keys that are in that object and one with the rest of the keys to pass to the ZodRecord
parser. I got pulled off on another project, but there are also issues like dealing with strict()
, passthrough()
, etc.
z.union
should automatically precompute the discriminator property for common cases, just like TypeScript does under the hood. I did this for unions in my old typed-validators
library. discriminatorHint
isn't even necessary unless the user wants to speed up that precomputation a bit.
The logic is basically, for all of the object options, does there exist some property such that each object has a different literal value for that property.
If you want to get more powerful, the precomputation could even check each object has a type for that property that doesn't overlap with the others (e.g. type A = { type: 1, value: string } | { type: 2 | 3, value: number } | { type: string, value: boolean }
). This could be done with an _overlaps(type): boolean | undefined
method on ZodType
, where z.string()._overlaps(z.literal('foo'))
is true
, z.string()._overlaps(z.number())
is false
, z.literal('foo')._overlaps(z.union(['foo', 'bar']))
is true
, etc. _overlaps
would be allowed to bail out and return undefined
for hard cases, in which discriminator determination would assume there might be overlap.
z.union
could precompute a discriminator for the object options even if some options aren't objects. Then when parsing, if the input is an object, you get the option for the discriminator value (if any), otherwise you fall back to other strategies to decide between the non-object options.
What about situations where you need something like this:
import { z } from "zod";
const fooSchema = z
.object({
type: z.literal("foo"),
})
.and(
z.discriminatedUnion("reqString", [
z.object({
reqString: z.literal(true),
someString: z.string().min(1).max(10),
}),
z.object({
reqString: z.literal(false),
someString: z.string().optional(),
}),
]),
);
const barSchema = z.object({
type: z.literal("bar"),
barString: z.string(),
});
const bazSchema = z.object({
type: z.literal("baz"),
});
const schema = z.discriminatedUnion("type", [fooSchema, barSchema, bazSchema]);
type Schema = z.infer<typeof schema>;
Currently it yields the following error:
TS2740: Type
ZodIntersection<ZodObject<{ type: ZodLiteral<"foo">; }, "strip", ZodTypeAny, { type: "foo"; }, { type: "foo"; }>, ZodDiscriminatedUnion<"reqString", [ZodObject<{ reqString: ZodLiteral<true>; someString: ZodString; }, "strip", ZodTypeAny, { reqString: true; someString: string; }, { reqString: true; someString: string; }>, ZodObject<{ reqString: ZodLiteral<false>; someString: ZodOptional<ZodString>; }, "strip", ZodTypeAny, { reqString: false; someString?: string | undefined; }, { reqString: false; someString?: string | undefined; }>]>>
is missing the following properties from type
ZodObject<{ type: ZodTypeAny; } & ZodRawShape, UnknownKeysParam, ZodTypeAny, { [x: string]: any; type?: unknown; }, { [x: string]: any; type?: unknown; }>
: _cached, _getCached, shape, strict, and 14 more.
$0.02: name the option discriminatorHint
(or maybe hint: { discriminator }
) assuming it is only for perf and doesn't change the behaviour
// With `discriminator` I'd expect this to warn me I've done something dumb somehow or be a bug
// With `discriminatorHint` I understand maybe it's a perf boost to look at the `type` key first, even if it only gets close
z.union([
z.object({ type: z.literal(1) }),
z.object({ type: z.literal(2) }),
z.object({ type: z.literal(2), data: z.string() }), // huh
// ... probably more idk
], { discriminator: 'type' })
@colinhacks is there any news on this? I can't continue dealing with the absolutely huge errors that arise from using unions. What was wrong with .switch
?
What was wrong with
.switch
?
The issue description has some answers
Glad that the unions are getting an improvements. Got into this rabbit hole and ended in this issue after trying to optimize my complex schema.
I have complex schema that can have the "discriminator" be string literal or a (first) item in an array. The current approach with discriminatedUnion
doesn't work, because it only allows static literal values for discriminator.
I want to propose that the discriminator property of the new union implementation be literal OR schema OR custom function.
kind: z.union(z.literal('foo'), z.string().array().contains('foo')
, for discriminator of 'kind' the schema is tested. If the schema is OK, the whole schema is being run and the result is reported back.switch()
Okay, I'm glad I didn't rush a
z.switch()
implementation, because I now think there's a better way forward. This is definitely my own brilliant idea and not something @gcanti suggested in a Twitter DM.It's pretty simple, we just...make
z.union()
better. Here's the case for sticking with plainz.union()
:z.switch()
. Code generation tools need a way to enumerate all elements of a union, and that isn't possible withz.switch()
.z.discriminatedUnion
How? The idea is for Zod to do some "pre-computation" at the schema creation time to make parsing faster.
.getLiterals(): Set<Primitive>
method that returns the set of known literal values that will pass validation. For something likez.string()
, this will be undefined (there are infinitely many valid inputs). For something like a ZodEnum or ZodLiteral, this will be a finite set of values.z.union()
, Zod usesgetLiterals()
to extract a set of "fast checks" for each union element in the form{[k: string]: Set<Primitive>
.abortEarly
mode to bail out ASAP in the event of validation errors. This mode will speed up ZodUnion performance even if there's no discrimination to be done. This was already on the Zod 4 roadmap but is made even more relevant now.The API could also accept a "discriminator hint" to point the parser in the right direction.
In retrospect, this is what the API always should have been. Discriminated unions are not a type unto themselves, just an optimization over unions, and the API should have reflected that.