colinhacks / zod

TypeScript-first schema validation with static type inference
https://zod.dev
MIT License
33.09k stars 1.15k forks source link

RFC: Including input data in errors #3619

Open colinhacks opened 2 months ago

colinhacks commented 2 months ago

RFC: Include input data in ZodIssue

Read the RFC πŸ‘‰

General comments can go below. Leave specific comments in the Files changed interface.

Acknowledgement

My full-time work on Zod 4, including the design & implementation of this proposal, is supported by Clerk.

clerk logo

ethanniser commented 2 months ago

the mutable global config seems potentially dangerous maybe im missing somehting but what is stopping some third party library who also depends on zod from calling z.configure that and setting inputInclude to always in production

colinhacks commented 2 months ago

the mutable global config seems potentially dangerous maybe im missing somehting but what is stopping some third party library who also depends on zod from calling z.configure that and setting inputInclude to always in production

It's a reasonable concern, but this is just a fact of life with npm. The same malicious library could override the parse method on the ZodType prototype and send all the data to a remote server πŸ€·β€β™‚οΈ The answer is always "vet your dependencies". Setting some global config is the least of the potential pitfalls here.


The only other configuration mechanism that comes to mind is some kind of "Zod factory" pattern:

import { Zod } from "zod";

const z = Zod.create({ /* ... */ })

Which I don't really like. For starters it's a breaking change for 100% of users. But more specifically, the z variable has also always been a module, and it includes type utilities like z.infer. There's no way to add z.infer (a type alias) on a runtime object that's created with Zod.create() (or equivalent). This would confuse a lot of people, I'd wager, and trying to explain this subtlety to casual TypeScript users doesn't sound fun.

KATT commented 2 months ago

I like this.

I agree with the mutable config concern - I think I'd prefer that z.configure() being a factory that returns a new "root instance". Otherwise, depending if the file containing it is imported or not, you get different behavior

igalklebanov commented 2 months ago

Hey πŸ‘‹

Love this! πŸš€

How are arrays handled? should all items be included, or just the bad items (+index)? Maybe for objects, we could define how this behaves per key (and subkey)? use case is, include inputs for everything but PII/secrets in production..

ethanniser commented 2 months ago

I see how its not really possible to have a better solution given zod's architecture

I don't think saying just "vet your dependencies" is unreasonable

this is a good change πŸ‘

colinhacks commented 2 months ago

How are arrays handled? should all items be included, or just the bad items (+index)?

If the issue originates from a ZodArray schema, it'll contain the whole array. If there's an issue with a particular element, then the index will be specified as the last element in path and input will point to the offending element. This logic already exists in the ZodArray parser.

Maybe for objects, we could define how this behaves per key (and subkey)? use case is, include inputs for everything but PII/secrets in production..

I hadn't considered this, but for now I'll just say its out of scope for this RFC. Something like this will probably be achievable with Zod 4's updated error map API (haven't tweeted this one out yet but it's public πŸ™ƒ) by directly mutating the issue inside the error map.

const inputDeleter: z.ZodErrorMap = (iss)=>{ delete iss.input };

z.object({
  public: z.string(),
  secret: z.string({ error: inputDeleter })
})
igalklebanov commented 2 months ago

If the issue originates from a ZodArray schema, it'll contain the whole array. If there's an issue with a particular element, then the index will be specified as the last element in path and input will point to the offending element. This logic already exists in the ZodArray parser.

Could we control it somehow (error map API?) not to contain the whole array? imagining big arrays being annoying in error logs/traces.

I hadn't considered this, but for now I'll just say its out of scope for this RFC. Something like this will probably be achievable with Zod 4's updated https://github.com/colinhacks/zod/pull/3618 API (haven't tweeted this one out yet but it's public πŸ™ƒ) by directly mutating the issue inside the error map.

Nice! wondering if a shortcut like .secret() will make sense if this pattern is used a lot.

sebastianwessel commented 2 months ago

Just my 2 cents:

Zod is widely adopted by many projects and companies, often chosen for legal reasons and privacy concerns. If the default behavior changes to include input data in errors, simply because someone forgot to explicitly set NODE_ENV to production, it could have a significant impact on many users.

Errors get logged, and logs are often stored for extended periods. In the worst-case scenario, sensitive information could inadvertently be exposed. The z.configure method for active opt-in is the right approach, and the feature to enable more verbose errors is indeed helpful.

However, the environment variable approach does not align with the general behavior of Zod and the reasons why companies and critical projects choose Zod in the first place.

colinhacks commented 2 months ago

imagining big arrays being annoying in error logs/traces.

Keep in mind there aren't many issues that would originate from ZodArray itselfβ€”it basically only throws an error if you pass in a non-Array, or if the length of the array is too big/small per your .min()/.max() checks. All other issues will be associated with a particular index and won't get the whole array in input.

Could we control it somehow (error map API?) not to contain the whole array?

You can delete input entirely using the same error map as before. I don't think Zod will provide a mechanism for customizing what value gets assigned to issue.input, that seems like overengineering.

If it turns out a lot of people are annoyed by this, we could do a few things:

But keep in mind console.log starts truncating after a certain depth anyway.

colinhacks commented 2 months ago

Zod is widely adopted by many projects and companies, often chosen for legal reasons and privacy concerns. If the default behavior changes to include input data in errors, simply because someone forgot to explicitly set NODE_ENV to production, it could have a significant impact on many users.

Its good to have a more conservative take in the discussion.

I hear what you're saying, but from a security perspective, the current proposal is not unreasonable in my view.

If you have serious security/auditing constraints for your application, the onus is on you to make your systems compliant, and I don't think setting NODE_ENV properly is an unreasonable expectation. I'm trying to avoid a situation where Zod's DX suffers for everyone for the sake of a hypothetical medical app that hasn't set NODE_ENV properly.

That said, I'm sympathetic to this concern. As you say, Zod is already widely adopted. Even if the NODE_ENV thing is reasonable in a vacuum, it may not be workable for Zod. No one reads changelogs to the end, and this change may be unexpected for someone who expected Zod to continue eliding input data from ZodError forever.

I think a decent middle ground is to set input if and only if:

This means input will exist for the 95+% of users who are using a framework that properly sets NODE_ENV=development (which is most of them). Zod will also set input in the browser, where the memory is already fully introspectable, and hiding input data is a bit of a moot point.

But it avoids the failure mode where an existing production service that hasn't set NODE_ENV at all suddenly starts seeing input data in their logs. cc @sebastianwessel

If there's a security-sensitive application that has explicitly set NODE_ENV=development in the production service, I can't say I'm wildly sympathetic to their plight.

kibertoad commented 2 months ago

@colinhacks Can this be a new parameter to a parse method instead?

capaj commented 2 months ago

@kibertoad agree with you, being able to set this on parse method would be nice, but I'd make it opt-out, not opt-in

kibertoad commented 2 months ago

@capaj Default value should be what is preferable for production.

BenLorantfy commented 2 months ago

@colinhacks Under this proposal, I think there would still be no way to get access to the "root" parsed object from a ZodError, correct?

I created this library called zod-error-viewer that can be used to render nicely formatted errors: https://github.com/BenLorantfy/zod-error-viewer

example

However, in order to format errors like this, we need to ask the consumer to provide the data that was parsed:

import { ZodErrorViewer } from "zod-error-viewer";

<ZodErrorViewer
  data={/* data that was parsed */}
  error={/* ZodError that occurred */}
/>;

This is pretty inconvenient for the consumer, because they need to keep around a reference to the data that was parsed. This can add extra complexity to the consumer's code.

It would be convenient for the consumer if they could just pass the error:

import { ZodErrorViewer } from "zod-error-viewer";

<ZodErrorViewer
  error={/* ZodError that occurred */}
/>;

However, this would require being able to access the originally parsed "root" data from the error object.

Have you considered exposing the "root" parsed object on the error? (In addition, or instead of, adding input to each ZodIssue as mentioned in the RFC). Something like this:

class ZodError {
  issues: ZodIssue[];
  input?: unknown;
}