davidmdm / myzod

MIT License
315 stars 20 forks source link

`allowUnkownKeys()` causes all `optional()` keys to be set to `undefined`? #42

Open joekrill opened 3 years ago

joekrill commented 3 years ago

Given:

const myschema = myzod.object({ foo: myzod.string().optional() });

myschema.parse({}); 

The result is an empty object ({ })

But if I add allowUnknownKeys():

const withUnknownKeys = myzod.object({ foo: myzod.string().optional() }).allowUnknownKeys();

withUnknownKeys.parse({});

This results in { foo: undefined }

This isn't expected behaviour, is it?

Further, shouldn't allowUnknownKeys() not strip out the unknown keys, and pass them through?

Maybe coming from zod I'm expecting the behavior to be similar?

davidmdm commented 3 years ago

Actually, upon diving back into the code to investigate, I have found the reasoning as to why this is the way it is.

Suppose this:

const schema = mz.object({ name: mz.string(), age: mz.number() })
schema.parse({ name: 'Bob', age: 42, password: 'secret' }) // throws

Here we expect it to throw because it not of the type we wanted. Technically it is a subtype, but one of the original aims of myzod was to be strict and enforcing. That way there is no situation (although extremely rare) where you are manipulating data that you did not intend to once parsed by a myzod schema.

The allowUnknown option is intended to not throw in the above case but is not meant to passthrough the unknown fields because myzod wants to return to you exactly the type you defined.

If you want to allow unknown keys and let them be passthrough in the schema you would do that via a keySignature:

const schema = mz.object({ name: mz.string(), age: mz.number(), [mz.keySignature]: mz.unknown() })
schema.parse({ name: 'Bob', age: 42, password: 'secret' }) // works and keeps field password

The difference is now that the type system lets you know you may have unkown fields and can help you when reducing or mapping over keys or the likes.

So it is indeed more type-safe than zod, and intentional.

In the first case:

const myschema = myzod.object({ foo: myzod.string().optional() });
myschema.parse({}); // returns {}

myzod is simply smart enough to know the schema does not need to modify its inputs and so returns the same object but typed. This is a small transgression against immutability but it is a comprise for an excellent omitimization for server side code.

Does this make sense?

davidmdm commented 3 years ago

And the mechanism is to coerce the object to the correct shape when allowUnknown is true and we need to strip keys, and when it coerces it adds the missing key to the object. A small side-effect, but still consistent from a type system perspective.

joekrill commented 3 years ago

If you want to allow unknown keys and let them be passthrough in the schema you would do that via a keySignature:

I tried this but it doesn't provide correct typing:

const schema = mz.object({ foo: mz.string(), [mz.keySignature]: mz.unknown() });
export type Schema = Infer<schema>;

results in Schema getting typed as:

type Schema = {
    [x: string]: unknown;
}

So all the keys are being clobbered. Maybe that's an entirely different bug?

myzod is simply smart enough to know the schema does not need to modify its inputs and so returns the same object but typed.

But that results in completely unexpected behavior, I think. I would never expect that just by using allowUnknownKeys() it would explicitly assign undefined to all of my optional key values.

Is there a workaround at least? This is a huge problem for me because in my use case (firebase, in this particular instance) there is a distinct difference between a value having an explicit undefined value, and one not being defined.

davidmdm commented 3 years ago

I have the correct type being inferred from the keysignature based schema:

{ foo: sting; [x: string]: unknown }

What is your tsconfig and typescript versions?

But that results in completely unexpected behavior, I think. I would never expect that just by using allowUnknownKeys() it would explicitly assign undefined to all of my optional key values.

You are correct that this behaviour is odd. I will try and commit a fix for this.

The work around is the keySignature. it is weird that it is not working for you.

joekrill commented 3 years ago

I tried various configs. Using the tsconfig.json straight from this repo's master branch I get:

Screen Shot 2021-05-19 at 2 44 36 PM

This is Typescript 4.2.4

Upon further investigation, it seems to depend on my import. Following the examples in the README.md and using:

import myzod, { Infer } from 'myzod';

I get the clobbered, unexpected type signature.

But if I import this way (as is being done in the specs):

import * as myzod from 'myzod';

I get the correct typing that you are seeing.

joekrill commented 3 years ago

Actually it looks like it ultimately comes down to the keySignature usage. That myzod.keySignature usage is the problem:

Screen Shot 2021-05-19 at 2 57 03 PM Screen Shot 2021-05-19 at 2 56 43 PM
davidmdm commented 3 years ago

that's very odd because if I add this test it passed:

import myzod from '../src'
import * as z from '../src'

it('default keysignature should be the same as named keysignature', () => {
  assert.deepStrictEqual(myzod.keySignature, z.keySignature); 
});

but you seem to be correct in the behaviour...

davidmdm commented 3 years ago

I just published v1.7.4 which I believe should fix the type error, I will test more carefully later when I have more time. but just in case it works properly mentioning it here

joekrill commented 3 years ago

that's very odd because if I add this test it passed:

It is very strange. I guess you'd have to actually add a check for the types produced to verify this sort of thing.

Not sure if it's related, but I see an error in VS code when using keySignature in some cases:

Exported variable 'myzodSchema' has or is using name 'keySignature' from external module "" but cannot be named.ts(4023)

But checking through tsc using the command line does not produce that error, so I've been ignoring it (I even checked to make sure VS code is using the local typescript package installed).

davidmdm commented 3 years ago

it's actually not strange in the end because that test is testing the equality of those values in JS world. It was the type that was wrong. The default export:

export default {
  keySignature // Symbol
}

Lost the type information associated to the uniqueness of the symbol and just had Symbol as the type, while the named export exported the actual variable and had the type be typeof keySignature.

Typescript is the gift that keeps on giving.

I consider this solution to be not the workaround but the proper way to accept unknown keys in myzod; by typing them as unknown.

Please let me know if this fixes your short term problem.

I will investigate not setting optional keys for allowUnknown schemas because you are right it is an unexpected side effect, however it will incur a performance hit and depending on the result I may choose to fix or not fix it.

Thanks for reporting this and being patient!

joekrill commented 3 years ago

This workaround works great, thanks. Thanks for being so responsive!

joekrill commented 3 years ago

I just wanted to update this with some additional, similar behavior I'm seeing, which I'm pretty sure is related.

First, using coerce() or map() on any property within an object causes the same behavior as using allowUnknownKeys() on the object itself. So for example, the output of each one of these parse calls is an object containing { c: undefined }:

const myzod = require("myzod");
const data = { id: 1, a: "apple", b: "banana" };

myzod.object({ id: myzod.number(), c: myzod.string().optional() }).allowUnknownKeys().parse(data);
// { id: 1, c: undefined }

myzod.object({ id: myzod.number().coerce(), c: myzod.string().optional(), [myzod.keySignature]: myzod.unknown() }).parse(data);
// { id: 1, a: 'apple', b: 'banana', c: undefined }

// It doesn't matter what map does (even if it does nothing) for this occur 
myzod.object({ id: myzod.number().map(a => a), c: myzod.string().optional(), [myzod.keySignature]: myzod.unknown()  }).parse(data);
// { id: 1, a: 'apple', b: 'banana', c: undefined }

I think you may have eluded to this behavior in a previous comment in regards to coerce() - but I wasn't expecting it with map(). Although in hindsight I'm sure they use the same mechanism under-the-hood.

But further -- and this is where it's a little "weird" -- any "parent schemas" are also affected by this undefined behavior -- but any "child schemas" are not.

This requires a bit more complex data structure to illustrate, so I threw together a set of random test data using Mockaroo:

[
  {
    "id": 1,
    "lastName": "Bynert",
    "gender": "Polygender",
    "ipAddress": "15.57.73.220"
  },
  {
    "id": 3,
    "otherId": 92,
    "firstName": "Marcelle",
    "lastName": "Ebbett",
    "email": "mebbett2@uiuc.edu",
    "gender": "Female",
    "ipAddress": "5.73.241.192",
    "job": {
      "title": "Desktop Support Technician",
      "department": "Business Development",
      "company": {
        "id": "1024921263",
        "name": "Livetube"
      }
    }
  },
  {
    "id": 6,
    "firstName": "Percy",
    "lastName": "Hadye",
    "gender": "Polygender",
    "ipAddress": "187.142.139.82",
    "job": {
      "title": "Account Executive",
      "department": "Marketing",
      "company": {
        "id": "8633553945",
        "name": "Lajo"
      }
    }
  },
  {
    "id": 7,
    "firstName": "Damian",
    "lastName": "Herrema",
    "email": "dherrema6@paypal.com",
    "gender": "Female",
    "ipAddress": "101.113.242.176",
    "job": {
      "title": "Pharmacist",
      "company": {
        "id": "3772053157",
        "name": "Fadeo"
      }
    }
  },
  {
    "id": 8,
    "lastName": "Trunchion",
    "gender": "Male"
  },
  {
    "id": 10,
    "otherId": 68,
    "firstName": "Masha",
    "lastName": "Amori",
    "gender": "Agender",
    "ipAddress": "40.238.59.24",
    "job": {
      "title": "VP Marketing",
      "department": "Training",
      "company": {
        "id": "736441389",
        "name": "Fanoodle",
        "employees": 30
      }
    }
  }
]

And started with the following code:

import myzod from "myzod";
import data from "./data.json";

const companySchema = myzod.object({
  id: myzod.string().optional(),
  name: myzod.string().optional(),
  employees: myzod.number().optional(),
});

const jobSchema = myzod.object({
  title: myzod.string().optional(),
  department: myzod.string().optional(),
  company: companySchema.optional(),
});

const personSchema = myzod.object({
  id: myzod.number(),
  otherId: myzod.number().optional(),
  firstName: myzod.string().optional(),
  lastName: myzod.string(),
  email: myzod.string().optional(),
  gender: myzod.string(),
  ipAddress: myzod.string().optional(),
  job: jobSchema.optional(),
});

const dataSchema = myzod.array(personSchema);

const result = dataSchema.parse(data);

console.dir(result, { depth: null, colors: true });

As it stands, this all works "as expected" -- no undefined values get created. For example, the result for person with ID 7 is this:

{
    id: 7,
    firstName: 'Damian',
    lastName: 'Herrema',
    email: 'dherrema6@paypal.com',
    gender: 'Female',
    ipAddress: '101.113.242.176',
    job: {
      title: 'Pharmacist',
      company: { id: '3772053157', name: 'Fadeo' }
    }
}

But if jobSchema uses allowUnknownKeys() (or it includes a coerce() or map() call on one of its properties) all missing, optional values that occur within jobSchema get set to undefined as expected -- but also any missing properties from personSchema (it's "parent schema") also get set to undefined. Properties within companySchema (it's "child schema") are unaffected. So for the record with ID 7 we end up with:

{
    id: 7,
    otherId: undefined,
    firstName: 'Damian',
    lastName: 'Herrema',
    email: 'dherrema6@paypal.com',
    gender: 'Female',
    ipAddress: '101.113.242.176',
    job: {
      title: 'Pharmacist',
      department: undefined,
      company: { id: '3772053157', name: 'Fadeo' }
    }
}

Note that job.department gets set to undefined, but job.company.employees does not. But otherId is also set to undefined (this is the really unexpected part, for me).

Moving down the tree: if you add a coerce() to the companySchema's employee property instead, then everything gets set to undefined if it doesn't exist, no matter where it was specified.

{
    id: 7,
    otherId: undefined,
    firstName: 'Damian',
    lastName: 'Herrema',
    email: 'dherrema6@paypal.com',
    gender: 'Female',
    ipAddress: '101.113.242.176',
    job: {
      title: 'Pharmacist',
      department: undefined,
      company: { id: '3772053157', name: 'Fadeo', employees: undefined }
    }
}

To further illustrate that this only affects the schema and it's parent schemas, but not child schemas: f coerce() is only added to the personSchema's otherId property (and there are no other uses of coerce(), map(), or allowUnknownKeys()), then only properties directly on the "person" object itself get created with undefined values:

 {
    id: 7,
    otherId: undefined,
    firstName: 'Damian',
    lastName: 'Herrema',
    email: 'dherrema6@paypal.com',
    gender: 'Female',
    ipAddress: '101.113.242.176',
    job: {
      title: 'Pharmacist',
      company: { id: '3772053157', name: 'Fadeo' }
    }
}

So the additional unexpected behavior is that this affects "parent schemas". And also that map() and coerce() trigger this, as well.

davidmdm commented 3 years ago

So, yeah. Certain types or schemas are coercable. A mapped type is by definition coercing a type to another type. Or at least performing some transformation on the data within the same type. When a parent type includes a type that is coerceable than the parent also is coerceable.

Suppose:

const schema = mz.object({ age: mz.number().coerce() })
const result = schema.parse({ age: "23" }) // { age: 23 }

A coercion from types { age: string } to { age: number } happened between the input and the result of parsing.

So when a parent has a child schema that is coerceable by any means it sets itself as coerceable. How it does not set all of it's child schemas as coerce-able if it does not have to.

It does lead to this inconsistency of sorts about when exactly the undefined key will be added to the result. The benefit to this approach is that when it can myzod does not need to assign new objects and duplicate all key/values. Thus faster parsing.

So in the end none of it was necessarily designed on purpose to have this inconsistency but as a consequence of other priorities.

Mentally as you have figured out you can know when myzod will add the key to the result by determining if the type is a "coerceable".

My position is that regardless of if the key is not present or is present but set to undefined, it still respects the type-system.

I am keeping up this issue for documentation purposes, but you are convincing me that I may need to add a warning about this to the readme and a small section on using keySignatures if you need to preserve the absence of certain keys.