Is it possible to force a normalized output?

lowlighter commented 4 months ago

Maybe this is not the most adequate tool, but I'm trying to use zodToJsonSchema like a "zod AST parser". The aim is to be able to process Zod objects and transform them into dynamically created HTML forms, GraphQL typings, etc.

The only issue I currently have is that the output is "auto-flattened" in a lot of places, which makes what I'm trying to achieve really awkward. It'd be nice to have an option to never flatten fields.

For example:

import { z } from "zod"

const schema = z.object({
  foo: z.string(),
  bar: z.string().nullable(),
  baz: z.number().int().nullable(),
})

Current output:

{
  foo: { type: "string" },
  bar: { type: [ "string", "null" ] },
  baz: { anyOf: [ { type: "integer" }, { type: "null" } ] },
}

Desired output:

{
  foo: { anyOf: [ { type: "string" } ] },
  bar: { anyOf: [ { type: "string" }, { type: "null" } ] },
  baz: { anyOf: [ { type: "integer" }, { type: "null" } ] },
}

Maybe this is out of scope for this package, but technically it remains valid JSON Schema, as the spec only requires anyOf to be a non-empty array. And while it's more verbose, it makes the output much easier to post-process, since iterating over it is simpler.
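Until such an option exists, the normalization can be done as a post-processing step on the current output. The sketch below is one possible way to do it, not part of zod-to-json-schema: `JsonSchemaNode`, `NormalizedNode` and `normalize` are hypothetical names, and the helper only rewrites the node it is given (it does not recurse into nested properties or items).

```typescript
// Loose structural type for a JSON Schema node (an assumption for this sketch,
// not a type exported by zod-to-json-schema).
type JsonSchemaNode = { [keyword: string]: unknown }

// The normalized shape: the node always carries a non-empty `anyOf` array.
type NormalizedNode = { anyOf: JsonSchemaNode[]; [keyword: string]: unknown }

// Rewrite a single property schema so that it always exposes `anyOf`.
// Nested schemas (object properties, array items) are left untouched here.
function normalize(node: JsonSchemaNode): NormalizedNode {
  const { anyOf, type, ...rest } = node
  // Already a union: keep it as it is.
  if (Array.isArray(anyOf)) return { ...node, anyOf: anyOf as JsonSchemaNode[] }
  // Multi-type shorthand: ["string", "null"] becomes one branch per type, while
  // sibling keywords such as `description` stay on the outer node.
  if (Array.isArray(type)) {
    return { ...rest, anyOf: type.map((t): JsonSchemaNode => ({ type: t })) }
  }
  // Single type: wrap the whole node in a one-element union.
  return { anyOf: [node] }
}

// Usage with the "current output" properties from the example above.
const currentOutput: Record<string, JsonSchemaNode> = {
  foo: { type: "string" },
  bar: { type: ["string", "null"] },
  baz: { anyOf: [{ type: "integer" }, { type: "null" }] },
}
for (const [key, schema] of Object.entries(currentOutput)) {
  console.log(key, JSON.stringify(normalize(schema)))
}
```

Splitting a `type` array into one branch per type assumes the sibling keywords describe the whole node, which holds for the plain nullable primitives shown here.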

StefanTerdell commented 4 months ago

Hello!

Thank you for opening an issue. I've always intended for it to produce reasonably flat schemas to improve the readability of the output. I don't see a reason to change that.

Good luck with your project!

lowlighter commented 4 months ago

Ok fair enough. Thanks a lot for clarifying!

StefanTerdell commented 4 months ago

@lowlighter Actually, your issue got me thinking. I've been toying with the idea of splitting out a "Zod schema walker" package for the next major version of this tool (Zod 4 is around the corner). Essentially, you'd have a list of callbacks, one for each distinct schema type and variant. Something like...

const finalCtx = walk(myZodSchema, {
    object: parseObject,
    array: parseArray,
    union: parseAnyOf,
    // ...etc
});

The signature of each function could be tied to the root Zod type:

type Callback<Ctx, Schema> = (node: Schema, ctx: Ctx, path: string[]) => Ctx 
type Callbacks<Ctx> = {
  object: Callback<Ctx, ZodObjectSchema>,
  array: Callback<Ctx, ZodArraySchema>,
  // ...etc
}

Whatever you're building would go in the generic ctx object, which would eventually be returned. For instance, when building JSON Schemas, the ctx type would reflect that:

const myJsonSchema = walk<JsonSchema7>(myZodSchema, {
    object: parseObjectIntoJsonSchema,
    // ...etc
});

WDYT?
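To make the per-type callback idea concrete, here is a minimal walker sketch. It does not use real Zod classes: `ObjectNode`, `ArrayNode`, `StringNode`, `SchemaNode` and `walk` are hypothetical, simplified stand-ins so the example runs on its own. In this sketch the walker itself recurses into children and threads the returned ctx through every call, so for an object containing an array the object callback runs first, then the array callback, then the element's callback.

```typescript
// Hypothetical, simplified stand-ins for Zod schema nodes (real Zod classes differ).
type ObjectNode = { kind: "object"; shape: Record<string, SchemaNode> }
type ArrayNode = { kind: "array"; element: SchemaNode }
type StringNode = { kind: "string" }
type SchemaNode = ObjectNode | ArrayNode | StringNode

// One callback per node kind, each receiving its own narrowed node type.
type Callback<Ctx, Node> = (node: Node, ctx: Ctx, path: string[]) => Ctx
type Callbacks<Ctx> = {
  object: Callback<Ctx, ObjectNode>
  array: Callback<Ctx, ArrayNode>
  string: Callback<Ctx, StringNode>
}

// Depth-first walk: call the node's own callback, then recurse into its children.
function walk<Ctx>(node: SchemaNode, callbacks: Callbacks<Ctx>, ctx: Ctx, path: string[] = []): Ctx {
  switch (node.kind) {
    case "object": {
      let next = callbacks.object(node, ctx, path)
      for (const [key, child] of Object.entries(node.shape)) {
        next = walk(child, callbacks, next, [...path, key])
      }
      return next
    }
    case "array":
      // "items" is an arbitrary path segment chosen for array elements in this sketch.
      return walk(node.element, callbacks, callbacks.array(node, ctx, path), [...path, "items"])
  }
  // Only the "string" kind is left at this point.
  return callbacks.string(node, ctx, path)
}

// Usage: Ctx is a list of "kind@path" strings, which records the visiting order.
const schema: SchemaNode = {
  kind: "object",
  shape: { foo: { kind: "array", element: { kind: "string" } } },
}
const visited = walk<string[]>(schema, {
  object: (_node, ctx, path) => [...ctx, `object@${path.join(".")}`],
  array: (_node, ctx, path) => [...ctx, `array@${path.join(".")}`],
  string: (_node, ctx, path) => [...ctx, `string@${path.join(".")}`],
}, [])
console.log(visited) // ["object@", "array@foo", "string@foo.items"]
```

Whether the walker or each callback is responsible for recursing is a design choice; this sketch puts it in the walker so callbacks stay small.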

lowlighter commented 4 months ago

Yes, this is exactly the kind of API I was looking for!

However, in your example I'm not sure I understand whether:

z.object({ foo: z.array(z.unknown()) })

would call the object callback and then the array callback, or just the object one, upon traversal.

The other walkers I've worked with (like for CSS/XHTML) usually accept only a single callback and let you discriminate on the "token type", something similar to:

type ZodNode =
  | { type: "object", schema: ZodObjectSchema }
  | { type: "array", schema: ZodArraySchema }
  // ...

type Callback<Ctx> = (node: ZodNode, ctx: Ctx, path: string[]) => Ctx

This way, you can handle the typings with a switch (node.type), since TS narrows the union back to the right member after the check.

The advantage of having only a single callback is that if you're only working with fields common to all Zod schemas, or want to do the same processing on several types of node, you can pass a single function and factor out some parts of your logic.

But this is only based on my experience with language parsers, so maybe it wouldn't translate well to Zod traversals.
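The single-callback style described above could look roughly like this. It is only a sketch: `ZodNode` here is a hypothetical, simplified node type (the `schema` payload is flattened into the node, and `description` stands in for a field shared by every kind), and `walk` and `summarize` are illustrative names, not an existing API. The shared logic runs once for every kind, and `switch (node.type)` narrows the union where kind-specific fields are needed.

```typescript
// Hypothetical, simplified nodes: the `schema` payload is flattened into the node,
// and `description` stands in for a field shared by every kind.
type ZodNode =
  | { type: "object"; description?: string; shape: Record<string, ZodNode> }
  | { type: "array"; description?: string; element: ZodNode }
  | { type: "string"; description?: string; maxLength?: number }

type Callback<Ctx> = (node: ZodNode, ctx: Ctx, path: string[]) => Ctx

// Generic traversal: the same callback is called for every node, depth first.
function walk<Ctx>(node: ZodNode, callback: Callback<Ctx>, ctx: Ctx, path: string[] = []): Ctx {
  let next = callback(node, ctx, path)
  if (node.type === "object") {
    for (const [key, child] of Object.entries(node.shape)) {
      next = walk(child, callback, next, [...path, key])
    }
  } else if (node.type === "array") {
    next = walk(node.element, callback, next, [...path, "items"])
  }
  return next
}

// One callback for all kinds: shared logic first, then `switch (node.type)` narrows.
const summarize: Callback<string[]> = (node, ctx, path) => {
  const where = path.join(".") || "(root)"
  // `description` exists on every variant, so no per-type duplication is needed.
  const lines = node.description ? [...ctx, `${where}: ${node.description}`] : ctx
  switch (node.type) {
    case "string":
      // Narrowed to the string variant, so `maxLength` is accessible here.
      return node.maxLength !== undefined ? [...lines, `${where}: at most ${node.maxLength} chars`] : lines
    default:
      return lines
  }
}

const userSchema: ZodNode = {
  type: "object",
  description: "A user",
  shape: { name: { type: "string", description: "Display name", maxLength: 32 } },
}
console.log(walk<string[]>(userSchema, summarize, []))
```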

FWIW, I mostly achieved what I wanted to do in the initial post for the GraphQL converter by post-processing the output of zodToJsonSchema. What I've mostly needed were these fields: description, type, items (for arrays), enum and const (for strings), and the full path. The most tedious part was just "normalizing" the output to always have an anyOf array, to make walking easier.

Maybe the above can give you an idea of how this could be used, or what other people might need.
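For illustration, collecting the fields listed above (description, type, items, enum, const and the full path) from JSON Schema output might be sketched like this. The `JsonSchemaNode` and `Field` types and the `collect` function are hypothetical; the type covers only the keywords used here, and the `"[]"` path segment for array elements is an arbitrary convention. A node with `anyOf` is walked branch by branch at the same path, so normalized and flat nodes are handled the same way.

```typescript
// Loose JSON Schema node type covering only the keywords used here (an assumption).
type JsonSchemaNode = {
  type?: string
  description?: string
  enum?: unknown[]
  const?: unknown
  anyOf?: JsonSchemaNode[]
  items?: JsonSchemaNode
  properties?: Record<string, JsonSchemaNode>
}

// One record per visited branch: the fields listed above plus the full path.
type Field = {
  path: string[]
  type?: string
  description?: string
  enum?: unknown[]
  const?: unknown
}

// Depth-first collection. Each anyOf branch inherits the outer description
// unless it has its own.
function collect(node: JsonSchemaNode, path: string[] = [], out: Field[] = []): Field[] {
  if (node.anyOf) {
    for (const branch of node.anyOf) collect({ description: node.description, ...branch }, path, out)
    return out
  }
  out.push({ path, type: node.type, description: node.description, enum: node.enum, const: node.const })
  for (const [key, child] of Object.entries(node.properties ?? {})) collect(child, [...path, key], out)
  // "[]" marks "any array element" in the path.
  if (node.items) collect(node.items, [...path, "[]"], out)
  return out
}

const exampleSchema: JsonSchemaNode = {
  type: "object",
  properties: {
    tags: { type: "array", items: { type: "string" } },
    role: { description: "User role", anyOf: [{ type: "string", enum: ["admin", "user"] }, { type: "null" }] },
  },
}
for (const field of collect(exampleSchema)) {
  console.log(field.path.join("."), field.type, field.description ?? "")
}
```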

I've not worked on the HTML form generator yet, but I assume I'll mostly need the same, along with additional metadata about constraints (like min/max/pattern, etc.) that can be put on <input> tags.
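As a rough idea of that mapping, the sketch below turns a few JSON Schema validation keywords (`minLength`, `maxLength`, `pattern`, `minimum`, `maximum`) into the corresponding standard `<input>` attributes. `FieldSchema`, `inputAttributes` and `renderInput` are hypothetical names, and the input is assumed to be a single non-null branch (for example one entry of a normalized anyOf). Other keywords, such as `exclusiveMinimum`, are not handled.

```typescript
// Subset of JSON Schema validation keywords relevant to a single form field.
type FieldSchema = {
  type?: "string" | "integer" | "number" | "boolean"
  minLength?: number
  maxLength?: number
  pattern?: string
  minimum?: number
  maximum?: number
}

// Map JSON Schema constraints onto standard HTML <input> attributes.
function inputAttributes(schema: FieldSchema): Record<string, string> {
  const attrs: Record<string, string> = {}
  switch (schema.type) {
    case "integer":
      attrs.type = "number"
      attrs.step = "1" // whole numbers only
      break
    case "number":
      attrs.type = "number"
      attrs.step = "any" // allow decimals
      break
    case "boolean":
      attrs.type = "checkbox"
      break
    default:
      attrs.type = "text"
  }
  if (schema.minLength !== undefined) attrs.minlength = String(schema.minLength)
  if (schema.maxLength !== undefined) attrs.maxlength = String(schema.maxLength)
  // Caveat: HTML `pattern` must match the whole value, while JSON Schema `pattern`
  // is unanchored, so the two only agree if the regex is already anchored (^...$).
  if (schema.pattern !== undefined) attrs.pattern = schema.pattern
  if (schema.minimum !== undefined) attrs.min = String(schema.minimum)
  if (schema.maximum !== undefined) attrs.max = String(schema.maximum)
  return attrs
}

// Minimal attribute escaping so values such as regex patterns cannot break the markup.
const escapeAttr = (value: string) => value.replace(/&/g, "&amp;").replace(/"/g, "&quot;")

function renderInput(name: string, schema: FieldSchema): string {
  const attrs = Object.entries({ name, ...inputAttributes(schema) })
    .map(([key, value]) => `${key}="${escapeAttr(value)}"`)
  return `<input ${attrs.join(" ")}>`
}

console.log(renderInput("age", { type: "integer", minimum: 0, maximum: 130 }))
```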

The only thing I haven't checked is how zodToJsonSchema handles extra properties. I read in a PR on zod, which suggested adding .example(), that it's currently possible to do something similar to:

z.string({
  description: 'An identifier for the resource.',
  example: '9c235211-6834-11ea-a78c-6feb38a34414',
})

But I'm not even sure whether these are stored in the schema. If they are, it'd be nice to have access to them in the walker, if you work on it.

The aim is to create ecosystems with cohesive typings, validation, documentation and user interactions, and this is where zodToJsonSchema is great, because it kind of acts like a gateway between the codebase and human-facing tooling.

Anyway, thanks a lot for looking into this more deeply!

StefanTerdell commented 4 months ago

Thanks for the thorough reply. I'll try and remember to ping you if anything comes of it. Btw, you might want to check out Superforms. GL!

StefanTerdell/zod-to-json-schema, issue #128: Is it possible to force a normalized output?