fabian-hiller / valibot

The modular and type safe schema library for validating structural data πŸ€–
https://valibot.dev
MIT License
6.33k stars 204 forks source link

Initialize @valibot/to-json-schema package #802

Closed gcornut closed 2 months ago

gcornut commented 2 months ago

Following the discussions on https://github.com/gcornut/valibot-json-schema/issues/84, here is my proposal on a new package for converting Valibot to JSON schema to be moved here next to Valibot core library code :)

The setup

  1. New packages/to-json-schema NPM package named @valibot/to-json-schema
  2. Some basic schema & validation conversions
    • any schema
    • null schema
    • string schema
    • object schema
    • email validation
  3. Lint, Types and Tests (100% coverage)

I tried to match the code style of the core Valibot library.

What is next ?

I'll deprecate and archive @gcornut/valibot-json-schema once this lib gets to feature parity. Although I will keep the CLI in a new repo because it's a use case I still need.

fabian-hiller commented 2 months ago

Thank you! I will probably give you feedback on everything by tomorrow.

fabian-hiller commented 2 months ago

Thanks again for the PR. I have reviewed everything and also investigated several alternative implementations. I even tried a fully typesafe implementation where the output type was the actual JSON schema output. Unfortunately, the typesafe implementation got quite complex and probably too hard to maintain at this point. Since Valibot is not a JSON-schema-first library, I don't expect many users to expect such an implementation. However, I have added an alternative implementation to /src2 with a focus on simplicity, bundle size and performance. Can you take a look at it and let me know what you think? Once we have decided on the basic structure of the source code, we can continue to add support for more schemas and actions.

gcornut commented 2 months ago

@fabian-hiller You alternative implementation looks good πŸ‘ I can add more schemas and actions conversion and clean up the code if you want

fabian-hiller commented 2 months ago

Thank you! Would you change anything? Otherwise, feel free to clean up the code and add support for more schemas and actions after we answer my next question.

I saw that TypeBox's Type.Undefined() returns { type: 'undefined' }, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7 type). Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?

fabian-hiller commented 2 months ago

We should also check the integration with OpenAI and Vercel AI (related Issue) before extending our implementation.

gcornut commented 2 months ago

saw that TypeBox's Type.Undefined() returns { type: 'undefined' }, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7 type)

I don't see how JSON schema could support undefined since JSON does not support undefined (only null) 🀷

Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?

Yes their are newer versions of the JSON schema specs (which some major changes 😬) but most tools use the draft-07 version (OpenAI, Vercel AI, etc.)

And actually if we explicitly want to support OpenAI structured output we would need to have a subset of JSON schema since it does not support every feature. For Vercel AI I can't see document that suggest they don't support all JSON schema features 🀷 (they use the same TypeScript JSONSchema7 type as we do).

We could add an option target?: 'json-schema-draft-07' | 'openai-structured-outputs' that could change the output and fail on unsupported feature (with still the possibility of forcing the conversion).

gcornut commented 2 months ago

Some people JSON schema only for documentation purpose so maybe for them we could add another output "target" that could output things like type: 'undefined' which are unusable to validation a JSON payload but maybe useful for documentation.

fabian-hiller commented 2 months ago

Thanks for your detailed answer. For now, I think it is best to optimize everything for JSON Schema Draft 7 and thread everything else as not supported (force can still be used). In the long run, we will evaluate how people use this package and change or add functionality based on that. Feel free to clean up the code and add support for more schemas and actions. You can always reach out if you have any Valibot related questions or if you think I can help you in any other way. I am very grateful for your contribution!

gcornut commented 2 months ago

How strict do you think this lib should be on the input schema ?

I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values. And actually these values needs to be strictly compatible, excluding types like Date (that does serialize to JSON but as a string) or Infinity (that serializes to null in JSON).

Maybe the actions we convert should check on which type they are applied (email is only applicable to string, integer on number, etc.) ? Valibot does mostly guard against action/schema missmatch, but only via TS type ?

It's probably overkill but we could validate the input schema via a valibot meta schema πŸ˜„

fabian-hiller commented 2 months ago

I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values...

I think that toJsonSchema will throw anyway if the wrapped schema is not JSON schema compatible, or am I wrong? If I'm right, I think we don't need to worry about the type of the default value. Infinity and NaN are probably the only exceptions, but it is a very very rare case.

Maybe the actions we convert should check on which type they are applied...

I think we should not worry about that. Our API is meant to be used in a certain way, and we should not have to deal with people using it incorrectly.

pkg-pr-new[bot] commented 2 months ago

Open in Stackblitz

pnpm add https://pkg.pr.new/valibot@802

commit: 87f419c

gcornut commented 2 months ago

Alright so I've rebased from the main branch. I've added almost all possible schema & action conversions.
If it's too big, I can split the PR. I already tried to split this in commits.

I need to add optional conversions for date and bigint since they are not technically supported in JSON but can be converted to number or string depending on the use case.

fabian-hiller commented 2 months ago

Thank you so much! It is ok for me if this PR is "big". I will try to review everything in the next days with the goal to release this package next week.

gcornut commented 2 months ago

Ok I've added optional date and bigint schema conversion (requiring "strategy" configs)

fabian-hiller commented 2 months ago

I am not sure if we should add the strategy configs from the beginning. I would prefer to wait until people come up with good use cases and then decide on the implementation. But feel free to share your opinion since your conversion library has a longer history.

gcornut commented 2 months ago

Yes it might be a bit overkill.

I know sveltekit-superforms use date schema conversion to integer (https://github.com/ciscoheat/sveltekit-superforms/blob/main/src/lib/adapters/valibot.ts#L30). But I think they use JSON schema for a documentation/metadata use case. They don't need a strict schema but just a way to describes schema no matter the implementation (zod, joi, yup, valibot, etc.)

fabian-hiller commented 2 months ago

Then I would recommend removing it and keeping the implementation as simple as possible until people request such features.

fabian-hiller commented 2 months ago

Instead of offering such configurations, in the long run we could offer a general way to modify the output of toJsonSchema, similar to how it works with JSON.stringify.

gcornut commented 2 months ago

If you are interested, please feel free to add your name to LICENSE.md.

Thanks but no need to do that πŸ™‚

Next, I will refactor a few code lines and then check the implementation and tests in detail.

Ok great πŸ‘

fabian-hiller commented 2 months ago

I started refactoring some parts and will commit my changes later. We will probably have a few implementation interactions in the next days before releasing the first version of the package next week.

fabian-hiller commented 2 months ago

I refactored the code and made the following bigger changes:

Definitions: I removed the definitions property from config as I think we can create them automatically as needed. Basically, whenever convertSchema sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.

Assert JSON: I replaced or removed the assertJSON utility because I could not find a case where a correct use of Valibot could actually lead to an invalid case. I may be wrong, so feel free to add comments and discuss this with me.

Sorry for partially overwriting or changing your code without asking, but I think this will help us ship the first version of this package faster. Please let me know if you want to revert or change any of my modifications.

I did not touch the tests. We can do that after we agree on the final implementation.

gcornut commented 2 months ago

Definitions: I removed the definitions property from config as I think we can create them automatically as needed. Basically, whenever convertSchema sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.

Named definitions seems an essential feature to me. I know I would have to hack it back into my scripts that exports my valibot schemas into JSON schema.

But I won't be blocking just for that if you want to proceed πŸ‘

fabian-hiller commented 2 months ago

I agree that this could be an essential feature. Therefore, I added support for explicitly specified definitions with my last commit.

fabian-hiller commented 2 months ago

Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?

gcornut commented 2 months ago

Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?

All good for me πŸ‘

fabian-hiller commented 2 months ago

I think that I will merge this PR later (I am not done yet) and ship our v0.1.0 release πŸŽ‰

fabian-hiller commented 2 months ago

@gcornut do you think we should completely ignore unsupported schemas and actions, or is it better to try to get it right? A concrete example is minLength. Should we add nothing to the output (and loose the information) if the schema type is not supported or should we add both minLength and minItems to the JSON Schema output?

fabian-hiller commented 2 months ago

If we choose the second strategy, we might consider removing the force configuration and just console.warn when something is not officially supported.

gcornut commented 2 months ago

Both solutions sound ok to me as long as the conversion is strict by default.

Like mentioned earlier there is definitely some people who just want to output schemas for documentation purpose and don't care about having 100% correct JSON schema definitions.

But maybe we could look into a "loose" output format later

fabian-hiller commented 2 months ago

What do you think about removing force and only console.warn in such cases?

gcornut commented 2 months ago

What do you think about removing force and only console.warn in such cases?

Sounds ok. If some user need to be noticed that the conversion was not completely successful we can think of adding another option later πŸ‘

fabian-hiller commented 2 months ago

I think I will keep force and throw when it is false or undefined and console.warn otherwise. I will push my changes soon

fabian-hiller commented 2 months ago

Thanks so much @gcornut for your contribution! We are ready to merge! 🍾

fabian-hiller commented 2 months ago

v0.1.0 is available