Closed gcornut closed 2 months ago
Thank you! I will probably give you feedback on everything by tomorrow.
Thanks again for the PR. I have reviewed everything and also investigated several alternative implementations. I even tried a fully typesafe implementation where the output type was the actual JSON schema output. Unfortunately, the typesafe implementation got quite complex and probably too hard to maintain at this point. Since Valibot is not a JSON-schema-first library, I don't expect many users to expect such an implementation. However, I have added an alternative implementation to /src2
with a focus on simplicity, bundle size and performance. Can you take a look at it and let me know what you think? Once we have decided on the basic structure of the source code, we can continue to add support for more schemas and actions.
@fabian-hiller You alternative implementation looks good π I can add more schemas and actions conversion and clean up the code if you want
Thank you! Would you change anything? Otherwise, feel free to clean up the code and add support for more schemas and actions after we answer my next question.
I saw that TypeBox's Type.Undefined()
returns { type: 'undefined' }
, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7
type). Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?
We should also check the integration with OpenAI and Vercel AI (related Issue) before extending our implementation.
saw that TypeBox's Type.Undefined() returns { type: 'undefined' }, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7 type)
I don't see how JSON schema could support undefined
since JSON does not support undefined
(only null
) π€·
Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?
Yes their are newer versions of the JSON schema specs (which some major changes π¬) but most tools use the draft-07 version (OpenAI, Vercel AI, etc.)
And actually if we explicitly want to support OpenAI structured output we would need to have a subset of JSON schema since it does not support every feature. For Vercel AI I can't see document that suggest they don't support all JSON schema features π€· (they use the same TypeScript JSONSchema7 type as we do).
We could add an option target?: 'json-schema-draft-07' | 'openai-structured-outputs'
that could change the output and fail on unsupported feature (with still the possibility of forcing the conversion).
Some people JSON schema only for documentation purpose so maybe for them we could add another output "target" that could output things like type: 'undefined'
which are unusable to validation a JSON payload but maybe useful for documentation.
Thanks for your detailed answer. For now, I think it is best to optimize everything for JSON Schema Draft 7 and thread everything else as not supported (force
can still be used). In the long run, we will evaluate how people use this package and change or add functionality based on that. Feel free to clean up the code and add support for more schemas and actions. You can always reach out if you have any Valibot related questions or if you think I can help you in any other way. I am very grateful for your contribution!
How strict do you think this lib should be on the input schema ?
I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values. And actually these values needs to be strictly compatible, excluding types like Date
(that does serialize to JSON but as a string) or Infinity
(that serializes to null
in JSON).
Maybe the actions we convert should check on which type they are applied (email
is only applicable to string
, integer
on number
, etc.) ? Valibot does mostly guard against action/schema missmatch, but only via TS type ?
It's probably overkill but we could validate the input schema via a valibot meta schema π
I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values...
I think that toJsonSchema
will throw anyway if the wrapped schema is not JSON schema compatible, or am I wrong? If I'm right, I think we don't need to worry about the type of the default value. Infinity
and NaN
are probably the only exceptions, but it is a very very rare case.
Maybe the actions we convert should check on which type they are applied...
I think we should not worry about that. Our API is meant to be used in a certain way, and we should not have to deal with people using it incorrectly.
Alright so I've rebased from the main branch. I've added almost all possible schema & action conversions.
If it's too big, I can split the PR. I already tried to split this in commits.
I need to add optional conversions for date and bigint since they are not technically supported in JSON but can be converted to number or string depending on the use case.
Thank you so much! It is ok for me if this PR is "big". I will try to review everything in the next days with the goal to release this package next week.
Ok I've added optional date and bigint schema conversion (requiring "strategy" configs)
I am not sure if we should add the strategy configs from the beginning. I would prefer to wait until people come up with good use cases and then decide on the implementation. But feel free to share your opinion since your conversion library has a longer history.
Yes it might be a bit overkill.
I know sveltekit-superforms use date schema conversion to integer (https://github.com/ciscoheat/sveltekit-superforms/blob/main/src/lib/adapters/valibot.ts#L30). But I think they use JSON schema for a documentation/metadata use case. They don't need a strict schema but just a way to describes schema no matter the implementation (zod, joi, yup, valibot, etc.)
Then I would recommend removing it and keeping the implementation as simple as possible until people request such features.
Instead of offering such configurations, in the long run we could offer a general way to modify the output of toJsonSchema
, similar to how it works with JSON.stringify
.
If you are interested, please feel free to add your name to LICENSE.md.
Thanks but no need to do that π
Next, I will refactor a few code lines and then check the implementation and tests in detail.
Ok great π
I started refactoring some parts and will commit my changes later. We will probably have a few implementation interactions in the next days before releasing the first version of the package next week.
I refactored the code and made the following bigger changes:
Definitions: I removed the definitions
property from config
as I think we can create them automatically as needed. Basically, whenever convertSchema
sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.
Assert JSON: I replaced or removed the assertJSON
utility because I could not find a case where a correct use of Valibot could actually lead to an invalid case. I may be wrong, so feel free to add comments and discuss this with me.
Sorry for partially overwriting or changing your code without asking, but I think this will help us ship the first version of this package faster. Please let me know if you want to revert or change any of my modifications.
I did not touch the tests. We can do that after we agree on the final implementation.
Definitions: I removed the
definitions
property fromconfig
as I think we can create them automatically as needed. Basically, wheneverconvertSchema
sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.
Named definitions seems an essential feature to me. I know I would have to hack it back into my scripts that exports my valibot schemas into JSON schema.
But I won't be blocking just for that if you want to proceed π
I agree that this could be an essential feature. Therefore, I added support for explicitly specified definitions with my last commit.
Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?
Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?
All good for me π
I think that I will merge this PR later (I am not done yet) and ship our v0.1.0 release π
@gcornut do you think we should completely ignore unsupported schemas and actions, or is it better to try to get it right? A concrete example is minLength
. Should we add nothing to the output (and loose the information) if the schema type is not supported or should we add both minLength
and minItems
to the JSON Schema output?
If we choose the second strategy, we might consider removing the force
configuration and just console.warn
when something is not officially supported.
Both solutions sound ok to me as long as the conversion is strict by default.
Like mentioned earlier there is definitely some people who just want to output schemas for documentation purpose and don't care about having 100% correct JSON schema definitions.
But maybe we could look into a "loose" output format later
What do you think about removing force
and only console.warn
in such cases?
What do you think about removing
force
and onlyconsole.warn
in such cases?
Sounds ok. If some user need to be noticed that the conversion was not completely successful we can think of adding another option later π
I think I will keep force
and throw
when it is false
or undefined
and console.warn
otherwise. I will push my changes soon
Thanks so much @gcornut for your contribution! We are ready to merge! πΎ
v0.1.0 is available
Following the discussions on https://github.com/gcornut/valibot-json-schema/issues/84, here is my proposal on a new package for converting Valibot to JSON schema to be moved here next to Valibot core library code :)
The setup
@valibot/to-json-schema
any
schemanull
schemastring
schemaobject
schemaemail
validationI tried to match the code style of the core Valibot library.
What is next ?
@gcornut/valibot-json-schema
Date
intoformat: 'date-time'
orformat: 'unix-time'
,BigInt
intoformat: 'int64'
or string, explicitundefined
intonull
, etc.)I'll deprecate and archive
@gcornut/valibot-json-schema
once this lib gets to feature parity. Although I will keep the CLI in a new repo because it's a use case I still need.