arktypeio / arktype

TypeScript's 1:1 validator, optimized from editor to runtime
MIT License

Support two-way serialization/deserialization #909

Open stephenh opened 4 months ago

stephenh commented 4 months ago

🤷 Motivation

We have types like JS Date or an internal LocalDate (only calendar dates, no times) that we want to encode/decode to JSON (or a string, like in a query string).

Libraries like Zod/ArkType/etc. generally support deserialization: given an unknown object literal, parse/safeParse it into my structure, and transform the dueDate: "2024-01-01" key into dueDate: new LocalDate(...) along the way (e.g. Zod transformers).

But they don't have serialization: if I have an object { dueDate: someJsDate } and I want to serialize it, I want it to come out as dueDate: "2024-01-01", not as whatever the Date's toString() or toJSON() returns (the latter being a really long ISO string).

Ideally I want to have a two-way codec, that for a given key, can do both custom serialization & deserialization.
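To make the request concrete, here is a minimal sketch of such a two-way codec in plain TypeScript. The `Codec` interface and the `calendarDate` value are hypothetical names invented for illustration; they are not part of ArkType, Zod, or any other library, and a JS Date pinned to UTC midnight stands in for the internal LocalDate.

```typescript
// Hypothetical shape for a two-way codec; not an API of any library.
interface Codec<Decoded, Encoded> {
  decode(input: Encoded): Decoded;
  encode(value: Decoded): Encoded;
}

// Calendar-date codec: "YYYY-MM-DD" <-> Date (interpreted as UTC midnight).
const calendarDate: Codec<Date, string> = {
  decode(input) {
    if (!/^\d{4}-\d{2}-\d{2}$/.test(input)) {
      throw new Error(`Expected YYYY-MM-DD, got ${JSON.stringify(input)}`);
    }
    const date = new Date(`${input}T00:00:00.000Z`);
    if (Number.isNaN(date.getTime())) {
      throw new Error(`Invalid date: ${input}`);
    }
    return date;
  },
  encode(value) {
    // Keep only the calendar part of the ISO string.
    return value.toISOString().slice(0, 10);
  },
};

const dueDate = calendarDate.decode("2024-01-01");

// The codec controls the serialized form...
const viaCodec = calendarDate.encode(dueDate);

// ...whereas JSON.stringify falls back to Date.prototype.toJSON.
const viaDefault = JSON.stringify({ dueDate });

console.log(viaCodec);
console.log(viaDefault);
```

The point of the sketch is only that both directions live next to each other, so the schema rather than the value decides what the serialized form looks like.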

Why should we prioritize solving it?

Because you're a new market entrant and hopefully will see this as an edge/feature to help drive your adoption. :-) :crossed_fingers:

💡 Solution

How do you think we should solve the problem?

Ideally with something similar to Zod's transformers, but more of a "codec" or "serde" that does both serialization & deserialization.

This would mean new high-level methods like serialize or format (opposite of parse).

Why do you think this is the best solution?

This sort of approach keeps the schema in control of both serialization & deserialization, instead of being at the whims of whatever .toString / .toJSON behavior the values happen to use.

Did you consider any alternatives?

I've been looking for a runtime type library (Zod, see this comment, others) and the only one that does codecs/serde afaict is io-ts, which comes with too much baggage.

Honestly I understand if you consider this out-of-scope, but I'm just really surprised that so far basically all of the runtime type libraries have somewhat myopically focused on "only parse" and forgotten the other half of serde.

ssalbdivad commented 4 months ago

This is a really interesting topic and one I've thought a lot about but am still not quite sure of my stance on.

When I was initially implementing morphs, I considered many solutions including a unique input/output that could be associated with a type. Especially working in the type system, I eventually determined that fundamentally, there was nothing special about input/output or serialize/deserialize compared to any other arbitrarily named transformations you might create for a type. Essentially, if you think about your types as a graph, really what you're defining is new edges (morphs) from your vertex (type) to some other type (in this case string).
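One way to picture the graph idea is the sketch below, where each representation is a named vertex and each morph is a directed edge between two vertices. `TypeGraph`, `addEdge`, and `convert` are hypothetical names made up for this illustration and do not reflect how ArkType implements morphs.

```typescript
// Hypothetical sketch of "types as vertices, morphs as edges".
type Morph = (value: unknown) => unknown;

class TypeGraph {
  private edges = new Map<string, Map<string, Morph>>();

  // Register a directed edge (morph) from one representation to another.
  addEdge(from: string, to: string, morph: Morph): this {
    const outgoing = this.edges.get(from) ?? new Map<string, Morph>();
    outgoing.set(to, morph);
    this.edges.set(from, outgoing);
    return this;
  }

  // Follow a single edge; throws if no such morph was registered.
  convert(from: string, to: string, value: unknown): unknown {
    const morph = this.edges.get(from)?.get(to);
    if (!morph) throw new Error(`No morph from ${from} to ${to}`);
    return morph(value);
  }
}

const graph = new TypeGraph()
  .addEdge("dateString", "Date", (v) => new Date(`${String(v)}T00:00:00.000Z`))
  .addEdge("Date", "dateString", (v) => (v as Date).toISOString().slice(0, 10))
  // A third edge with no special status: "serialize" is just one more morph.
  .addEdge("Date", "epochMillis", (v) => (v as Date).getTime());

const asDate = graph.convert("dateString", "Date", "2024-01-01");
const backToString = graph.convert("Date", "dateString", asDate);
const asMillis = graph.convert("Date", "epochMillis", asDate);

console.log(backToString, asMillis);
```

In this picture, a bidirectional codec is simply a pair of edges that happen to point in opposite directions between the same two vertices.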

I do think this is a very powerful concept (I created an empty issue a long time ago to follow up), but it seems to me bidirectional codecs are more an application of this larger morph-based graph than inherently useful in and of themselves. In fact, I've seen a lot of questions about why codecs in @effect/schema must always be bidirectional when often that doesn't reflect a valid transformation in one direction, so you end up creating an implementation that just throws, which feels very clunky to me.
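As an illustration of that clunkiness, consider a transformation that only makes sense in one direction. The sketch below uses a hypothetical `Codec` interface (not the @effect/schema API) that requires both directions, so the lossy direction has nothing sensible to do except throw.

```typescript
// Hypothetical interface that forces every transformation to be two-way.
interface Codec<Decoded, Encoded> {
  decode(input: Encoded): Decoded;
  encode(value: Decoded): Encoded;
}

// string -> length is a valid morph, but length -> string has no inverse.
const stringLength: Codec<number, string> = {
  decode: (input) => input.length,
  encode: () => {
    throw new Error("Cannot recover a string from its length");
  },
};

console.log(stringLength.decode("hello"));
```

With one-way morphs as the primitive, the second function would simply not exist, and the type would say so.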

That said, if you haven't seen already it's a great library and might be just what you're looking for!

If you're interested in fleshing out some of what I described re: morphs I'd definitely be interested in working together to tackle that, especially after the 2.0 release.

stephenh commented 3 months ago

Hello! Apologies for the really long delay in replying; your response made a lot of sense, but I was like "ah wow, 'graph of types'?! I'm going to really need to think about my reply...", and then well, finally got a chance to think/play with it today. :-)

there was nothing special about input/output or serialize/deserialize compared to any other arbitrarily named transformations you might create for a type.

I can see this being true...

if you think about your types as a graph, really what you're defining is new edges (morphs) from your vertex (type) to some other type (in this case string)

I think I see what you mean; I hadn't actually tried morphs yet, so just now sketched out a "User type to JSON, JSON back to User type" prototype here:

And, yeah, it looks like it works if I treat each "version of type" (User-as-json and User-as-pojo) as its own thing.

Which makes sense, that for Arktype's current API there is only ever "input type --> output type", so to go the other way around, it makes sense to just flip the "output type --> input type".

I think what's less than ideal, at least for this very specific use-case (storing a POJO as JSON, i.e. storing POJOs into jsonb columns), is the duplication in defining User twice.

Is there a way to avoid that? Ideally I'd like to just define the User on its own, and have that definition know how to do In -> Out as well as Out -> In...
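For illustration, one shape such a single definition could take is sketched below in plain TypeScript: each field carries its own pair of functions, and a helper derives both the In -> Out and Out -> In directions for the whole object. `Codec`, `struct`, `text`, and `calendarDate` are all hypothetical names, and none of this is ArkType's API.

```typescript
// Hypothetical helper types; a sketch, not ArkType API.
interface Codec<Decoded, Encoded> {
  decode(input: Encoded): Decoded;
  encode(value: Decoded): Encoded;
}

type Fields = Record<string, Codec<any, any>>;
type DecodedOf<F extends Fields> = {
  [K in keyof F]: F[K] extends Codec<infer D, any> ? D : never;
};
type EncodedOf<F extends Fields> = {
  [K in keyof F]: F[K] extends Codec<any, infer E> ? E : never;
};

// Build an object codec from per-field codecs, so User is defined only once.
function struct<F extends Fields>(fields: F): Codec<DecodedOf<F>, EncodedOf<F>> {
  const mapFields = (source: unknown, direction: "decode" | "encode"): unknown => {
    const record = source as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(fields)) {
      result[key] = fields[key][direction](record[key]);
    }
    return result;
  };
  return {
    decode: (input) => mapFields(input, "decode") as DecodedOf<F>,
    encode: (value) => mapFields(value, "encode") as EncodedOf<F>,
  };
}

// Identity codec for plain strings.
const text: Codec<string, string> = { decode: (s) => s, encode: (s) => s };

// Date stands in for LocalDate; strings are read as UTC midnight.
const calendarDate: Codec<Date, string> = {
  decode: (s) => new Date(`${s}T00:00:00.000Z`),
  encode: (d) => d.toISOString().slice(0, 10),
};

const user = struct({ firstName: text, birthday: calendarDate });

const decoded = user.decode({ firstName: "Ada", birthday: "1815-12-10" });
const encoded = user.encode(decoded);

console.log(decoded.birthday instanceof Date, encoded);
```

The sketch skips validation entirely; it only shows that the JSON shape and the POJO shape can both be inferred from one field list.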

Which, per your link:

Oh nice! I knew of Effect, but have historically shied away from it b/c of naively assuming it would be too monad-y; I wrote Scala for ~3-4 years and really enjoyed it, but personally more so from the "better Java" angle than the "getting Haskell on the JVM" angle. :-)

But they definitely have the type + encode + decode setup that I'm looking for, so I'll take a look!

If you're interested in fleshing out some of what I described

I'm definitely happy to chat over use cases and potential APIs!

At first I was assuming that supporting the codec pattern (being able to call encode) would be a pretty large/breaking change to the Arktype API, because right now when I do userAsPojo(...json...) it didn't seem clear "if that's (implicitly) decode, where would the encode method even go?"

But maybe something like:

const userDecoded = user.decode(json);
const userEncoded = user.encode(userDecoded);

Would work, and user(...) stays as the common-case API/syntax-sugar for decoding.
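A rough sketch of how that call shape could be wired up outside the library is shown below. `callable` and `Codec` are hypothetical helpers written for this illustration; ArkType does not expose decode/encode methods like these.

```typescript
// Hypothetical: a codec that is also callable, with the call meaning "decode".
interface Codec<Decoded, Encoded> {
  decode(input: Encoded): Decoded;
  encode(value: Decoded): Encoded;
}

type CallableCodec<Decoded, Encoded> = Codec<Decoded, Encoded> &
  ((input: Encoded) => Decoded);

function callable<Decoded, Encoded>(
  codec: Codec<Decoded, Encoded>,
): CallableCodec<Decoded, Encoded> {
  // The bare call is sugar for decode; encode sits alongside as a method.
  const call = (input: Encoded): Decoded => codec.decode(input);
  return Object.assign(call, {
    decode: (input: Encoded): Decoded => codec.decode(input),
    encode: (value: Decoded): Encoded => codec.encode(value),
  });
}

interface User { firstName: string; birthday: Date }
interface UserJson { firstName: string; birthday: string }

const user = callable<User, UserJson>({
  decode: (json) => ({
    firstName: json.firstName,
    birthday: new Date(`${json.birthday}T00:00:00.000Z`),
  }),
  encode: (value) => ({
    firstName: value.firstName,
    birthday: value.birthday.toISOString().slice(0, 10),
  }),
});

const json: UserJson = { firstName: "Ada", birthday: "1815-12-10" };
const userDecoded = user.decode(json);
const userEncoded = user.encode(userDecoded);
const viaCall = user(json); // same result as user.decode(json)

console.log(userEncoded, viaCall.birthday.getTime() === userDecoded.birthday.getTime());
```

Because functions are objects in JavaScript, adding encode this way would not change what existing call sites do.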

Granted, morphs would need to learn to be (optionally) two-way, i.e. accept an encodeFn, which by default could maybe just be value?.toJSON(). Maybe something like:

const user = type({
  firstName: "string",
  birthday: [LocalDate, "|>", decodeFn, encodeFn],
});

I know I flipped the order of "|>" there (I put LocalDate first instead of string)...

At first that was an unintentional mistake, but actually I like it b/c imo it better communicates the "User.birthday is going to be a LocalDate" intent, and pushes the "here's how it's encoded/decoded" to later in the tuple (maybe "two-way morphs" / codecs would have a different operator than |> ... <|> maybe?).

Vs. the current API of birthday: ["string", ...] almost makes it sound like birthday: string is what User.birthday will be typed as, until you see/realize that the decodeFn tuple argument swaps it over to LocalDate.

Anyway, that's my...mumble...months later thoughts! Thanks!

ssalbdivad commented 2 months ago

My intuition is that I'd rather have some way to define a group of "variants" of a type, along with a syntax that lets you conveniently transform between them. That said, I suppose that API could be designed so that encode and decode work much like what you describe.

I definitely want to revisit this after the next release once the core type system is stable.