Decoding JSON fields with multiple value types

azuzunaga commented 3 years ago

I'm trying to decode a JSON field whose value can be either an Int or a String. Unfortunately, I don't have a way to change the serialization. Following the docs, it looks like the approach for cases like this is to create a sum type and then use variants to decode/encode.

However, I've been having trouble getting the right result. I've tried using variantMatch and variantCase, but both expect a tagged object.

Here is a minimal reproducible example using variantCase. What is the best approach to handle JSON fields with mixed value types?

Thanks!

JordanMartinez commented 3 years ago

Does this work via either int string?

garyb commented 3 years ago

In this scenario you need to write a "custom" codec - by which I mean one that isn't built entirely out of library-provided primitives, and accordingly you become responsible for ensuring that it round-trips correctly.

All of the library-provided combinators that deal with sums use tagging of some form, because without it there's no way to guarantee that the various branches don't overlap. For instance, if you had one codec that decodes only the string "value" and another codec that accepts any string value, the result depends on the order in which they are tried, so the combination won't reliably do what you want. That's why the library can't provide anything out of the box that does this safely.
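To make the overlap problem concrete, here is a minimal sketch. The Keyword type and all four function names are hypothetical and exist only for illustration; the sketch assumes the same CA.decode, CA.string and JsonDecodeError constructors used elsewhere in this thread.

```purescript
module OverlapSketch where

import Prelude

import Control.Alt ((<|>))
import Data.Argonaut.Core as J
import Data.Codec.Argonaut as CA
import Data.Either (Either(..))

-- Hypothetical type: one special string plus a catch-all for any other string.
data Keyword = Special | Other String

-- Accepts only the exact string "value".
decodeSpecial :: J.Json -> Either CA.JsonDecodeError Keyword
decodeSpecial j = do
  s <- CA.decode CA.string j
  if s == "value" then Right Special else Left (CA.UnexpectedValue j)

-- Accepts any string at all, including "value".
decodeOther :: J.Json -> Either CA.JsonDecodeError Keyword
decodeOther j = Other <$> CA.decode CA.string j

-- Special is tried first, so "value" decodes to Special.
specialFirst :: J.Json -> Either CA.JsonDecodeError Keyword
specialFirst j = decodeSpecial j <|> decodeOther j

-- The catch-all is tried first, so the Special branch can never be reached.
otherFirst :: J.Json -> Either CA.JsonDecodeError Keyword
otherFirst j = decodeOther j <|> decodeSpecial j
```

Both decoders typecheck, but only specialFirst can ever produce Special. Even then, an untagged encoder would write Other "value" as the plain string "value", which decodes back as Special, so the round trip is not guaranteed either way.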

Here's one way you could write this particular codec though:

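import Prelude

import Control.Alt ((<|>))
import Data.Argonaut.Core as J
import Data.Bifunctor (lmap)
import Data.Codec as C
import Data.Codec.Argonaut as CA
import Data.Either (Either)

-- SecondKey is the sum type from the question; assumed here to be:
--   data SecondKey = Str String | Int Int
secondKeyCodec ∷ CA.JsonCodec SecondKey
secondKeyCodec = C.basicCodec decode encode
  where
    -- Try the string codec first, then the int codec; if both fail,
    -- report a single TypeMismatch instead of the int codec's error.
    decode ∷ J.Json → Either CA.JsonDecodeError SecondKey
    decode j =
      lmap (const (CA.TypeMismatch "SecondKey"))
        $ Str <$> CA.decode CA.string j
        <|> Int <$> CA.decode CA.int j
    -- Encode each branch with its own codec, without any tag or wrapper.
    encode ∷ SecondKey → J.Json
    encode = case _ of
      Str s → CA.encode CA.string s
      Int i → CA.encode CA.int i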

I included type signatures for encode / decode so you can see that there are plenty of ways these functions could be implemented - I used codecs again to encode/decode within them, but you could also just use the raw argonaut core functions like toString / fromString / toNumber / fromNumber or whatever you want in there. 🙂
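As a rough sketch of that alternative, the codec could be written against the raw argonaut-core functions instead of nested codecs. This assumes SecondKey is defined as shown in the block, and that the installed version of purescript-codec still exports basicCodec as in the answer above. Argonaut core represents JSON numbers as Number (toNumber / fromNumber), so the sketch converts to and from Int with Data.Int.

```purescript
module SecondKeyRaw where

import Prelude

import Control.Alt ((<|>))
import Data.Argonaut.Core as J
import Data.Codec as C
import Data.Codec.Argonaut as CA
import Data.Either (Either, note)
import Data.Int as I
import Data.Maybe (Maybe)

-- Assumed shape of the type from the question.
data SecondKey = Str String | Int Int

secondKeyCodec :: CA.JsonCodec SecondKey
secondKeyCodec = C.basicCodec decode encode
  where
  -- Try a JSON string first, then a JSON number that is a whole Int.
  decode :: J.Json -> Either CA.JsonDecodeError SecondKey
  decode j = note (CA.TypeMismatch "SecondKey") (decodeStr j <|> decodeInt j)

  decodeStr :: J.Json -> Maybe SecondKey
  decodeStr j = Str <$> J.toString j

  -- I.fromNumber returns Nothing for non-integral numbers such as 1.5.
  decodeInt :: J.Json -> Maybe SecondKey
  decodeInt j = Int <$> (J.toNumber j >>= I.fromNumber)

  encode :: SecondKey -> J.Json
  encode = case _ of
    Str s -> J.fromString s
    Int i -> J.fromNumber (I.toNumber i)
```

The behaviour should match the codec-based version for strings and integers. The difference is mostly stylistic: the decoding steps work in Maybe and the error is attached once at the end with note.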

This type of scenario - an untagged value that requires a sum type representation - is basically the worst case for this library. It sucks to deal with, but it is a very common real-world scenario when you're not in control of the serialisation format. I should perhaps provide an "unsafe" helper that uses the variant setup but doesn't use tagging, and instead tries each of the decoders in turn / encodes without further wrapping. At least then writing these would be a little more in keeping with the rest of the library's usage.
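One possible shape for such a helper, sketched for Either rather than for variants. The name untaggedEither is hypothetical and not part of the library, and the sketch again assumes a version of purescript-codec that exports basicCodec.

```purescript
module UntaggedSketch where

import Prelude

import Control.Alt ((<|>))
import Data.Codec as C
import Data.Codec.Argonaut as CA
import Data.Either (Either(..), either)

-- Hypothetical helper, not part of the library: tries `ca` first, then `cb`,
-- and encodes with whichever codec matches the branch, adding no tag.
-- Unsafe in the sense described above: if the two codecs accept overlapping
-- JSON, a value encoded from Right may decode back as Left.
untaggedEither
  :: forall a b
   . CA.JsonCodec a
  -> CA.JsonCodec b
  -> CA.JsonCodec (Either a b)
untaggedEither ca cb = C.basicCodec dec enc
  where
  dec j = (Left <$> CA.decode ca j) <|> (Right <$> CA.decode cb j)
  enc = either (CA.encode ca) (CA.encode cb)

-- An Int-or-String field without tagging.
intOrString :: CA.JsonCodec (Either Int String)
intOrString = untaggedEither CA.int CA.string
```

When both decoders fail, the error reported is the one from the second codec, so a real helper would probably want to combine or replace the error as the SecondKey codec above does.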

JordanMartinez commented 3 years ago

Perhaps a codec could be written for purescript-untagged-union?

garyb commented 3 years ago

I don't really want to add a dependency on that here personally, but yeah, that would be an option. It'd still suffer from the possibility of overlapping values depending on the construction of the type.

I don't think this is actually possible to express, but a safe version would have to have a rule that amounts to "none of the types involved can be coerced to each other".

azuzunaga commented 3 years ago

Thank you both for your help! @garyb, that did the trick.

By the way, I'd be happy to open a PR adding your code example to the readme. I think this is a common enough scenario that it might help other people get unstuck. Let me know, and either way, thanks again!

JordanMartinez commented 3 years ago

And if not, I'd accept this as a PR to the cookbook repo. We already have one example of how to use this repo there.

JordanMartinez commented 3 years ago

garyb wrote: "I don't think this is actually possible to express, but a safe version would have to have a rule that amounts to 'none of the types involved can be coerced to each other'."

Perhaps this kind of thing would be better handled in FFI?

garyb / purescript-codec-argonaut

Decoding JSON fields with multiple value types #45