Allow selection sets to recover gracefully when decoding fails.

At Blissfully we ship changes to our GQL layer on the backend first, and our frontend always builds it's API based on what is currently in prod before shipping.

This is generally pretty good! But one of the challenges is when the backend starts serving new values before a client is ready for the new data.

This most commonly shows up when we're updating enums or unions and adding a new variant.

In these cases, we want to be able to construct a SelectionSet on the frontend which is more tolerant of decode failures by providing some default value that can be returned instead.

As we needed to ship, we put together a brief fork : https://github.com/Blissfully/elm-graphql/tree/graphql-with-recovery

We implemented this tweak to the SelectionSet file:

recover : 
    { onFailure : value
    , onSuccess : (decodesTo -> value)
    }
    -> SelectionSet decodesTo scope 
    -> SelectionSet value scope

Hey, thanks for opening the issue!

Some context to try to summarize our discussion on Slack:

There are a few places where non-breaking API changes can result in "base" generated elm-graphql SelectionSet's failing.

Interfaces/Unions

Here's an example of the generated code for a union:

https://github.com/dillonkearns/elm-graphql/blob/02c59a90ba497d9e2e88aeb181bc941239b23214/examples/src/Swapi/Union/CharacterUnion.elm#L35-L38

The underlying decoder will fail here if there is a non-breaking change that adds a new variant to the union:

https://github.com/dillonkearns/elm-graphql/blob/02c59a90ba497d9e2e88aeb181bc941239b23214/src/Graphql/Internal/Builder/Object.elm#L90-L110

This is the case whether or not it is done as an exhaustive check, or extends the maybeFragments.

Enums

A non-breaking API change that adds an Enum will cause a base generated SelectionSet to fail. Here's an example:

https://github.com/dillonkearns/elm-graphql/blob/02c59a90ba497d9e2e88aeb181bc941239b23214/examples/src/Swapi/Enum/Phrase.elm#L40-L75

Goals

I like the idea of making it very explicit where generated elm-graphql SelectionSet's may fail. A scalar may fail, because essentially that is untyped data currently as far as the GraphQL schema is concerned (the underlying JSON representation is unknown by the schema). The Custom Scalar Codecs functionality allows customizing this with arbitrary JSON decoders, which I think is an accurate representation of what the GraphQL schema gives you (it's JSON, but nothing else is guaranteed).

The idea of a recover helper is a good short-term fix, but long-term I don't think think it would be the right abstraction for what I'm looking for in elm-graphql. The reason is that it allows you to recover failure anywhere, even in places where it may not be possible to recover from a failure under the assumption you're working with non-breaking GraphQL schema changes.

I don't want users to get the impression that the generated SelectionSet's are something that may unexpectedly fail (again, under the assumption of a server that is enforcing the GraphQL spec, and an API that isn't using breaking API changes). While APIs may sometimes have breaking API changes, there are ways to ensure that you keep the frontend and backend in sync when this happens, like requiring the user to refresh the page, versioning APIs, etc.

I want to prevent being able to introduce decoder errors at one level, recover them at another level, and back-and-forth. I prefer to have a linear flow, otherwise it becomes a more confusing mental model where it starts to feel like exception handling control flow where you can throw/try/catch at different levels and it becomes harder to trace.

Ideas

Just to get some ideas flowing, one API design idea that I think would meet my goal of being explicit about where the generated SelectionSets could fail when there is a non-breaking API change is this:

Let's call it BreakableSelectionSet for now (maybe we can think of a better name).

-- module BreakableSelectionSet

toMaybeSelection : BreakableSelectionSet decodesTo scope -> SelectionSet (Maybe decodesTo) scope

strictForwardIncompatibleSelection : BreakableSelectionSet decodesTo scope -> SelectionSet decodesTo scope

Then we build up a BreakableSelectionSet for Enums, Unions, and Interfaces. And users can opt in to explicitly providing a default value, or allowing it to fail if the API is out of sync (like the current behavior).

Breaking API change

This would introduce a breaking API change. One way to approach this would be to have a flag to turn this on in the code generation or keep it in the previous version. Or we could start by publishing this under a beta flag in the NPM package, gather some feedback, and decide whether to make it optional or introduce a breaking change from there.

Design

What do you think of the design? This is a starting point for the design discussion, but I'm open to other ideas as here, too.

Another situation of note that I just ran in to.

permissions : SelectionSet (List GQL.Enum.Permission.Permission) GQL.Object.Person

Ideally I would be able to actually get a list of permissions here, but without the specific variant that failed to decode.

With recover or even toMaybeSelection the best case scenario is that no elements are returned if one fails to decode.

🤔

Hmm, going to have to think about Breakable. I'm assuming it wouldn't mirror the API from SelectionSet, you'd just have to convert it to a SelectionSet to compose it with existing selection sets.

It does raise the bar in terms of what people have to learn to get going.

This discussion is super valuable. Let me chime in 😎

Probably you two have already investigated around pretty well, so it is just adding an additional reference. According to https://github.com/graphql/graphql-js/issues/968#issuecomment-317834310 ,

... When building clients, it's best to be defensive against possible future expansions and handle those cases. For enums, we typically include an else or switch default clause when branching on them to handle cases the client doesn't know about. If your clients aren't programming defensively like this, then it's true that expanding the possible response values could cause issues for those clients - it's not an entirely safe change.

That means, as we currently do in elm-graphql, hard-matching against current set of enum is "not an entierly safe" approach.

My idea off the head is exactly doing this "defensive" approach: provide "else" variant. For example, if we have:

enum LegalPerson {
  Natural
  Juridical
}

Currently in generated elm-graphql code we get:

type LegalPerson
    = Natural
    | Juridical

This crashes if server starts to provide a new variant (let's say Robotic.) So we must be "defensive" against that. How about elm-graphql always providing "else" variant like this:

type LegalPerson
    = Natural
    | Juridical
    | UnknownLegalPerson String

and decode like so:

decoder : Decoder LegalPerson
decoder =
    Decode.string
        |> Decode.andThen
            (\string ->
                case string of
                    "Natural" ->
                        Decode.succeed Natural

                    "Juridical" ->
                        Decode.succeed Juridical

                    unknown ->
                        -- Decode.fail ("Invalid LegalPerson type, " ++ string ++ " try re-running the @dillonkearns/elm-graphql CLI ")
                        Decode.succeed (UnknownLegalPerson unknown)
            )

so that developers can preemptively support enum variant additions. This is a bit surprising for developers, so explanations must be in order. But it is definitely defensive, albeit clumsy.

I'd be hesitant to introduce a new variant like that because it would encourage people to handle those unknowns all the way in their UI.

Ideally we'd want those cases to be resolved at the api layer.

For the permission example I gave, the ideal situation is that when a value fails to decode, that we actually just skip it.

However if we modify the enum type to have an implicit Unknown variant, then I have no way to refine something and say "everything here is good". Like if I do List.filterMap (notUnknownLegalPerson), I still have to deal with UnknownLegalPerson in my case statements everywhere I interact with that type.

I do simpathize for that. And it clicked in me why you wanted "recover" approach on this. Keeping exhaustive enum (in Elm custom type) variant is certainly consice, prevents fallback branches seeping into every corners. "Recover" approach essentially looks like withDefault or unwrap for enums to me. It coerces type-conformance to known variant which is reasonable first step.

Another situation of note that I just ran in to.
permissions : SelectionSet (List GQL.Enum.Permission.Permission) GQL.Object.Person
Ideally I would be able to actually get a list of permissions here, but without the specific variant that failed to decode.

You could also have

-- module BreakableSelectionSet

skipUnknown : BreakableSelectionSet (List decodesTo) scope -> SelectionSet (List decodesTo) scope

which would filter those out.

I do wonder if there is a better way to introduce this without it breaking the API.

So for instance, if you could have

SelectionSet.defaultOnForwardIncompatible : a -> SelectionSet a scope -> SelectionSet a scope

-- or

SelectionSet.anticipateForwardIncompatible : SelectionSet a scope -> SelectionSet (Maybe a) scope

-- or

SelectionSet.handleForwardIncompatible : Json.Decode.Decoder a -> SelectionSet a scope -> SelectionSet a scope

Just to chime in a bit 👋

At Humio, our main concern on this subject is new enum values being sent from the server to an existing client, and it would be nice for us to be able to just ignore those new values, ala what has also been requested previously in here.

Design wise, I think there is a different route that this could also go than what has been mentioned before: generate alternative selection sets for the (possibly) affected types. I'm not sure if and how it might cover all applicable cases, so bear with me, but basically, if a field exists in the schema ala:

type SomeObject {
  listOfValues: [EnumValue!]!
}

then the generated code could produce functions like:

-- What we currently get
listOfValues : SelectionSet (List EnumValue) SomeObject

-- Function which could also be added by the generator
listOfValues__IgnoreUnknownValues : SelectionSet (List EnumValue) SomeObject

It's a bit crude, but it could be introduced without breaking the API, and I think it would be fairly easy to understand for users, since it doesn't require any new concepts. Imagining that I'm writing my selection set in an editor with suggestions, it would also mean that I am presented with this function when I try to actually retrieve the data, which gives me a nice reminder to consider if this is applicable.

The downside is that it may start to feel as though the server might just send anything it wants, rather than follow the schema, so I would be fine with this being something that you opt into in the generator. It could be a global flag, to generate functions like this for all applicable types, but it could also be a more specific configuration, where you point out specific fields or types that you would like this generated on. This way, you can specify that you know which parts of your schema might cause issues, and it won't give an impression that e.g. selection sets are generally prone to failure.

Additionally, if you already know which parts of your schema can change and cause issues, it might be nice to generate only the safe selection set, such that you can't accidentally use the one that will fail on a change.

EDIT: On reflection, we would probably want all fields which return a list of enum values to be safe in our project, always. So having the generator hit all such fields with a single flag would be the safest option I think.

EDIT: On reflection, we would probably want all fields which return a list of enum values to be safe in our project, always. So having the generator hit all such fields with a single flag would be the safest option I think.

Yeah, I agree that having a future-schema-safe selection and a non-future-safe selection for enums side-by-side would make it easy to accidentally reach for the unsafe version, and a company probably wants to use one or the other.

The nice thing about the safe version is that it's very easy to opt out of the safety. If it's a Maybe, you can do SelectionSet.nonNullOrFail.

I came up with an idea that might make it more explicit like I outlined in my earlier comments about wanting to make it clear which parts of the API might not handle schema changes. What if there was a type similar to Maybe, but named to make these semantics clear. Then you could opt out of those changes and instead of using nonNullOrFail, where it would be unclear what the null represents, it could be something like this:

type Future currentSchemaValue
    = FutureSchemaValue
    | Current currentSchemaValue

status : SelectionSet (Future Api.Enum.Status.Status) Api.Object.User

exampleSelection : SelectionSet Api.Enum.Status.Status Api.Object.User
exampleSelection =
    Api.Object.User.status
    |> SelectionSet.ignoreFutureSchema

The naming there of Future, FutureSchemaValue, ignoreFutureSchema is a bit clunky, I'm mostly just trying to get some thoughts flowing to spur ideas for more intuitive and manageable names. But it seems like it could be a good direction.

I haven't thought through whether this design would work well with other possible cases where non-breaking schema changes can cause unhandled values. I believe interfaces and unions are the only other cases besides enums (correct me if I'm wrong). I suspect that the Future type could work with those constructs as well.

I could imagine this design being refined a bit and then released under a CLI flag (in a non-breaking release). I could even imagine it potentially becoming the default if it seems to work well after that, because though it would require some work for users to upgrade their codebases, you could mechanically use SelectionSet.ignoreFutureSchema anywhere the compiler gives a new error to get the codebase back to its previous behavior. And having this safe and explicit approach as the default seems like it's aligned with the Elm philosophy and elm-graphql, so something to consider.

We've tried @dillonkearns' approach at Humio (where @KasMA1990 and I work). We basically implemented your last idea in a SafeSelectionSet module like this:

module SafeSelectionSet exposing ...

type Potential currentSchemaValue
    = Unknown
    | Known currentSchemaValue

{-| Wrap a SelectionSet's data in a Potential to make sure that we handle the case where it's not decodable.
-}
wrap : SelectionSet a scope -> SelectionSet (Potential a) scope

{-| Wrap the elements in a list in Potential.
-}
listOfPotentialEnums : Decode.Decoder a -> Decode.Decoder (List (Potential a))

Using elm-review we forbid the usage of all of the selection sets generated by elm-graphql that deal with enums, and we copied them over to a new module where we use wrap and listOfPotentialEnums like below

uiTheme : SelectionSet (SSS.Potential Humio.Api.Enum.UiTheme.UiTheme) Humio.Api.Object.UserSettings
uiTheme =
    Object.selectionForField "Enum.UiTheme.UiTheme"
        "uiTheme"
        []
        Humio.Api.Enum.UiTheme.decoder
        |> SSS.wrap

enabledFeatures : SelectionSet (List (SSS.Potential Humio.Api.Enum.FeatureFlag.FeatureFlag)) RootQuery
enabledFeatures =
    Object.selectionForField "(List Enum.FeatureFlag.FeatureFlag)"
        "enabledFeatures"
        []
        (SSS.listOfPotentialEnums Humio.Api.Enum.FeatureFlag.decoder)

In retrospect, we could have altered the generated code, which would have been a bit more safe (but also requires us to handle everything right now, whereas elm-review allows suppressing the problems momentarily).

And we then have a few functions to handle the possibility of failure:

-- Use a default value if unknown
withDefault : a -> SelectionSet (Potential a) scope -> SelectionSet a scope

-- Filter out the unknowns from a list of Potential
ignoreUnknowns : SelectionSet (List (Potential a)) scope -> SelectionSet (List a) scope

We also have an applicative for when we need to bubble up the problem, so that the constructed thing is a Unknown when one of the sub-selection-sets is Unknown. Using the vanilla SS.succeed/SS.with/SS.mapN this required the aggregating function to handle the handling of the unknowns, which felt cumbersome. This felt much nicer.

succeed : a -> SelectionSet (Potential a) scope
with : SelectionSet a scope -> SelectionSet (Potential (a -> b)) scope -> SelectionSet (Potential b) scope
withPotential : SelectionSet (Potential a) scope -> SelectionSet (Potential (a -> b)) scope -> SelectionSet (Potential b) scope

--

SSS.succeed Constructor
    |> SSS.with Something.safe
    |> SSS.withPotential Something.potential

We haven't looked at unions and interfaces, but it seems to work quite well for enums.

dillonkearns / elm-graphql