facebook / relay

Relay is a JavaScript framework for building data-driven React applications.
https://relay.dev

Reactive GraphQL Architecture #4687

Open captbaritone opened 7 months ago

captbaritone commented 7 months ago

Reactive GraphQL Architecture

This document outlines a vision for using GraphQL to model client data in applications which have highly complex client state. It is informed by the constraints of developing applications for the web, but should be applicable to native applications as well.

GraphQL provides a declarative syntax for application code to specify its data dependencies. While GraphQL was designed to facilitate query/response communication between clients and servers, it has also proved a useful mechanism for implementing client-side data loading from non-GraphQL servers. Implementing your client-side data layer as a GraphQL executor decouples product code from the code which fetches data from a REST server. The GraphQL resolver architecture also provides an opinionated way to model the data layer, forcing it to be implemented in a composable fashion, where the GraphQL executor is responsible for composing the individual resolvers together to derive all the data needed for a product surface.
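As a concrete (purely illustrative) example of this shape, the sketch below wires a hand-written resolver over a REST endpoint and lets graphql-js compose the query; the schema, the `/api/todos/:id` route, and the field names are assumptions, not part of this proposal.

```js
// Illustrative sketch only: a client-side GraphQL executor whose resolvers wrap
// a REST API. The schema, REST route, and field names below are assumptions.
import { graphql, buildSchema } from 'graphql';

const schema = buildSchema(`
  type Todo {
    id: ID!
    text: String
    completed: Boolean
  }
  type Query {
    todo(id: ID!): Todo
  }
`);

// Each root field is a small resolver composing a REST call; the executor is
// responsible for combining resolvers to satisfy whatever query product code writes.
const rootValue = {
  todo: ({ id }) => fetch(`/api/todos/${id}`).then((res) => res.json()),
};

// Product code only sees GraphQL; the REST details stay inside the resolvers.
graphql({ schema, source: '{ todo(id: "1") { text completed } }', rootValue })
  .then((result) => console.log(result.data));
```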

Historically, this architecture is most often encountered in products where the front-end team sees value in the developer experience of GraphQL, but organizational or technical impediments prevent implementing the GraphQL executor on the server. However, we are starting to see other types of applications where this architecture makes sense for purely technical reasons. Examples include:

While implementing a GraphQL executor on the client can be an attractive architecture from a developer experience perspective, it creates a number of challenges in terms of efficiency. The rest of this post will describe a proposed evolution of this architecture which preserves its benefits while mitigating many of its challenges.

The Architecture

While the architecture is not prescriptive about any actual tools, Relay, with its compiler and generated code, is well positioned to explore this architecture. Ideally it can eventually be decomposed into distinct tools:

Problems Solved

Benefits Preserved

Open Questions

We are still early in exploring this architecture and some open questions remain:

Collaboration Opportunities

Sources

These ideas have been explored across various projects:

alloy commented 7 months ago

What should the semantics of mutations be in the context of a reactive GraphQL executor? Specifically, the response portion of the mutation is generally used to specify which updates the client would like to observe, but with a reactive executor we already expect to be notified of changes to any data we are currently observing.

In a world where we'd want to ideally eliminate all updaters, optimistic updates will need to be handled by the local data-store layer. Would this become entirely a concern of the application, or do you imagine Relay would still play a role in this?

alloy commented 7 months ago

Are there viable migration strategies to incrementally adopt this architecture when coming from other existing setups?

This includes:

flow-danny commented 6 months ago

I was actually thinking about doing a little experiment implementing a Network layer fetchQuery that doesn't really fetch over HTTP, but gets data from local SQLite.

It would do this by running GraphQL resolvers as if it were a GraphQL server, avoiding all intermediate serialization.

A crude way to make it reactive could be to invalidate the Store after every commit from network updates?
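For illustration, a rough sketch of that experiment using Relay's `Network.create` and graphql-js execution; the `localGraphQL` module (a schema whose resolvers read from SQLite, plus the `db` handle) is hypothetical.

```js
import { Environment, Network, RecordSource, Store } from 'relay-runtime';
import { graphql } from 'graphql';
// Hypothetical module: a schema whose resolvers read from SQLite, plus the db handle.
import { localSchema, db } from './localGraphQL';

// A Relay network "fetch" function that executes the query locally instead of over HTTP.
async function fetchLocally(request, variables) {
  return graphql({
    schema: localSchema,
    source: request.text,       // the query text Relay hands to the network layer
    variableValues: variables,
    contextValue: { db },       // make the SQLite handle available to every resolver
  });
}

export const environment = new Environment({
  network: Network.create(fetchLocally),
  store: new Store(new RecordSource()),
});
```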

captbaritone commented 6 months ago

In a world where we'd want to ideally eliminate all updaters, optimistic updates will need to be handled by the local data-store layer. Would this become entirely a concern of the application, or do you imagine Relay would still play a role in this?

Still an open question! Perhaps the answer is that you'll be able to do either? Some client data layers may have their own optimistic state mechanism. That would probably have the ability to be more robust. A higher ceiling.

Conversely, some will not, in which case Relay's primitives, which make sense for server data, should still be available.

captbaritone commented 6 months ago

I was actually thinking about doing a little experiment implementing a Network layer fetchQuery that doesn't really fetch over HTTP, but gets data from local SQLite.

I did a prototype of something similar to this using Relay Resolvers which you can find here: https://relay.dev/docs/next/guides/relay-resolvers/introduction/

By using Relay Live Resolvers (an experimental feature) you can invalidate values at field- or record-level granularity. For a crude start I just invalidated every value on every db update. But something like https://github.com/vlcn-io/cr-sqlite could probably get you something much more sophisticated.
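For reference, a live resolver returns a `read`/`subscribe` pair; a minimal sketch, with a hypothetical `db` module standing in for the SQLite layer and the crude strategy of notifying on every write, looks roughly like this:

```js
// Minimal sketch of a live resolver backed by a local database.
// `db.getTodoCount()` and `db.onChange()` are hypothetical; the part that
// matters is the { read, subscribe } object the resolver returns.
import { db } from './localDb'; // hypothetical module wrapping SQLite + change events

/**
 * @RelayResolver Query.todo_count: Int
 * @live
 */
export function todo_count() {
  return {
    // Called whenever Relay needs the current value.
    read: () => db.getTodoCount(),
    // Called once per subscription; call `notify` whenever the value may have
    // changed (crudely: on every db write). Assumed to return an unsubscribe function.
    subscribe: (notify) => db.onChange(notify),
  };
}
```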

flow-danny commented 6 months ago

With a generic entity schema it will be easy to know which Node IDs are invalid, but is there currently a way to invalidate only the active queries currently rendering those nodes?

flow-danny commented 6 months ago

I also looked at Relay resolvers, but they work so differently to normal resolvers... tied to fragments instead of the schema itself.

Skipping all the networking and JSON back and forth, a regular graphql-tools resolver will already give you subscriptions, which is basically reactive.

I'm sure it's possible to stitch a server schema in there and have the client-side resolver do a regular network fetch.

The resolver could also be compiled using something like graphql-jit to reduce overhead.
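A rough sketch of that combination, with a placeholder schema and resolvers; treat the wiring below as illustrative rather than a recommended setup.

```js
import { makeExecutableSchema } from '@graphql-tools/schema';
import { compileQuery, isCompiledQuery } from 'graphql-jit';
import { parse } from 'graphql';

// Placeholder schema and resolvers, standing in for the real client-side ones.
const schema = makeExecutableSchema({
  typeDefs: `type Query { greeting: String }`,
  resolvers: { Query: { greeting: () => 'hello' } },
});

// Pre-compile the query once; execution then skips per-request interpretation.
const compiled = compileQuery(schema, parse(`{ greeting }`));

if (isCompiledQuery(compiled)) {
  // compiled.query(root, context, variables) may return a value or a promise.
  Promise.resolve(compiled.query({}, {}, {})).then((result) => {
    console.log(result.data); // { greeting: 'hello' }
  });
}
```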

captbaritone commented 6 months ago

I also looked at Relay resolvers, but they work so differently to normal resolvers... tied to fragments instead of the schema itself.

Sorry for the confusion. We've been expanding Relay Resolvers to enable them to model arbitrary client state with field-level reactivity. I've just merged a PR which adds documentation for this experimental feature. You can read more here: https://relay.dev/docs/next/guides/relay-resolvers/introduction/

Lalitha-Iyer commented 6 months ago

Abstracts away the distinction between server and client data for product code enabling a single declarative API for reading data.

Would this mean we would continue to use client state management libraries like Redux but provide a common abstraction for querying?

captbaritone commented 5 months ago

@Lalitha-Iyer

Would this mean we would continue to use client state management libraries like Redux but provide a common abstraction for querying?

Correct. You could continue to use Redux or similar for true client state or non-GraphQL data. However, to get the advantages of Relay for network data (defining data dependencies inline without introducing data-fetching waterfalls during loading), you would want to ensure all network data was still coming from GraphQL running on the server.
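For illustration (not from the thread), this is roughly what the "single declarative API" looks like from product code: one fragment mixing a server-backed field with a hypothetical client-backed field named `is_selected`.

```js
import * as React from 'react';
import { graphql, useFragment } from 'react-relay';

// One fragment, two kinds of fields: `text` comes from the server, while
// `is_selected` is a hypothetical client-only field (client schema extension
// or Relay Resolver). The component reads both the same way.
export function TodoRow({ todoRef }) {
  const todo = useFragment(
    graphql`
      fragment TodoRow_todo on Todo {
        text
        is_selected
      }
    `,
    todoRef,
  );
  return (
    <div>
      {todo.is_selected ? '* ' : ''}
      {todo.text}
    </div>
  );
}
```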

ivosabev commented 5 months ago

I am not sure I understand the benefits of Resolvers compared to client schema extensions, which could completely replace a local state library like Redux.

captbaritone commented 5 months ago

@ivosabev When we first started looking at the problem, that's what we thought as well! But it turns out that while Client Schema Extensions could technically be used to model all local state, like Redux does, it's not very practical to do so once you have a meaningful amount of client state to model. You would need some layer which ensured that all the records/fields your product code expected to read were pre-populated in the Relay store. As updates happen in your app, you'd need to write them back to the Relay store, which is often not practical. For example, if you had a todo app and wanted to transfer all tasks owned by one user to another user, you'd have a very tricky set of tasks to do:

1. Locate all the Todo records owned by the user (you'd either need to do a full scan of the store, or define a separate query to read these out)
2. Manually (without a good typesafe API) update each Todo record to have the new owner
3. Locate the old user's record in the store
4. Find their all_todos field and empty it (note that you might also need to update other fields like overdue_todos or all_todos(completed: true), each of which has its own field in the store)
5. Locate the new user's record in the store
6. Find their all_todos field and add the new items (potentially sorting them)

Note that with all of the above you have ~0 type safety to help ensure you don't miss something, since Relay can't know all the places that need to be updated.

This is a lot of complexity. In practice we've found it's much nicer to have the source of truth for your data live in some sensible reactive data store (Redux or similar) and then define functions (resolvers) which model how it gets projected into the graph schema, just like a server would do. With this approach you get one-way data flow: you update the source of truth (the owner_id property on the todo items) and all the relevant fields that are currently being read are automatically recomputed.
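To illustrate that one-way flow under an assumed Redux-style store shape (none of this is from the thread), the ownership transfer becomes a single write to the source of truth, and the graph field is just a projection over it.

```js
// Assumed store shape: state.todos is an array of { id, text, owner_id }.

// The only write: reassign ownership in the source of truth.
function todosReducer(todos = [], action) {
  switch (action.type) {
    case 'todos/transferOwnership':
      return todos.map((todo) =>
        todo.owner_id === action.from ? { ...todo, owner_id: action.to } : todo,
      );
    default:
      return todos;
  }
}

// The projection: what a `User.all_todos`-style resolver would compute on read.
// Fields currently being read recompute from this; no normalized records have
// to be patched by hand.
const allTodosForUser = (state, userId) =>
  state.todos.filter((todo) => todo.owner_id === userId);
```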

leoasis commented 5 months ago

What should the semantics of mutations be in the context of a reactive GraphQL executor? Specifically, the response portion of the mutation is generally used to specify which updates the client would like to observe, but with a reactive executor we already expect to be notified of changes to any data we are currently observing.

This reminds me of the original idea in Relay Classic of mutations with "fat queries", where the library would calculate the intersection between all queries and the result of the mutation. This was somewhat inefficient and also non-deterministic, which was prohibitive for client-to-server architectures since it implied potentially ad-hoc queries (not possible to persist ahead of time), but maybe that is not a concern in a local-first architecture? Another alternative would be to make the compiler smart enough to calculate the potential intersection of the result and the active queries (perhaps with some extra hints), but that looks like it could explode very easily in complexity, and in the number of artifacts it would output.