Make query diffing pluggable

jbaxleyiii commented 8 years ago

Open discussion about if we should do query diffing.

One minor pro is it means we could remove graphql-js from the client build which would really lower our bundle size. https://github.com/apollostack/apollo-client/pull/153/commits/7a8dbec8d050bf095d6cff6606f289efc58c4d85 had to increase our size due to more exports from the graphql-js library.

stubailo commented 8 years ago

OK, let's think about the pros and cons.

To start, let's define query diffing:

When the client wants to call a query, if query diffing is turned on for that query, we run the query on the current store and record all of the missing nodes. Then using that information we can compose a smaller query that fetches only the missing data.

Advantages

It's an easy way to reduce the amount of data being loaded with little effort on the developer's part
It's a convenient place to put pagination logic. If we didn't do diffing, we would have to find a place to put logic that determines how to load new pages of data without re-loading all of the previous data
If an app has a lot of static data, query diffing allows us to pre-load a lot of data and not refetch it since it doesn't change frequently. It's a great way to enable certain kinds of offline functionality.

Disadvantages

1. Inconsistent data

The better the diffing algorithm, the more likely you are to end up with inconsistent data. For example, if a person has a profile photo thumbnail and a full size photo, they could change their photo in between two fetches and end up with part of the UI that displays a different photo:

// First, on the news feed page
{
  user(id: 3) {
    photoThumb
  }
}

// User updates their photo somewhere else

// Second, on the profile page
{
  user(id: 3) {
    photoThumb
    photoFull
  }
}

Now, the second query will return an old thumbnail and a new full photo.

2. Queries are dynamic, making certain kinds of caching and analysis harder

If you always check queries against the store before actually fetching them, then the queries actually sent to the server depend on the entire state of the app at the time the query is fetched. This means you could have un-tested corner cases in your data fetching that arise from a certain combination of UI components being loaded in a specific order. Once the number of queries and UI components in your app grows quite large the number of unique queries could expand astronomically.

It might be nice, for example, to know - if I have these 8 components on my page, this is the set of queries that will run against my server. On the other hand, if all of these queries are fetching duplicate data that's not optimal from a network activity perspective. Perhaps this can be solved in a different way.

For better performance, you also want to pre-compile queries and memoize them into some sort of hash. That way, you don't need to send big queries over the network, and you also don't need to let the client execute arbitrary queries - this saves you from people DoSing your server by typing really long query strings against your production app.

3. Overhead of manipulating and printing queries at runtime

It's not clear yet, but if you have many queries being diffed all the time, that could incur some CPU cost, and would be another thing to optimize.

The more important part is that we have to load a fully-compliant GraphQL printer on the client.

Questions to answer if we turn off diffing

I think if we stopped diffing queries that would make the whole architecture a lot simpler, but at the expense of some additional effort on the app developer's part. Let's see what that would entail:

How to do pagination?

We had a design from before about how diffing plugins could make it easy to do pagination and pagination-like things at runtime. Those wouldn't work at all if we didn't do runtime query diffing. There are a few things we could do:

Idea: fetchNextPage query:

const query = `
  user {
    firstName
    lastName
    posts(page: 1, numPages: 4) {
      pages {
        pageNumber
        posts {
          title
        }
      }
    }
  }
`

// Generated automatically
const fetchPages = `
  user {
    posts(page: $page) {
      pages {
        pageNumber
        posts {
          title
        }
      }
    }
  }
`

// Now we just need to incorporate into the store properly
// so we still need a page normalization plugin that deals with
// the special paginated list type

Essentially, given a query that contains one or more possibly paginated lists we should be able to statically generate all of the queries that would be needed to fetch new pages. That means we never need to generate new queries at runtime. So we still do something like query diffing, but there is a limited set of queries that could be the result of the diff.

How to query from the store?

It should still be possible to run a query against the client-side store and get the data we have at the time. So we should be able to get partial results from a precompiled query without generating a multitude of new queries. This will give us the ability to work with pre-loaded data to a certain extent.

How to incrementally fetch more data?

What if we have a huge catalog which is precompiled into our app bundle, and we just want to fetch a few more fields for a particular item? I think this could be a manual optimization. So you write a special query that just fetches those items, integrate it into the store, and then use the store querying mechanism directly to get the new data. So it goes back to a two-step process of loading data and then reading it from the cache, but that's probably what you want anyway if you care that much about minimizing loading.

What about automatically reducing data loading?

In this world, we wouldn't be able to do too much for the developer about transparently reducing data fetching in their app. Perhaps an alternative could be to instead print warnings in development about possible optimizations. That might even be better because the developer is more in control of their data fetching logic....

stubailo commented 8 years ago

@abhiaiyer91 @martijnwalraven @helfer a lot of stuff in here - this is a really big change so I'm curious what you think about the tradeoffs. I suspect I'm biased in favor of not diffing, so I didn't give as much thought to what we lose.

helfer commented 8 years ago

Here are some of my thoughts.

Additional advantages of query diffing:

Data requirements can be fully declarative, meaning that at every moment, the GraphQL query represents the full data requirements of the app (or at least the component).

It's much much easier to just write what data the component needs in each of its states rather than figuring out the queries that you need to run in order to get the delta of that data. You could of course still do that and just fetch the whole thing. That means the real queston we should be asking ourselves is: How much do developers care about optimizing this?

If the answer is different for different people, then perhaps we could provide that functionality with a plugin/extension so people can load it when they want? My guess is that keeping things as modular as possible and having hooks in the right places will go a long way to making Apollo the best thing to use for a wide variety of use cases.

If not doing query diffing in Apollo means that all developers will start essentially writing their own diffed queries, then we should absolutely do it. It's much easier for us to give developers a correct and well-tested implementation of diffing than it is for anyone to figure out all the 'deltas' themselves, especially if the component can have a bunch of different states.

As for the disadvantages, I agree with all of them, but they might not be as bad as we expect now:

1. Inconsistent data

This one is a bit tricky. We probably won't have invalidations for a while, but when we do, this problem could be solved with those. Also, I guess people will have this problem in spades if they try to write their own delta/diffed queries. If that's something the developer worries about, they can always force refetch, right?

2. Dynamic queries

This is probably not as big of a problem as you might think it would be. It's much more likely that we can implement diffing correctly in Apollo client than it is for someone to correctly implement the proper queries for deltas themselves if we don't provide that functionality at all.

Given that the data requirements are declarative, I think apollo client should always be able to ensure that the right data gets fetched. It might muddle the picture on the server and make performance logging/tracking a bit harder, but I'm not sure it matters that much.

3. Diffing overhead

This, I think, is the most important thing to consider. Apart from the overhead of actually diffing stuff, there's also the overhead of implementing and maintaining all the diffing logic in our code.

After writing this, my best guess is that there is no right answer to the question of query diffing. Depending on your app it can range from absolutely essential to actually harmful. Perhaps we can punt on this issue and code in a way that we don't actually have to make a final decision here? If we can do that without making Apollo-client considerably more complicated, then I think we should.

jbaxleyiii commented 8 years ago

@stubailo @helfer

If the answer is different for different people, then perhaps we could provide that functionality with a plugin/extension so people can load it when they want? My guess is that keeping things as modular as possible and having hooks in the right places will go a long way to making Apollo the best thing to use for a wide variety of use cases

So I was thinking about building a logging plugin for apollo on the plane ride home. Something that logs the query, and the time it takes to get the result as a simple start. Currently the only form of middleware that I am aware of is around manipulating the network request. If we had the ability to apply middleware that exists between query/watchQuery/mutation => middleware => network request => result, we could abstract query diffing as part of that middleware.

query => diffing middleware => smaller query => request => result?

I'm sure there is more that needs to go into it but you could even include parsing a string as a middleware.

string query => parse string => .gql template => request => result?

This would make the library more modular (and smaller) but would require the overhead of you knowing to add them if you need them.

Based on the discussion here, I would love to abstract diffing from the core, BUT keep it as an apollo middleware so it can be used when needed.

stubailo commented 8 years ago

Yeah I think that's a really great idea. Diffing should be an optional pluggable thing you can inject.

BTW, what are the tradeoffs of "middleware" vs. something like a pluggable differ? Do we envision people running multiple diffing steps?

stubailo commented 8 years ago

After talking to Jonas a bit, it would be cool if Apollo Client was almost a framework for writing the GraphQL client that you want - whether that's diffing vs. not, runtime vs. build-time query parsing, etc - the basic architecture we have with redux lets us be pretty flexible about that.

There are only a couple of invariants to maintain - the API, the fact that you get the query results, etc - and the internals can be modified at will.

That would be a pretty awesome value prop for someone looking to use GraphQL in their client.

jbaxleyiii commented 8 years ago

@stubailo I was thinking a singular middleware would allow a library of plugins to be created. You could combine them in the same way you combine reducers in redux or middleware in express. So same API for logging, events around queries, diffing, parsing, etc.

jbaxleyiii commented 8 years ago

@stubailo @helfer

After talking to Jonas a bit, it would be cool if Apollo Client was almost a framework for writing the GraphQL client that you want

Yes this, 💯 this

I would make things like building the embeddable apollo vs the full application one so easy to do

stubailo commented 8 years ago

I was thinking, we could just have different entry points that simply inject different modules. So:

// ALL batteries included, all diffing, parsing, plugins, etc etc
import ApolloClient from 'apollo-client';

// Nothing included, just the core API
import ApolloClient from 'apollo-client/base';

// Inject stuff
import { diffAgainstStore } from 'apollo-client/middleware'
const client = new ApolloClient({
  queryMiddleware: [ diffAgainstStore ]
});

I think there is still a big benefit to shipping it all in one NPM package to avoid dependency hell, but any reasonable bundling system will be able to avoid loading stuff that isn't actually imported!

abhiaiyer91 commented 8 years ago

If the answer is different for different people, then perhaps we could provide that functionality with a plugin/extension so people can load it when they want? My guess is that keeping things as modular as possible and having hooks in the right places will go a long way to making Apollo the best thing to use for a wide variety of use cases.

This is definitely true. It is different for a wide range of developers. Making this and any other optional enhancement is probably best served as extensions over the norm.

abhiaiyer91 commented 8 years ago

@stubailo 💯 on the middleware.

stubailo commented 8 years ago

We're removing query diffing #615

apollographql / apollo-client