magicmark opened this issue 2 months ago
In general the proposed solution is, I think, somewhat problematic.
For this to be a valid operation, we must have guarantees on the behaviour of the `Query.user` field:

**Infallibility**

It must be impossible for the `Query.user` resolver to fail (either by returning `null` or an error); otherwise we have made a request to the second subgraph that is not logical. If the `Query.user` resolver implements any sort of permission or existence checks, these would be bypassed, resulting in potential new security concerns. Given this is a strong requirement for correctness, it would make sense to require the field be marked non-null, e.g. `Query.user(id: ID): User!`.
**Every possible input must be valid for the type**

For example, say your user ids are always numeric, but someone passes in `"notANumber"`. If we apply the optimisation, the second subgraph will be asked to execute a request with an id that doesn't exist. Or, in your example of `Query.user(id: ID): User`, the id can be passed as `null`, which would instantly fail the non-null requirement of the definition of `User.id`.
Therefore it makes sense to add the restriction that the input fields must be equivalent to or more restrictive than their definition on the type. E.g. all of these are valid:
```graphql
type Query {
  user(id: ID): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID
}

# or

type Query {
  user(id: ID!): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID!
}

# or

type Query {
  user(id: ID!): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID
}
```
But not:
```graphql
type Query {
  user(id: ID): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID!
}
```
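To make the rule concrete, here is a minimal sketch of the nullability check a composition step could apply (hypothetical helper names, not part of any federation tooling), treating a trailing `!` in the SDL type as the non-null marker:

```javascript
// Sketch only: the argument's type must be equally or more restrictive
// (i.e. non-null) than the key field's type on the entity.
function isNonNull(sdlType) {
  return sdlType.endsWith('!');
}

// Valid when the argument is non-null, or the key field is also nullable.
function hoistableNullabilityOk(argType, keyFieldType) {
  return isNonNull(argType) || !isNonNull(keyFieldType);
}

console.log(hoistableNullabilityOk('ID', 'ID'));   // → true
console.log(hoistableNullabilityOk('ID!', 'ID!')); // → true
console.log(hoistableNullabilityOk('ID!', 'ID'));  // → true (more restrictive is fine)
console.log(hoistableNullabilityOk('ID', 'ID!'));  // → false (the invalid case above)
```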
**The `Query.user` resolver and the key field resolver(s) must not alter the input field values in any way**

If the resolver is implemented to do any sort of checks or mapping on the input ID, this would invalidate the optimisation. E.g., if the result of `query($id: ID!) { user(id: $id) { id } }` maps an old id format (say `123` to `user:123` or something), then we break the assumption of this optimisation. Note that this is a slightly less strict requirement than the `Query.user` resolver not transforming the keys. For example, the following set of resolvers would be valid:
```js
{
  Query: {
    user: (_root, { id }) => ({ id: `user:${id}` }),
  },
  User: {
    __resolveReference: ({ id }) => ({ id: `user:${id}` }),
    id: ({ id }) => id.slice(5),
  },
}
```
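To see why this resolver map is valid even though it transforms the input internally, here is the same map with a small illustrative check appended: the externally observable `User.id` is identical whether the entity is reached via `Query.user` or via `__resolveReference`.

```javascript
// The resolver map from above, exercised directly (no GraphQL server needed)
// to show that the key round-trips through both resolution paths.
const resolvers = {
  Query: {
    user: (_root, { id }) => ({ id: `user:${id}` }),
  },
  User: {
    __resolveReference: ({ id }) => ({ id: `user:${id}` }),
    id: ({ id }) => id.slice(5), // strips the "user:" prefix
  },
};

// Path 1: the first subgraph resolves Query.user, then User.id.
const viaQuery = resolvers.User.id(resolvers.Query.user(null, { id: '123' }));

// Path 2: the second subgraph resolves __resolveReference, then User.id.
const viaReference = resolvers.User.id(
  resolvers.User.__resolveReference({ id: '123' })
);

console.log(viaQuery, viaReference); // → 123 123
```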
Note that this does not add a new type of restriction (it is already true that a call to `Query._entities` should preserve the key fields, as shown by the above `__resolveReference`), but it does increase the potential surface area for issues.
The federation spec actually allows both subgraphs to declare the `Query.user` field as `@shareable`, which means the query planner can parallelise the calls to them. This itself has all the issues of any other usage of `@shareable`, namely:

The implementations must behave identically in each subgraph. If they do not, it is impossible to make a valid combined response for the client (if one subgraph returns user 1 and the other user 2, for example). Note that this can be subject to timing issues: for example, if the `Query.user` resolver makes a database hit, it's possible for the two servers to hit the DB and get inconsistent read results.
This results in a query plan like:
```
QueryPlan {
  Parallel {
    Fetch(service: "monolith") {
      {
        user(id: $userId) {
          name
          id
        }
      }
    },
    Fetch(service: "new_subgraph") {
      {
        user(id: $userId) {
          profilePhoto
        }
      }
    },
  },
}
```
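The read-consistency hazard mentioned above can be sketched with a toy simulation (made-up latencies and names, no real subgraphs involved): two parallel resolvers read shared state while a write lands between their reads.

```javascript
// Toy model of two @shareable Query.user resolvers racing a database write.
let dbUserName = 'Alice';

async function resolveUser(readDelayMs) {
  // Simulated per-subgraph database read latency.
  await new Promise((resolve) => setTimeout(resolve, readDelayMs));
  return { name: dbUserName };
}

async function demo() {
  // A write that lands while the slower "subgraph" is still reading.
  setTimeout(() => { dbUserName = 'Alicia'; }, 20);

  // The router issues both @shareable fetches in parallel.
  const [fromMonolith, fromSubgraph] = await Promise.all([
    resolveUser(0),  // reads before the write
    resolveUser(60), // reads after the write
  ]);
  return [fromMonolith.name, fromSubgraph.name];
}

const demoResult = demo();
demoResult.then(([a, b]) => console.log(a, b)); // → Alice Alicia
```

The two fetches disagree about the user's name, and there is no principled way for the router to merge them into one consistent response.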
I think in general it would be better to optimise the general case, as I suggest here, which should solve for your use case too: https://github.com/apollographql/federation/issues/2653#issuecomment-1712145052
If you don't want to duplicate the definition of `Query.user`, you could declare your monolith server to be two separate subgraphs as follows. This is a current solution that doesn't require changes to the federation implementation or duplicating the field, but it is a bit of a pain to manage.
```graphql
type Query {
  user(id: ID): User
}

type User @key(fields: "id") {
  id: ID!
}
```
And the rest of your schema omitting `Query.user`, e.g.:
```graphql
type Query {
  someOtherField: Boolean!
}

type User @key(fields: "id") {
  id: ID!
  name: String!
}
```
This will force the federation plan to always hop services after the call to `Query.user`, incurring a small network round-trip cost for anything that stays inside the monolith, but no longer waterfalling for the fields in other subgraphs.
I have created a branch on my federation defer POC repository to show that this is effective for your use case, that Apollo Router can utilise a `@shareable` field to parallelise queries, and that the alternative stopgap solution above works: https://github.com/meiamsome/federation-defer-poc/tree/fed-issue-3141?tab=readme-ov-file
Thanks for the response @meiamsome!
I really like the stopgap solution, nice and sneaky :)
As for the class of concerns raised with likely any API here, there would indeed have to be some element of "trust me 😎" - this is unavoidable given the distributed nature. Prior art: the "trust me" approach is already the case for `@shareable` and friends:

> If multiple subgraphs can resolve a field, make sure each subgraph's resolver for that field behaves identically. Otherwise, queries might return inconsistent results depending on which subgraph resolves the field.

https://www.apollographql.com/docs/federation/federated-schemas/sharing-types/#sharing-object-types
I agree with your concerns, and I don't think there's any magic way around it - it would be part of the tradeoff when implementing this.
:wave: Consider this query:

And our query plan looks like this:

- `Query.user` and `User.name` live in our legacy monolith
- `User.profilePhoto` lives in a new subgraph

Pretty standard stuff.

But upon migrating out the `profilePhoto` resolver to the new subgraph, teams notice "hey i have an extra waterfall", and they can't start hitting the database to fetch the profile photo until `legacy_monolith` finishes resolving - but there's no real dependency there.

With larger types, more fields and more subgraphs, we become more sensitive and exposed to this issue.
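A toy latency model of the waterfall versus what hoisting would allow (made-up numbers and function names, not real subgraph calls):

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function monolithUser(id) {
  await sleep(50); // legacy monolith resolves Query.user + User.name
  return { id, name: 'Alice' };
}

async function photosSubgraph(id) {
  await sleep(50); // new subgraph resolves User.profilePhoto from the id
  return { id, profilePhoto: 'alice.png' };
}

// Today: the entity fetch waterfalls behind the monolith.
async function waterfall(id) {
  const start = Date.now();
  const user = await monolithUser(id);
  await photosSubgraph(user.id); // blocked on the monolith, ~100ms total
  return Date.now() - start;
}

// Proposed: both fetches start from the id in the original request.
async function hoisted(id) {
  const start = Date.now();
  await Promise.all([monolithUser(id), photosSubgraph(id)]); // ~50ms total
  return Date.now() - start;
}

const timings = Promise.all([waterfall('1'), hoisted('1')]);
timings.then(([w, h]) => console.log({ waterfallMs: w, hoistedMs: h }));
```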
Very related discussion here https://gist.github.com/magicmark/cbda3eedf1255334caee357fde7680de
**Problem statement**
Child subgraphs are blocked from resolving entity fields until the parent subgraph yields.
**Proposal**
In the above example, we have a very simplified example of `Query.user(id: ID): User`. In the initial request to the router, we already have the argument (`id`) that would be required to call `__resolveReference` in `fancy_new_photos_subgraph`.

This won't always be the case - e.g. `Query.searchUser(name: String): User` would require `searchUser` to finish yielding before we can get an ID. But in many of our use cases, our `@key(fields: ...)` information is already available from the request object.

It would be cool to declare schema like this:
If all fields in the child subgraph query's selection set are being referenced via `id` with a matching `hoistable` argument, then the subgraph could call its `__resolveReference` in parallel with the query to the monolith.
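A rough sketch of the check the router could perform (hypothetical function, not an actual federation API): if every `@key` field is present in the root field's arguments, the entity representation can be built up front and the child fetch can start immediately; otherwise it must wait for the parent subgraph.

```javascript
// Sketch only: build an entity representation straight from the root
// field's arguments, or return null if the key isn't available up front.
function keyFromArguments(args, keyFields, typename) {
  const representation = { __typename: typename };
  for (const field of keyFields) {
    if (args[field] == null) return null; // must wait for the parent fetch
    representation[field] = args[field];
  }
  return representation;
}

// Query.user(id: "123") carries the full key: hoistable.
const hoistable = keyFromArguments({ id: '123' }, ['id'], 'User');
// Query.searchUser(name: "Alice") does not: blocked on the parent.
const blocked = keyFromArguments({ name: 'Alice' }, ['id'], 'User');

console.log(hoistable); // → { __typename: 'User', id: '123' }
console.log(blocked);   // → null
```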