magicmark opened this issue 2 months ago
In general the proposed solution is, I think, somewhat problematic.
For this to be a valid operation, we must have guarantees on the behaviour of the `Query.user` field:

**Infallibility**

It must be impossible for the `Query.user` resolver to fail (either by returning `null` or an error); otherwise we have made a request to the second subgraph that is not logical. If the `Query.user` resolver implements any sort of permission or existence checks, these would be bypassed, resulting in potential new security concerns. Given this is a strong requirement for correctness, it would make sense to require the field be marked non-null, e.g. `Query.user(id: ID): User!`.
**Every possible input must be valid for the type**

For example, say your user ids are always numeric, but someone passes in `"notANumber"`. If we apply the optimisation, the second subgraph will be asked to execute a request with an id that doesn't exist. Or, in your example of `Query.user(id: ID): User`, the id can be passed as `null`, which would instantly fail the non-null requirement of the definition of `User.id`.
Therefore it makes sense to add the restriction that the input fields must be equivalent to or more restrictive than their definition on the type. E.g. all of these are valid:
```graphql
type Query {
  user(id: ID): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID
}

# or

type Query {
  user(id: ID!): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID!
}

# or

type Query {
  user(id: ID!): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID
}
```
But not:
```graphql
type Query {
  user(id: ID): User!
}

type User @key(fields: "id", hoistableFrom: "Query.user") {
  id: ID!
}
```
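To make the rule concrete, here is a minimal sketch of the nullability check a composition step could apply (hypothetical helper names, not part of any federation tooling), treating a trailing `!` in the SDL type as the non-null marker:

```javascript
// Sketch only: the argument's type must be equally or more restrictive
// (i.e. non-null) than the key field's type on the entity.
function isNonNull(sdlType) {
  return sdlType.endsWith('!');
}

// Valid when the argument is non-null, or the key field is also nullable.
function hoistableNullabilityOk(argType, keyFieldType) {
  return isNonNull(argType) || !isNonNull(keyFieldType);
}

console.log(hoistableNullabilityOk('ID', 'ID'));   // → true
console.log(hoistableNullabilityOk('ID!', 'ID!')); // → true
console.log(hoistableNullabilityOk('ID!', 'ID'));  // → true (more restrictive is fine)
console.log(hoistableNullabilityOk('ID', 'ID!'));  // → false (the invalid case above)
```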
**The `Query.user` resolver and the key field resolver(s) must not alter the input field values in any way**

If the resolver is implemented to do any sort of checks or mapping on the input ID, this would invalidate the optimisation. E.g., if the result of `query($id: ID!) { user(id: $id) { id } }` maps an old id format (say `123` to `user:123` or something), then we break the assumption of this optimisation. Note that this is a slightly less strict requirement than the `Query.user` resolver not transforming the keys. For example, the following set of resolvers would be valid:
```js
{
  Query: {
    user: (_root, { id }) => ({ id: `user:${id}` }),
  },
  User: {
    __resolveReference: ({ id }) => ({ id: `user:${id}` }),
    id: ({ id }) => id.slice(5),
  },
}
```
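To see why this resolver map is valid even though it transforms the input internally, here is the same map with a small illustrative check appended: the externally observable `User.id` is identical whether the entity is reached via `Query.user` or via `__resolveReference`.

```javascript
// The resolver map from above, exercised directly (no GraphQL server needed)
// to show that the key round-trips through both resolution paths.
const resolvers = {
  Query: {
    user: (_root, { id }) => ({ id: `user:${id}` }),
  },
  User: {
    __resolveReference: ({ id }) => ({ id: `user:${id}` }),
    id: ({ id }) => id.slice(5), // strips the "user:" prefix
  },
};

// Path 1: the first subgraph resolves Query.user, then User.id.
const viaQuery = resolvers.User.id(resolvers.Query.user(null, { id: '123' }));

// Path 2: the second subgraph resolves __resolveReference, then User.id.
const viaReference = resolvers.User.id(
  resolvers.User.__resolveReference({ id: '123' })
);

console.log(viaQuery, viaReference); // → 123 123
```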
Note that this does not add a new type of restriction (it is already true that a call to `Query._entities` should preserve the key fields, as shown by the above `__resolveReference`), but it does increase the potential surface area for issues.
The federation spec actually allows both subgraphs to declare the `Query.user` field as `@shareable`, which means the query planner can parallelise the calls to them. This itself has all the issues of any other usage of `@shareable`, namely:

The implementations must behave identically in each subgraph. If they do not, it is impossible to make a valid combined response for the client (if one subgraph returns user 1 and the other user 2, for example). Note that this can be subject to timing issues: for example, if the `Query.user` resolver makes a database hit, it's possible for the two servers to hit the DB and get inconsistent read results.
This results in a query plan like:
```
QueryPlan {
  Parallel {
    Fetch(service: "monolith") {
      {
        user(id: $userId) {
          name
          id
        }
      }
    },
    Fetch(service: "new_subgraph") {
      {
        user(id: $userId) {
          profilePhoto
        }
      }
    },
  },
}
```
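The read-consistency hazard mentioned above can be sketched with a toy simulation (made-up latencies and names, no real subgraphs involved): two parallel resolvers read shared state while a write lands between their reads.

```javascript
// Toy model of two @shareable Query.user resolvers racing a database write.
let dbUserName = 'Alice';

async function resolveUser(readDelayMs) {
  // Simulated per-subgraph database read latency.
  await new Promise((resolve) => setTimeout(resolve, readDelayMs));
  return { name: dbUserName };
}

async function demo() {
  // A write that lands while the slower "subgraph" is still reading.
  setTimeout(() => { dbUserName = 'Alicia'; }, 20);

  // The router issues both @shareable fetches in parallel.
  const [fromMonolith, fromSubgraph] = await Promise.all([
    resolveUser(0),  // reads before the write
    resolveUser(60), // reads after the write
  ]);
  return [fromMonolith.name, fromSubgraph.name];
}

const demoResult = demo();
demoResult.then(([a, b]) => console.log(a, b)); // → Alice Alicia
```

The two fetches disagree about the user's name, and there is no principled way for the router to merge them into one consistent response.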
I think in general it would be better to optimise the general case, as I suggest here, which should solve for your use case too: https://github.com/apollographql/federation/issues/2653#issuecomment-1712145052
If you don't want to duplicate the definition of `Query.user`, you could declare your monolith server to be two separate subgraphs as follows. This is a current solution that doesn't require changes to the federation implementation or duplicating the field, but it is a bit of a pain to manage.
```graphql
type Query {
  user(id: ID): User
}

type User @key(fields: "id") {
  id: ID!
}
```
And the rest of your schema omitting `Query.user`, e.g.:
```graphql
type Query {
  someOtherField: Boolean!
}

type User @key(fields: "id") {
  id: ID!
  name: String!
}
```
This will force the federation plan to always hop services after the call to `Query.user`, incurring a small network round-trip cost for anything that stays inside the monolith, but no longer waterfalling for the fields in other subgraphs.
I have created a branch on my federation defer POC repository to show that this is effective for your use case, that Apollo Router can utilise a `@shareable` field to parallelise queries, and that the alternative stopgap solution above works: https://github.com/meiamsome/federation-defer-poc/tree/fed-issue-3141?tab=readme-ov-file
Thanks for the response @meiamsome!
I really like the stopgap solution, nice and sneaky :)
As for the class of concerns raised with likely any API here, there would indeed have to be some element of "trust me 😎" - this is unavoidable given the distributed nature. Prior art: the "trust me" approach is already the case for `@shareable` and friends:

> If multiple subgraphs can resolve a field, make sure each subgraph's resolver for that field behaves identically. Otherwise, queries might return inconsistent results depending on which subgraph resolves the field.

https://www.apollographql.com/docs/federation/federated-schemas/sharing-types/#sharing-object-types
I agree with your concerns, and I don't think there's any magic way around it - it would be part of the tradeoff when implementing this.
:wave: Consider this query:

And our query plan looks like this:

- `Query.user` and `User.name` live in our legacy monolith
- `User.profilePhoto` lives in a new subgraph

Pretty standard stuff.

But upon migrating out the `profilePhoto` resolver to the new subgraph, teams notice "hey i have an extra waterfall", and they can't start hitting the database to fetch the profile photo until `legacy_monolith` finishes resolving - but there's no real dependency there.

With larger types, more fields and more subgraphs, we become more sensitive and exposed to this issue.
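A toy latency model of the waterfall versus what hoisting would allow (made-up numbers and function names, not real subgraph calls):

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function monolithUser(id) {
  await sleep(50); // legacy monolith resolves Query.user + User.name
  return { id, name: 'Alice' };
}

async function photosSubgraph(id) {
  await sleep(50); // new subgraph resolves User.profilePhoto from the id
  return { id, profilePhoto: 'alice.png' };
}

// Today: the entity fetch waterfalls behind the monolith.
async function waterfall(id) {
  const start = Date.now();
  const user = await monolithUser(id);
  await photosSubgraph(user.id); // blocked on the monolith, ~100ms total
  return Date.now() - start;
}

// Proposed: both fetches start from the id in the original request.
async function hoisted(id) {
  const start = Date.now();
  await Promise.all([monolithUser(id), photosSubgraph(id)]); // ~50ms total
  return Date.now() - start;
}

const timings = Promise.all([waterfall('1'), hoisted('1')]);
timings.then(([w, h]) => console.log({ waterfallMs: w, hoistedMs: h }));
```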
Very related discussion here https://gist.github.com/magicmark/cbda3eedf1255334caee357fde7680de
**Problem statement**
Child subgraphs are blocked from resolving entity fields until the parent subgraph yields.
**Proposal**
In the above example, we have a very simplified example of `Query.user(id: ID): User`. In the initial request to the router, we already have the argument (`id`) that would be required to call `__resolveReference` in `fancy_new_photos_subgraph`.

This won't always be the case - e.g. `Query.searchUser(name: String): User` would require `searchUser` to finish yielding before we can get an ID. But in many of our use cases, our `@key(fields: ...)` information is already available from the request object.

It would be cool to declare schema like this:
If all fields in the child subgraph query's selection set are being referenced via `id` with a matching `hoistable` argument, then the subgraph could call its `__resolveReference` in parallel with the query to the monolith.
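A rough sketch of the check the router could perform (hypothetical function, not an actual federation API): if every `@key` field is present in the root field's arguments, the entity representation can be built up front and the child fetch can start immediately; otherwise it must wait for the parent subgraph.

```javascript
// Sketch only: build an entity representation straight from the root
// field's arguments, or return null if the key isn't available up front.
function keyFromArguments(args, keyFields, typename) {
  const representation = { __typename: typename };
  for (const field of keyFields) {
    if (args[field] == null) return null; // must wait for the parent fetch
    representation[field] = args[field];
  }
  return representation;
}

// Query.user(id: "123") carries the full key: hoistable.
const hoistable = keyFromArguments({ id: '123' }, ['id'], 'User');
// Query.searchUser(name: "Alice") does not: blocked on the parent.
const blocked = keyFromArguments({ name: 'Alice' }, ['id'], 'User');

console.log(hoistable); // → { __typename: 'User', id: '123' }
console.log(blocked);   // → null
```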