Closed rhino88 closed 1 day ago
Thanks for raising this request! Our team has added the item to our backlog, but it does require a bit more design and thought before we can prioritize it. We will definitely keep this issue updated once it's been officially prioritized, and we are also open to community contributions.
from our stand point, this issue would deserve a better priority that P4 as it affects one fundamental aspect of graphql (prevent overfetching). The resolution of this issue by Apollo is very important to us at Wayfair.
We're seeing a latency impact from the query planner "popping" fields from union fragments (which should be conditional on the type being present in the actual result set) up into a sequential fetch and over fetching. The reproduction looks something like this:
# Subgraph 1
type Query {
card: ProductCard
}
type ProductCard @key(fields: "id") {
id: ID!
components: Component
}
type Product @key(fields: "newKey") {
newKey: ID!
}
union Component = ComponentA | ComponentB | ComponentC
type ComponentA {
a: String
product: Product
}
type ComponentB {
b: String
product: Product
}
type ComponentC {
c: String
product: Product
}
# Subgraph 2
type Product @key(fields: "oldKey") @key(fields: "newKey") {
oldKey: ID!
newKey: ID!
a: String!
}
# Subgraph 3
type Product @key(fields: "oldKey") {
oldKey: ID!
b: String!
}
# Subgraph 4
type Product @key(fields: "oldKey") {
oldKey: ID!
c: String!
}
# Query
query Foo {
card {
components {
... on ComponentA {
__typename
product {
__typename
a
}
}
... on ComponentB {
__typename
product {
__typename
b
}
}
... on ComponentC {
__typename
product {
__typename
c
}
}
}
}
}
The expectation is that Router would:
newKey
to oldKey
a
, b
, and c
fields based on the type returned in blocks
Here's the expected query plan (or something similar to this is expected):
QueryPlan {
Sequence {
Fetch(service: "subgraph-1") {
{
card {
components {
__typename
... on ComponentA {
__typename
product {
__typename
newKey
}
}
... on ComponentB {
__typename
product {
__typename
newKey
}
}
... on ComponentC {
__typename
product {
__typename
newKey
}
}
}
}
}
},
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-2") {
{
... on Product {
__typename
newKey
}
} =>
{
... on Product {
__typename
oldKey
}
}
},
},
Parallel {
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-2") {
{
... on Product {
__typename
newKey
}
} =>
{
... on Product {
__typename
a
}
}
},
},
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-3") {
{
... on Product {
__typename
oldKey
}
} =>
{
... on Product {
__typename
b
}
}
},
},
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-4") {
{
... on Product {
__typename
oldKey
}
} =>
{
... on Product {
__typename
c
}
}
},
},
},
},
}
Instead, Router does the following:
newKey
to oldKey
. It is at this point overfetching occurs as Router pops field a
out of the union and fetches it at the same time as the key conversion. In practice, this is resulting in a latency cost as field a
represents a higher latency field that, in most cases, we do not want to fetch.Here's the query plan:
QueryPlan {
Sequence {
Fetch(service: "subgraph-1") {
{
card {
components {
__typename
... on ComponentA {
__typename
product {
__typename
newKey
}
}
... on ComponentB {
__typename
product {
__typename
newKey
}
}
... on ComponentC {
__typename
product {
__typename
newKey
}
}
}
}
}
},
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-2") {
{
... on Product {
__typename
newKey
}
} =>
{
... on Product {
__typename
a
oldKey
}
}
},
},
Parallel {
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-3") {
{
... on Product {
__typename
oldKey
}
} =>
{
... on Product {
__typename
b
}
}
},
},
Flatten(path: "card.components.product") {
Fetch(service: "subgraph-4") {
{
... on Product {
__typename
oldKey
}
} =>
{
... on Product {
__typename
c
}
}
},
},
},
},
}
I have identified two band-aid fixes for this example. But, to be clear, the do not fix the problem. These are just two ways to try and force the query plan shape to change.
If the oldKey
in this situation is eliminated, then the query plan reflects the ideal state as there is no sequential fetch
that causes a
to pop out. It stops being always fetched. This is a challenging remedy as it requires architectural/data changes to support the new identifier.
If field a
was no longer part of Subgraph 2, but instead was moved to, say, Subgraph 5, then it would achieve the desired query plan effect. However, moving field a
to, say, subgraph 3 would result in shifting the over fetching from every request to any request where the response included Component A or Component B. So it's not really a "fix" but just assertion of a small amount of influence over the query plan.
The crux of the issue is that router is overly eager to reduce subgraph requests at the cost of overfetching. This breaks a core tenant of GraphQL which is only getting what you ask for. The returned shape of the union does not dictate that field a
is necessary, yet it is always fetched regardless. And this is only a single example of such behavior.
Ultimately, the Query planner needs to consider fragments on union members as unique units of the operation and should not pop fields out of those fragments. In some cases, yes it will result in additional requests that in some cases could have been avoided. But it's that or over fetching and, IMO, over fetching is the worse of two evils. This is until the query planner is able to include if/else conditions in it's static flow (which is another conversation for another day).
This is why I support elevating this higher than a P4 issue. This really does feel more like a P1 or P2 at the lowest because it will result in UI, UX, and very difficult to determine, quantify, and communicate schema design constraints. I'm already having conversations with teams about what features they need to cut, or what subgraphs need to be broken apart to try and influence the query plan to avoid this overfetching. That's not a concern that should be leaking into the domain separation.
FWIW, I haven't confirmed this but I imagine the problem could surface for interface fragments as well. That should probably also be considered when solving this problem.
The experimental_type_conditioned_fetching
flag should fix this in router 1.46+
👋 looks like we forgot to close this issue with the #2949 PR.
This should be available since fed v2.7.3 and supported by router version 1.46+. Thanks for the reminder!
Issue Description
When including conditional fragments on a union field the resolvers for fragments appear to be requested unconditionally from the router.
In the example below, the resolver for
Product#c
is invoked when the conditional fragment for... on ComponentC
is not needed given a response without any instances ofComponentC
from thecard
service. Removing the conditional fragment for... on ComponentC
prevents theProduct#c
value from being requested, as expected.Our production
router
is currently usingv1.20.0
but we also confirmed that this issue exists onv1.28.0
. This also impacts thegateway
as seen in the reproduction link.Subgraph A Schema(Card)
``` extend schema @link( url: "https://specs.apollo.dev/federation/v2.3" import: ["@key"] ) type Query { card: ProductCard } type ProductCard @key(fields: "id") { id: ID! components: Component } type Product @key(fields: "id") { id: ID! } union Component = ComponentA | ComponentB | ComponentC type ComponentA { a: String product: Product } type ComponentB { b: String product: Product } type ComponentC { c: String product: Product } ```Subgraph B Schema (Product)
``` extend schema @link( url: "https://specs.apollo.dev/federation/v2.3" import: ["@key", "@interfaceObject"] ) type Product @key(fields: "id") { id: ID! a: String! b: String! c: String! } ```Query
``` query Foo { card { components { __typename ... on ComponentA { product { id a } } ... on ComponentB { product { id b } } ... on ComponentC { product { id c } } } } } ```Query Plan
``` QueryPlan { Sequence { Fetch(service: "card") { { card { components { __typename ... on ComponentA { product { __typename id } } ... on ComponentB { product { __typename id } } ... on ComponentC { product { __typename id } } } } } }, Flatten(path: "card.components.product") { Fetch(service: "product") { { ... on Product { __typename id } } => { ... on Product { a b c } } }, }, }, } ```Response
``` { "data": { "card": { "components": { "__typename": "ComponentA", "product": { "id": "product1", "a": "A1" } } } } } ```Link to Reproduction
https://codesandbox.io/p/sandbox/eloquent-christian-kcs989
Reproduction Steps
Run the query above, and notice the
C resolver -- BAD
andB resolver -- BAD
in the terminal output.