apollographql / federation

🌐  Build and scale a single data graph across multiple services with Apollo's federation gateway.
https://apollographql.com/docs/federation/
Other
667 stars 254 forks source link

Returning null from reference resolver results in "non-nullable field" error for nullable object #2374

Closed jfairley closed 1 year ago

jfairley commented 1 year ago

Starting with @apollo/gateway: 2.3.0, returning null from a __resolveReference adds an error to the graphql response even when the type is declared to be nullable in the relevant subgraph.

I've included a query and the error response.

Further down, I've included snippets of source code. Notably, notice that the field resolver for "boundary" declares it to be nullable.

click to see query and error ```gql query SearchGeoMetrics( ... ) { searchGeoMetrics( ... ) { value { boundary { featureId layer } } } } ``` ```json { "message": "Cannot return null for non-nullable field MapboxBoundary.featureId.", "path": [ "searchGeoMetrics", 53, "value", "boundary", "featureId" ], "extensions": { "code": "INVALID_GRAPHQL", "exception": { "stacktrace": [ "GraphQLError: Cannot return null for non-nullable field MapboxBoundary.featureId.", " at Object.err (/usr/src/app/node_modules/@apollo/federation-internals/dist/error.js:11:32)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:159:86)", " at applySelectionSet (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:99:44)", " at applySelectionSet (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:119:29)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:209:21)", " at applySelectionSet (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:99:44)", " at applySelectionSet (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:119:29)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:209:21)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:148:40)", " at applySelectionSet (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:99:44)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:209:21)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:148:40)", " at /usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:177:25", " at Array.map ()", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:175:41)", " at updateOutputValue (/usr/src/app/node_modules/@apollo/gateway/dist/resultShaping.js:148:40)" ] } } }, ```

Expected vs Actual

Expected Behavior

Graph shows null in response with no "errors".

Works this way with "@apollo/gateway": "2.2.2".

Actual Behavior

Graph shows null in response with "errors".

Works this way with "@apollo/gateway": "2.3.0".

Sample Code

My code examples are based on code-first federation using NestJS, but the issue clearly appears upon an upgrade to @apollo/gateway 2.3.0.

click to expand ## subgraph 1 (using `"@apollo/subgraph": "2.3.0"` ### MapboxBoundaryType (resolvable) ```typescript @ObjectType("MapboxBoundary") @Directive('@key(fields: "city region country zipcode")') @Directive('@key(fields: "featureId")') export class MapboxBoundaryType extends MapboxBoundaryDTO { @Field() name: string; @Field() fullyQualifiedName: string; @Field((type) => MapboxBoundaryLayerEnum) layer: MapboxBoundaryDTO.LayerEnum; @Field() layerLabel: string; @Field() type: string; @Field() level: number; @Field() polyTilesetName: string; @Field() polyLayerName: string; @Field() featureId: string; @Field({ nullable: true }) zipcode?: string; // Below fields exist solely for ApolloFederation @Field((type) => String, { nullable: true, deprecationReason: "field exists solely for ApolloFederation" }) city?: Nullable; @Field({ nullable: true, deprecationReason: "field exists solely for ApolloFederation" }) region?: string; @Field({ nullable: true, deprecationReason: "field exists solely for ApolloFederation" }) country?: string; } ``` ### Reference resolver ```typescript @ResolveReference() async resolveReference(ref: MapboxBoundaryTypeReference): Promise> { // here, we search for data // this could return null (technically, a promise that resolves to null) } ``` ## subgraph 2 (using `"@apollo/subgraph": "2.3.0"`) ### MapboxBoundaryType (unresolvable) ```typescript @ObjectType("MapboxBoundary") @Directive('@key(fields: "city region country zipcode", resolvable: false)') export class MapboxBoundaryType { @Field((type) => String, { nullable: true }) city?: Nullable; @Field({ nullable: true }) region?: string; @Field({ nullable: true }) country?: string; @Field({ nullable: true }) zipcode?: string; } ``` ### field resolver ```typescript @ResolveField(() => MapboxBoundaryType, { nullable: true }) boundary(@Parent() { city, region, country, zip }: T): MapboxBoundaryType { return { city, region, country, zipcode: zip ?? undefined }; } ```
spencersteers commented 1 year ago

Seeing the same thing in "@apollo/gateway": "2.3.1".

pcmanus commented 1 year ago

The short answer is that the change was kind of on purpose, but I didn't fully realized the concrete consequence of it. And I agree it's not great so I've created #2380 which (more or less) remove the error (more or less because in practice the error is moved into the response "extensions"; read on for the reasoning behind that).

As for the longer answer, let me explain what's happening here, both for posterity and because that could be useful. As in fact, the error here is kind of correct, and to some extent it was previous versions that were incorrect of not including it.

If you look at the error message, the issue is in fact not about boundary being null. The problem is with MapboxBoundary.featureId, which is non-nullable, and the error is meant to explain why boundary ends null in the response.

In particular, the reason that boundary is null in the response is not due the fact that the first subgraph resolveReference returns null, at least not directly. In fact, if you were to make the featureId and layer fields nullable, then that same query with the same resolvers (still returning null) would not return boundary: null anymore. It would return a boundary object populated with its key fields (city, region, country and zipcode) and with both featureId and layer set to null.

What is happening inside of the gateway is that the 2nd subgraph is first queried and return some boundary object with its key fields. Then, the 1st is queried using that key. But when that 1st subgraph returns null (due to the resolveReference return value), the actual behaviour of the gateway is to "ignore" that response (not to set boundary to null in particular). And so internally, boundary is still the object from the initial fetch to the 2nd subgraph. This is why, if featureId and layer were nullable, you would just get that object back.

But featureId is non-nullable, and so it would be invalid to send back as response a boundary object where featureId is null. As such, the gateway (at least the 2.3 version) just follows the GraphQL spec: it adds an error to the response and propagates the null to the parent field. That parent field is boundary, which is nullable, and so the propagation stops there and that is why the response ultimately has boundary: null (and an error that explains that it is null due to "bubbling up" the nullability error on featureId).

So, technically speaking, it is more spec compliant to have an error than to not have it. And the reason previous versions of the gateway weren't including the error was that, for some historical reasons, errors generated during the processing of the final response (as the one here) were never included in said response. Meaning that a whole class of errors (any errors resulting of the post-processing of subgraph responses) was silently swallowing by the gateway. The intent of 2.3 was to fix that.

But with all that said, I do understand why the prior behaviour of not having any error in this particular case was actually convenient: it ended up behaving "as if" returning null from the 1st subgraph resolveReference allowed to make the entity null in the response and I can see how one would want that.

And so, long story short, we're going to revert the change of including "post-processing errors" in the response errors, at least for now. We're just not going to fully revert to completely swallowing those errors, because while in this specific case those errors may feel undesirable, there is plenty of other case where other kinds of post-propcessing errors are genuinely a big help for debugging issues, so we'd like to surface them somehow.

And so the attached PR puts those "post-processing errors" into the response "extensions" (under response.extensions, not response.errors), which as it happens is also what the Apollo router already does. That way, those errors will not show up as errors in any client/tooling.

I hope this will solve your issue.

spencersteers commented 1 year ago

@pcmanus Thanks for the quick turnaround. I had a feeling this was the underlying cause of the issue. Appreciate you taking the time to explain whats going on.

jfairley commented 1 year ago

Thanks @pcmanus for the very thorough response. I see it is a sticky situation.

I like your proposed change, and I'll follow #2380 to see what comes of it.

pcmanus commented 1 year ago

For the record, we're going ahead with #2380 for now, which will be part of 2.3.2, but I do feel it's ultimately a bit of a hack (as explained above, this move all post-processing errors into response "extensions", and while it's strictly better that the older behaviour of not surfacing them at all, those error still ought to be "normal" errors in most cases). I think the underlying longer term issue is that we may be lacking flexibility in our rules around the handling of nullability: if there was a direct way for a subgraph to nullify an entity directly in the output, then one would be able to achieve the desired result here without having any error triggered internally. But how to improve that is a discussion for another time.