Subgraph transport exception for fields below an entity cause entire entity to be nulled

tobias-tengler commented 11 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Product

Hot Chocolate

Describe the bug

If two or more subgraphs expose an entity via a top-level resolver like productById and one of the subgraphs fails to provide their part of the entity, the entire entity resolver result is nulled, even if the top-most fields below Product requested from the faulty subgraph are nullable.

For a query like this:

query Product {
  productById(id: "UHJvZHVjdAppMQ==") {
    # Subgraph: Product
    name
    # Subgraph: Reviews
    reviews { # nullable
      body
    }
  }
}

My expectation would be a response like this, if the reviews Subgraph can not be reached:

{
  "errors": [
    // Errors for each top-most field that couldn't be resolved from the subgraph
    //(in this case just `reviews`)
  ],
  "data": {
    "productById":{
      "name": "iPhone",
      "reviews": null
    }
  }
}

The actual result you'd get is:

{
  "errors": [
    {
      "message": "Internal Execution Error"
    }
  ],
  "data": {
    "productById": null
  }
}

This is pretty bad, since a single field being requested from a faulty subgraph could take down an entire page! The behavior for top-level query fields being resolved from different subgraphs is already correct.

Steps to reproduce

I've created an integration test showcasing this:

[Fact]
public async Task Test()
{
    // arrange
    using var demoProject = await DemoProject.CreateAsync();

    var fusionGraph = await new FusionGraphComposer(logFactory: _logFactory).ComposeAsync(
        new[]
        {
            demoProject.Products.ToConfiguration(ProductsExtensionSdl),
            demoProject.Reviews.ToConfiguration(ReviewsExtensionSdl),
        },
        new FusionFeatureCollection(FusionFeatures.NodeField));

    var executor = await new ServiceCollection()
        .AddSingleton(demoProject.HttpClientFactory)
        .AddSingleton(demoProject.WebSocketConnectionFactory)
        .AddSingleton<IConfigurationRewriter, CustomRewriter>()
        .AddFusionGatewayServer()
        .ConfigureFromDocument(SchemaFormatter.FormatAsDocument(fusionGraph))
        .BuildRequestExecutorAsync();

    var request = Parse(
        """
        query Product {
          productById(id: "UHJvZHVjdAppMQ==") {
            # Subgraph: Product
            name
            # Subgraph: Reviews
            reviews {
              body
            }
          }
        }
        """);

    // act
    await using var result = await executor.ExecuteAsync(
        QueryRequestBuilder
            .New()
            .SetQuery(request)
            .Create());

    // assert
    var snapshot = new Snapshot();
    CollectSnapshotData(snapshot, request, result, fusionGraph);
    await snapshot.MatchAsync();

    Assert.Null(result.ExpectQueryResult().Errors);
}

private class CustomRewriter : ConfigurationRewriter
{
    protected override ValueTask<Metadata.HttpClientConfiguration> RewriteAsync(
        Metadata.HttpClientConfiguration configuration,
        CancellationToken cancellationToken)
    {
        if (configuration.ClientName == "Reviews")
        {
            // This simulates the Reviews subgraph not being reachable
            return base.RewriteAsync(
                configuration with { EndpointUri = new Uri("http://client") },
                cancellationToken);
        }

        return base.RewriteAsync(configuration, cancellationToken);
    }
}

Relevant log output

No response

Additional Context?

No response

Version

14.0.0-p.15

tobias-tengler commented 11 months ago

@michaelstaib Looking at the query plan for this test query, the calls to the subgraphs are also not being parallelized, even though they could/should:

{
  "document": "query Product { productById(id: \u0022UHJvZHVjdAppMQ==\u0022) { name reviews { body } } }",
  "operation": "Product",
  "rootNode": {
    "type": "Sequence",
    "nodes": [
      {
        "type": "Resolve",
        "subgraph": "Reviews",
        "document": "query Product_1 { productById(id: \u0022UHJvZHVjdAppMQ==\u0022) { reviews { body } __fusion_exports__1: id } }",
        "selectionSetId": 0,
        "provides": [
          {
            "variable": "__fusion_exports__1"
          }
        ]
      },
      {
        "type": "Compose",
        "selectionSetIds": [
          0
        ]
      },
      {
        "type": "Resolve",
        "subgraph": "Products",
        "document": "query Product_2($__fusion_exports__1: ID!) { productById(id: $__fusion_exports__1) { name } }",
        "selectionSetId": 1,
        "path": [
          "productById"
        ],
        "requires": [
          {
            "variable": "__fusion_exports__1"
          }
        ]
      },
      {
        "type": "Compose",
        "selectionSetIds": [
          1
        ]
      }
    ]
  },
  "state": {
    "__fusion_exports__1": "Product_id"
  }
}

So the test I wrote can also be used as a reproduction to the issue my colleague Christian S. reported to you in DMs.

michaelstaib commented 11 months ago

We have that on the internal backlog ... it will be fixed with the work we are doing.

ChilliCream / graphql-platform