apollographql / federation

🌐  Build and scale a single data graph across multiple services with Apollo's federation gateway.
https://apollographql.com/docs/federation/

Using `IntrospectAndCompose` with High Availability Micro Services #1784

Open Borduhh opened 2 years ago

Borduhh commented 2 years ago

Expansion of issue https://github.com/apollographql/federation/issues/349#issuecomment-1104128473

We are trying to use Apollo Federation with AWS services (e.g., AppSync) and have the following constraints, which likely apply to a lot of other companies as well.

IAM Support for Apollo Studio

We cannot use Apollo Studio because all of our services are created and authenticated using AWS IAM. It would be nice if we could give Apollo Studio an access key ID and secret from an IAM role that it would use to authenticate all of our requests. Right now we do that manually like so:

// Imports assumed for this snippet (Apollo Server 3 + AWS SDK v3)
import { RemoteGraphQLDataSource, GraphQLDataSourceProcessOptions } from '@apollo/gateway';
import { GraphQLRequest } from 'apollo-server-types';
import { OutgoingHttpHeader } from 'http';
import { HttpRequest } from '@aws-sdk/protocol-http';
import { SignatureV4 } from '@aws-sdk/signature-v4';
import { defaultProvider } from '@aws-sdk/credential-provider-node';
import { Sha256 } from '@aws-crypto/sha256-js';

export default class AuthenticatedDataSource extends RemoteGraphQLDataSource {
  /**
   * Adds the necessary IAM Authorization headers for AppSync requests
   * @param request The request to Authorize
   * @returns The headers to pass through to the request
   */
  private async getAWSCustomHeaders(request: GraphQLRequest): Promise<{
    [key: string]: OutgoingHttpHeader | undefined;
  }> {
    const { http, ...requestWithoutHttp } = request;

    if (!http) return {};

    const url = new URL(http.url);

    // If the graph service is not AppSync, we should not sign the request.
    if (!url.host.match(/appsync-api/)) return {};

    const httpRequest = new HttpRequest({
      hostname: url.hostname,
      path: url.pathname,
      method: 'POST',
      headers: {
        Host: url.host,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(requestWithoutHttp),
    });

    const signer = new SignatureV4({
      region: 'us-east-1',
      credentials: defaultProvider(),
      service: 'appsync',
      sha256: Sha256,
    });

    const signedRequest = await signer.sign(httpRequest);

    return signedRequest.headers || {};
  }

  /**
   * Customize the request to AppSync
   * @param options The options to send with the request
   */
  public async willSendRequest({ request }: GraphQLDataSourceProcessOptions) {
    const customHeaders = await this.getAWSCustomHeaders(request);

    if (customHeaders)
      Object.keys(customHeaders).forEach((h) => {
        request.http?.headers.set(h, customHeaders[h] as string);
      });
  }
}
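
For context, a data source like this is plugged into the gateway via `buildService`; a minimal sketch, assuming `@apollo/gateway` and placeholder subgraph names/URLs:

import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';

const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      // Placeholder AppSync subgraph endpoints
      { name: 'users', url: 'https://example1.appsync-api.us-east-1.amazonaws.com/graphql' },
      { name: 'orders', url: 'https://example2.appsync-api.us-east-1.amazonaws.com/graphql' },
    ],
  }),
  // Route every subgraph request through the IAM-signing data source above
  buildService: ({ url }) => new AuthenticatedDataSource({ url }),
});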

`IntrospectAndCompose` is all or nothing

Right now, the `IntrospectAndCompose.initialize()` method fails completely if even one service has a network timeout, which makes it almost impossible to use in production. Every service we add to our gateway increases the likelihood of a network error that cancels the entire process, inevitably causing downtime or CI/CD failures.

To solve this, it would be relatively easy to have `loadServicesFromRemoteEndpoint()` fetch each service's schema independently. This could be made efficient by wrapping `dataSource.process()` with a retry counter and retrying 5xx errors, so the user can choose how many times to retry before `IntrospectAndCompose` fails altogether and rolls back.

Right now we manually add retries around the entirety of `IntrospectAndCompose`, but as we add more services this becomes really inefficient (e.g., if we have 150 services and service 148 fails, we still need to re-fetch services 1 through 147 on the next attempt).
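
For illustration, a rough sketch of the per-service retry idea; `introspectSubgraphWithRetry` is a hypothetical helper (not part of the gateway API) and assumes Node 18+ global `fetch`:

interface SubgraphConfig {
  name: string;
  url: string;
}

// Hypothetical helper: retry a single subgraph's SDL fetch on 5xx errors instead
// of letting one flaky service fail the whole IntrospectAndCompose run.
async function introspectSubgraphWithRetry(
  subgraph: SubgraphConfig,
  maxAttempts = 5,
): Promise<string> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetch(subgraph.url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // Federation subgraphs expose their SDL via the _service field
        body: JSON.stringify({ query: '{ _service { sdl } }' }),
      });

      // Retry on server-side (5xx) errors
      if (response.status >= 500) {
        lastError = new Error(`${subgraph.name} returned ${response.status}`);
        continue;
      }

      const { data } = await response.json();
      return data._service.sdl;
    } catch (err) {
      // Network failures (e.g. timeouts) and malformed responses are retried as well
      lastError = err;
    }
  }

  throw lastError;
}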

Central Caching of Schema Files

This isn't something that necessarily needs to be done by Apollo, but it is something that microservices require. Our team currently uses S3 to cache a schema file, since in our case we can be relatively confident that it will not change without the services being redeployed. The first (and sometimes second) ECS container that comes online builds its own schema using `IntrospectAndCompose` and then stores the cached file with a unique per-deployment ID that other services can use to fetch the cached schema when they scale.
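
A minimal sketch of that caching pattern, assuming `@aws-sdk/client-s3` v3 (3.188+ for `transformToString`); the bucket name, key scheme, and `composeSupergraphSdl` helper are placeholders, not part of any Apollo API:

import { S3Client, GetObjectCommand, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });
const SCHEMA_BUCKET = 'my-schema-cache'; // placeholder bucket name

// Hypothetical helper that runs IntrospectAndCompose (or equivalent) and returns the SDL
declare function composeSupergraphSdl(): Promise<string>;

// Return the cached supergraph SDL for this deployment, composing and caching it
// if this container is the first one online for the deployment.
async function loadSupergraphSdl(deploymentId: string): Promise<string> {
  const key = `supergraph/${deploymentId}.graphql`;

  try {
    const cached = await s3.send(new GetObjectCommand({ Bucket: SCHEMA_BUCKET, Key: key }));
    return await cached.Body!.transformToString();
  } catch {
    // Cache miss: compose the schema ourselves and store it for later containers
    const sdl = await composeSupergraphSdl();
    await s3.send(new PutObjectCommand({ Bucket: SCHEMA_BUCKET, Key: key, Body: sdl }));
    return sdl;
  }
}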

prasek commented 1 year ago

Hi @Borduhh 👋 Re: `IntrospectAndCompose` and central caching of schema files: we generally recommend shifting composition left, out of each gateway's runtime in production and into your build pipeline, to generate a single static supergraph schema that can be deployed to each gateway. This helps with a variety of things, as described below.

In general, once a given subgraph (fleet) is available to serve an updated schema, it's published to the schema registry using `rover subgraph publish`, which can accept the output of `rover subgraph introspect`.

This can be done with a form of `rover subgraph introspect | rover subgraph publish`; see the Federation docs for details.

⚠️ We strongly recommend against using IntrospectAndCompose in production. For details, see Limitations of IntrospectAndCompose.

The IntrospectAndCompose option can sometimes be helpful for local development, but it's strongly discouraged for any other environment. Here are some reasons why:

  • Composition might fail. With IntrospectAndCompose, your gateway performs composition dynamically on startup, which requires network communication with each subgraph. If composition fails, your gateway throws errors and experiences unplanned downtime. With the static or dynamic supergraphSdl configuration, you instead provide a supergraph schema that has already been composed successfully. This prevents composition errors and enables faster startup.
  • Gateway instances might differ. If you deploy multiple instances of your gateway while deploying updates to your subgraphs, your gateway instances might fetch different schemas from the same subgraph. This can result in sporadic composition failures or inconsistent supergraph schemas between instances. When you deploy multiple instances with supergraphSdl, you provide the exact same static artifact to each instance, enabling more predictable behavior.
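
To make the shift-left recommendation concrete, a gateway consuming a pre-composed supergraph schema looks roughly like this; the file path is a placeholder for whatever artifact your pipeline (e.g. `rover supergraph compose`) produces:

import { readFileSync } from 'fs';
import { ApolloGateway } from '@apollo/gateway';

// Load the statically composed supergraph schema produced at build time,
// so the gateway never performs composition (or introspection) at startup.
const supergraphSdl = readFileSync('./supergraph.graphql', 'utf-8');

const gateway = new ApolloGateway({ supergraphSdl });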

What you have with the gateway `willSendRequest` looks right, and you could do something similar in your pipeline deployment script, generating the AWS SigV4 (Signature Version 4) headers and passing them to `rover subgraph introspect --header`.

See: