apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/

Enable distributed tracing for subgraphs using OpenTelemetry traceparent header #6028

Closed JustinMGaiam closed 1 month ago

JustinMGaiam commented 1 month ago

Is your feature request related to a problem? Please describe.

There is no OpenTelemetry configuration option for adding the trace ID to subgraph requests.

Describe the solution you'd like

Add an option to the telemetry section of the router configuration that sends the trace ID in the traceparent header of subgraph requests.

Describe alternatives you've considered

Below is a Rhai script, currently being tested, that adds the trace ID to all subgraph requests.

The script is designed to support New Relic distributed tracing, but it is implemented using OpenTelemetry patterns: https://newrelic.com/blog/best-practices/distributed-tracing-guide

Additional context

// Process the request
fn process_subgraph_request (request) {
  try {
    // https://www.apollographql.com/docs/router/customizations/rhai-api/#accessing-a-traceid
    const id = `${traceid()}`;
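    // Add the traceparent header only if it does not exist yet.
    // Note: traceid() yields only the trace ID; a full W3C traceparent value
    // has the form version-traceid-parentid-flags.
    if (!id.is_empty() && !request.subgraph.headers.contains("traceparent")) {
      request.subgraph.headers["traceparent"] = id;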
      // request.headers["tracestate"] = '';
    }
  } catch (err) {
    // log any errors
    log_error(err);
  }
}

// Register the callback that adds the header to each subgraph request
fn subgraph_service (service, subgraph) {
  const subgraph_request_callback = Fn("process_subgraph_request"); 
  service.map_request(subgraph_request_callback);
}
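
For context on the header value: the script sets traceparent to the bare trace ID, whereas the W3C Trace Context format used by OpenTelemetry expects four dash-separated fields (version, trace ID, parent span ID, flags). The Python sketch below is an illustration only and is not part of the router or the Rhai API. The helper name format_traceparent is hypothetical, and the sample IDs are the example values from the W3C specification.

```python
import re

# Lowercase hex patterns for the two ID fields of a version-00 traceparent.
_HEX32 = re.compile(r"[0-9a-f]{32}")
_HEX16 = re.compile(r"[0-9a-f]{16}")


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent value (hypothetical helper)."""
    if not _HEX32.fullmatch(trace_id) or trace_id == "0" * 32:
        raise ValueError("trace_id must be 32 lowercase hex chars and not all zeros")
    if not _HEX16.fullmatch(span_id) or span_id == "0" * 16:
        raise ValueError("span_id must be 16 lowercase hex chars and not all zeros")
    flags = "01" if sampled else "00"  # lowest bit of the flags byte = sampled
    return f"00-{trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    # prints 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    print(format_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
```

A receiver that validates the header strictly would likely reject a value containing only the trace ID, which may be why built-in propagation is the safer route.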
JustinMGaiam commented 1 month ago

After further investigation, it looks like header propagation may be part of the telemetry setup provided via the YAML file. This is the full New Relic setup, using the North America servers, which seems to provide this feature. If this YAML looks correct, I can close this request.

telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
  exporters:
    tracing:
      common:
        parent_based_sampler: true
        # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
        service_name: "${env.NEW_RELIC_APP_NAME}"
      otlp:
        enabled: true
        # Temporality MUST be set to delta. Failure to do this will result in incorrect metrics.
        temporality: delta
        # Endpoint for your region.
        endpoint: https://otlp.nr-data.net
        protocol: grpc
        grpc:
          metadata:
            "api-key":
              - "${env.NEW_RELIC_API_KEY}"
      propagation:
        trace_context: true
    metrics:
      common:
        # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
        service_name: "${env.NEW_RELIC_APP_NAME}"
      otlp:
        enabled: true
        # Temporality MUST be set to delta. Failure to do this will result in incorrect metrics.
        temporality: delta
        # Endpoint for your region.
        endpoint: https://otlp.nr-data.net:4317/v1/metrics
        protocol: grpc
        grpc:
          metadata:
            "api-key":
              - "${env.NEW_RELIC_API_KEY}"
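
One way to confirm that propagation is active is to inspect the traceparent header a subgraph receives. The following Python sketch assumes the subgraph can read incoming request headers. The names parse_traceparent and TraceParent are hypothetical, and only the four-field layout of the header is handled.

```python
import re
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid traceparent."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(header))
```

Logging the parsed trace_id in the subgraph and comparing it with the trace shown in the tracing backend is a quick sanity check that both services belong to the same trace.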
JustinMGaiam commented 1 month ago

The final OpenShift / Kubernetes OpenTelemetry setup that works for distributed tracing, including additional New Relic cluster information, is as follows. It works with the following versions:

OpenShift: 4.16
New Relic Operator: 5.0.88
Apollo Router: 1.56

telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
  exporters:
    tracing:
      common:
        parent_based_sampler: true
        # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
        service_name: "${env.NEW_RELIC_APP_NAME}"
        resource:
          environment.cluster_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME}"
          environment.container_image_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME}"
          environment.container_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME}"
          environment.namespace: "${env.NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME}"
          environment.node_name: "${env.NEW_RELIC_METADATA_KUBERNETES_NODE_NAME}"
          environment.pod_name: "${env.NEW_RELIC_METADATA_KUBERNETES_POD_NAME}"
          # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
          service.name: "${env.NEW_RELIC_APP_NAME}"
      otlp:
        enabled: true
        # Temporality MUST be set to delta. Failure to do this will result in incorrect metrics.
        temporality: delta
        # Endpoint for your region.
        endpoint: https://otlp.nr-data.net
        protocol: grpc
        grpc:
          metadata:
            "api-key":
              - "${env.NEW_RELIC_API_KEY}"
      propagation:
        trace_context: true
    metrics:
      common:
        # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
        service_name: "${env.NEW_RELIC_APP_NAME}"
        resource:
          environment.cluster_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME}"
          environment.container_image_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME}"
          environment.container_name: "${env.NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME}"
          environment.namespace: "${env.NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME}"
          environment.node_name: "${env.NEW_RELIC_METADATA_KUBERNETES_NODE_NAME}"
          environment.pod_name: "${env.NEW_RELIC_METADATA_KUBERNETES_POD_NAME}"
          # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
          service.name: "${env.NEW_RELIC_APP_NAME}"
      otlp:
        enabled: true
        # Temporality MUST be set to delta. Failure to do this will result in incorrect metrics.
        temporality: delta
        # Endpoint for your region.
        endpoint: https://otlp.nr-data.net:4317/v1/metrics
        protocol: grpc
        grpc:
          metadata:
            "api-key":
              - "${env.NEW_RELIC_API_KEY}"
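
Because this configuration depends on many `${env.NAME}` references, a small preflight check can catch unset variables before the router starts. The Python sketch below is a hypothetical helper, not an Apollo or New Relic tool. It assumes the configuration is available as text and only looks at references written exactly as `${env.NAME}`.

```python
import os
import re
from typing import List, Mapping

# Matches references of the exact form ${env.NAME}.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted names referenced in config_text that are unset or empty."""
    names = sorted(set(_ENV_REF.findall(config_text)))
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    sample = (
        'service_name: "${env.NEW_RELIC_APP_NAME}"\n'
        '- "${env.NEW_RELIC_API_KEY}"\n'
    )
    # Pass an explicit mapping here so the example does not depend on the real environment.
    print(missing_env_vars(sample, {"NEW_RELIC_APP_NAME": "apollo-router"}))
```

In a deployment, the same function could be run against the mounted router configuration file as part of a container startup script.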