connectrpc / connect-es

The TypeScript implementation of Connect: Protobuf RPC that works.
https://connectrpc.com/
Apache License 2.0
1.34k stars 76 forks source link

Add support for OpenTelemetry #523

Open timostamm opened 1 year ago

timostamm commented 1 year ago

Is your feature request related to a problem? Please describe. OpenTelemetry is often used with gRPC to collect metrics, logs, and traces and analyze software performance and behavior. Integrations for connect-go and @grpc/grpc-js are available, but there is no equivalent for connect-es. This can be a roadblock for running services with connect-es in infrastructure that requires good observability.

Describe the solution you'd like A package that makes it easy to add OpenTelemetry tracing and metrics collection to clients and servers.

Describe alternatives you've considered Interceptors provide the necessary hooks, but there is no equivalent for servers yet, and it would be convenient to have a tested and well-supported integration available. It could utilize the existing OpenTelemetry JavaScript Client.

Additional context opentelemetry-js - OpenTelemetry JavaScript Client
@opentelemetry/instrumentation-grpc - experimental intrumentation for @grpc/grpc-js
connect-opentelemetry-go - OpenTelemetry tracing and metrics for connect-go

fubhy commented 1 year ago

We've rolled our own basic connect-node interceptors for now. This lets us collect some generic metrics & tracing via OpenTelemetry and Sentry.io but also gives us a way to "enhance" the HandlerContext object with injected dependencies and implement generic caching wrappers (I think I showed some examples in one of the other issues in this repo before) and request/response validation based on & configured via grpc options.

We only needed UnaryRequests support so far, so this was rather simple in our case but I'd love for there to be native support for interceptors/middlewares for all these use-cases.

So from our perspective it would be great & strategically correct to start fleshing out a concept for interceptors/middlewares first. Once there's native support for these, I'd imagine the community to step in and provide all sorts of reusable & configurable middlewares like this one!

fubhy commented 1 year ago

Just a side note related to traces for those of us who are using sentry.io. Especially in case of streaming responses and long running requests or simply high throughput APIs with a lot of concurrent requests (frankly, any concurrent requests probably): Sentry does not yet fully utilize AsyncLocalStorage (node:async_hook), and instead uses the legacy (and long deprecated) domains api. This means that there's a lot of context leaking going on. We noticed this much more severely with connect-es because of a streaming use-case we have there.

Apparently sentry.io is working on this but it might take a while for them to get to the finish line: https://github.com/getsentry/sentry-javascript/issues/4071

StarpTech commented 1 year ago

Any reason why nobody is responding to this? @fubhy has great ideas, and a interceptors/middlewares approach is long overdue. How can I/we push it forward?

smaye81 commented 1 year ago

Hi @StarpTech. This is definitely on our radar. We are trying to finalize things for our upcoming v1 release, so that's been our main focus. After that though, we plan to begin addressing issues such as these. Stay tuned!

StarpTech commented 1 year ago

@smaye81 thank you for the update. It would be nice if the communication would be more open. I can't wait to try it out. :crossed_fingers:

koblas commented 1 year ago

I worked on this in two different ways. As an interceptor, that's the "proper" way but of course means that you have to make sure it's on the chain whenever you're creating clients. Or the opentelemetry way which monkey patches the code, I did start that implementation but got mired in poor documentation.

From my perspective the monkey patching version is the "correct" way given the NodeJS OpenTelemetry model :(

For those interested and partly for the purpose of code review, here's the interceptor that I'm using:

const tracingInterceptor: Interceptor = (next) => (req) => {
  const parentContext = context.active();
  const tracer = trace.getTracer("connectrpcInterceptor", "0.0.1");

  return tracer.startActiveSpan(
    `grpc.${req.service.typeName}.${[req.method.name](http://req.method.name/)}`,
    {
      kind: SpanKind.CLIENT,
      attributes: {
        url: req.url,
        [SemanticAttributes.RPC_SYSTEM]: "grpc",
        [SemanticAttributes.RPC_METHOD]: req.method.name,
        [SemanticAttributes.RPC_SERVICE]: req.service.typeName,
      },
    },
    parentContext,
    async (span) => {
      // This copies the current OpenTelemetry metadata into the request headers
      propagation.inject(context.active(), req.header, {
        set(hdrs, key, value) {
          hdrs.set(key, typeof value === "string" ? value : String(value));
        },
      });

      function finishSpan(err: unknown) {
        if (err === null || err === undefined) {
          span.setAttribute(SemanticAttributes.RPC_GRPC_STATUS_CODE, 0);
        } else if (err instanceof ConnectError) {
          if (err.code) {
            // span.setStatus(_grpcStatusCodeToSpanStatus(err.code));
            span.setAttribute(SemanticAttributes.RPC_GRPC_STATUS_CODE, err.code);
          }
          span.setAttributes({
            "grpc.error_name": [err.name](http://err.name/),
            "grpc.error_message": err.metadata.get("grpc-message") || err.message,
          });
        } else if (err instanceof Error) {
          span.recordException(err);
        } else {
          // Technically shouldn't happen
          span.setAttribute(SemanticAttributes.RPC_GRPC_STATUS_CODE, 999);
          span.setAttributes({
            "grpc.error_name": "non-connect error",
            "grpc.error_message": String(err),
          });
        }
      }

      try {
        const res = await next(req);

        finishSpan(null);

        return res;
      } catch (err: unknown) {
        finishSpan(err);

        throw err;
      } finally {
        span.end();
      }
    },
  );
};