feathersjs / feathers

The API and real-time application framework
https://feathersjs.com
MIT License
15.08k stars 750 forks source link

OpenTelemetry Tracing support #2300

Open deskoh opened 3 years ago

deskoh commented 3 years ago

OpenTelemetry is a cloud-native observability framework that supports tracing in a vendor-neutral manner.

Created an instrumentation proof-of-concept for tracing Feathers app (currently pinned to 5.0.0-pre.2). (See crow branch for v4).

Instrumentation works in a non-intrusive way by 'monkey-patching' methods in existing packages to inject tracing into the application.

For v5, it relies on patching collectLegacyHooks of @feathersjs/feathers which is not currently not officially exported.

(For v4, it patches processHooks of @feathersjs/commons.)

@daffl Possible to export collectLegacyHooks formally for v5 to allow some official / stable means to support OpenTelemetry Tracing?

It's an alternative to #2043 that offload the tracing visualization to tools like Zipkin or Jaeger.

image

daffl commented 3 years ago

Hm, instead of patching the internals (e.g. with collectLegacyHooks it wouldn't capture any of the new hooks) I believe it would be possible to provide a function that configures a tracing hook as the first one:

export function openTelemetry () {
  return (app: Application) => {
    app.hooks([
      async (context: HookContext, next: NextFunction) => {
        startTrace();
        await next();
        endTrace();
      }
    ]);
  }
}

Now we can configure it with

app.configure(openTelemetry());

As long as this is registered as one of the first plugins (before any other hooks) it should capture everything (except the event emitting since that will always be the very first one).

deskoh commented 3 years ago

Yeap. Tested it could work.

Downside is I would need to add the hooks if I wanted trace events when app / service hooks before / after hooks completes.

daffl commented 3 years ago

I believe this would still work. The hook execution order is

So as long as it is registered first, a global app hook should capture everything.

deskoh commented 3 years ago

Any guidance how this could be done in the example above using a single app.configure() call?

+ Hooks can be added here using `app.hooks([...])` to trace start and end of service call
App hooks (app.hooks([]))
App method hooks (app.hooks({ methodName }))
Legacy app hooks (before/after) (app.hooks({ before, after }))
+ Add Hook 1 here to emit trace event 'calling service before hooks` and `completed service after hooks`
Service hooks (service.hooks([]))
Service method hooks (service.hooks({ methodName }))
Legacy service hooks (before/after) (service.hooks({ before, after }))
+ Add Hook 2 here to emit trace event 'calling service` and `completed service`
daffl commented 3 years ago

I have a feeling this may be related to https://github.com/feathersjs/feathers/issues/2074 unless you pass a tracer via params to every call.

deskoh commented 3 years ago

OpenTelemetry API already handle 'context propagation'. Specifically, wrapping calls with context.with will allow access to callee to have access parent trace span.

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L55

The appMiddleware and serviceMiddleware are the Hook 1 and Hook 2 required to be 'injected' in correct order

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L59-L64

https://github.com/deskoh/feathers-opentelemetry-demo/blob/7fa39d70e9b52b4cf2fa5efc14844fb81067ee68/src/instrumentation-feathers.ts#L65-L70

See the section Accessing the current span for more details.

daffl commented 3 years ago

That context propagation won't work for nested service calls though right?

I wonder if the ideal way for tracing a single service call would be to add an instrumentation option directly to https://github.com/feathersjs/hooks

deskoh commented 3 years ago

Yes it does! There is a deliberate call to users.find service in messages.find to trigger child trace.

If you run up Zipkin, you would be able to see the correct the parent span ID of users.find (the parent-child relationship also depicted by the indented green bar on the left of the span).

The annotations on the right will show the emitted trace events.

I would think the crux is to insert correct hooks in the hooks array in the correct order, which is not possible upfront, as user might add hooks during runtime. The best location I could think of is to patch collectMiddleware of FeathersHookManager, which is not exported. 🤔

jchamb commented 3 years ago

@deskoh My org uses Datadog, but figured be worth sharing how we wrapped our services using a mixin as some what alternative to hooks.

const ddTrace = require('dd-trace');

module.exports = function (app) {
  if (process.env.NODE_ENV === 'test') return;
  const tracer = ddTrace.init({
    tags: ['app:fx-' + process.env.NODE_ENV],
    env: process.env.NODE_ENV,
    logInjection: true,
  });

  app.mixins.push((service, path) => {
    Object.keys(service.methods).forEach((method) => {
      if (service[method]) service[method] = tracer.wrap(`feathers`, tracer.wrap(`service.${path}`, service[method]));
    });
  });
};

Def not a perfect solution still but does get the flow stacked correctly for us in the stack traces.

deskoh commented 3 years ago

@jchamb thanks! I saw from Feathers v5 mixins would no longer be supported. Would this still work?

jchamb commented 3 years ago

@deskoh I haven't played with the pre v5's yet, however looking at the migration guide https://dove.docs.feathersjs.com/guides/migrating.html#uberproto I see an example still calling a mixin. @daffl would have to confirm any plans of that also being removed.

You could use that example to start and stop your spans around the method calls as a starter. Like I said previously this isn't a perfect solution as it would still only trace actual service method calls not including the hooks. Our apps flow is pretty linear so for now this is good enough for us.

Between the new hooks and wrapping service calls, def interested to see what you come up with here!

deskoh commented 3 years ago

@daffl I tried the service mixin documented here and got the following error when making a REST call.

Error: Can not create context for method find
    at Function.createContext (D:\feathers-opentelemetry-demo\node_modules\@feathersjs\feathers\src\hooks\index.ts:22:11)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:70:19
    at Generator.next (<anonymous>)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:8:71
    at new Promise (<anonymous>)
    at __awaiter (D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:4:12)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:58:52
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\src\rest.ts:44:29
    at Generator.next (<anonymous>)
    at D:\feathers-opentelemetry-demo\node_modules\@feathersjs\express\lib\rest.js:8:71 {
  code: 500
}

Turns out the app mixin runs last and would override the service methods, hence it wouldn't have a createContext function.

deskoh commented 3 years ago

@deskoh I haven't played with the pre v5's yet, however looking at the migration guide https://dove.docs.feathersjs.com/guides/migrating.html#uberproto I see an example still calling a mixin. @daffl would have to confirm any plans of that also being removed.

You could use that example to start and stop your spans around the method calls as a starter. Like I said previously this isn't a perfect solution as it would still only trace actual service method calls not including the hooks. Our apps flow is pretty linear so for now this is good enough for us.

Between the new hooks and wrapping service calls, def interested to see what you come up with here!

@jchamb Managed to get tracing using mixins working!

https://github.com/deskoh/feathers-opentelemetry-demo/blob/dove-mixins/src/opentelemetry.ts

jchamb commented 3 years ago

Awesome @deskoh thanks for sharing that!

dyllandry commented 1 year ago

@deskoh Is there a way to get nested spans to work in feathersjs v4? I noticed you have nested spans working in the v5 example, but not in the v4 example.

deskoh commented 1 year ago

@deskoh Is there a way to get nested spans to work in feathersjs v4? I noticed you have nested spans working in the v5 example, but not in the v4 example.

how does your current trace look like?