DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
640 stars 303 forks source link

Error Hooks for tracer.trace() and tracer.wrap() #2192

Open benjohnson opened 2 years ago

benjohnson commented 2 years ago

Hello! We've got some custom instrumentation that uses tracer.wrap(). One of the challenges we've had is that we want to avoid setting the error tag for client-side errors. To fix this, we've ended up basically forking the code for tracer.wrap so that we can control whether or not tracer.trace adds the error tags.

Would you be open to a PR that adds some kind of hook or option to control errors?

rochdev commented 2 years ago

Would you be open to a PR that adds some kind of hook or option to control errors?

Definitely, we're always open to contributions! However, can you also provide some code snippet that demonstrates better your use case? It's possible there is an existing way to handle this. For example we handle this in our own auto-instrumentation for HTTP servers by ignoring 4xx (and thus client) errors.

benjohnson commented 2 years ago

In your HTTP server config, you control how you finish the span, which allows you to call addStatusError and do the check.

If you're using the out of the box tracer.wrap() the finishing of the span happens inside tracer.trace() with the call to addError in private scope.


Here's our use case:

We want an additional named span to occur for every function call in a file. This means we're calling code like this:

      for (const [fnName, fn] of Object.entries(service)) {
        if (typeof fn === 'function') {
          service[fnName] = wrap(
            'cookiemonster',
            { resource: `service.cookiemonster.${fnName}`, tags },
            fn
          );
        }
      }

With this basic use of wrap, the wrap instrumentation is responsible for finishing the spans, so by the time we see the result of this call, we've already marked any lower level errors as having the error tag. There's no way for us to "undo" it.

I can't think of a way for us to modify this without being able to control the tracer's addError method.

In our internal code, we've basically copied what wrap and trace are doing and are using the lower level functions, but we're seeing some weird behaviour in DD. For instance, when we turned this on our downstream service tracing dropped to 0 (Update: We think we found the bug that caused this, but it's the kind of thing that makes me nervous about maintaining our own implementation.):

image

I understand the desire to not overcomplicate settings, but I also would really like a way to control which things we mark as errors so that we can continue to benefit from your changes and updates to wrap.

Thanks for considering.

rochdev commented 2 years ago

Thanks for the detailed explanation. From your proposed solutions I think the hook would make the most sense in this case. Basically the hook would receive the span and the wrapped function arguments as parameters and would be called right before the span is finished, allowing you to detect if errors should be removed from the span. It could potentially also receive the return value if needed, potentially as the second argument before the parameters. Do you think that would work for your use case?

Something like this:

const options = {
  resource: `service.cookiemonster.${fnName}`,
  tags,
  after: (span, result, ...params) => {
    if (!isError(result, params)) {
      // unfortunately all 3 tags are needed because `trace` and `wrap` are not using the `error` tag
      span.addTags({
        'error.msg': null,
        'error.type': null,
        'error.stack': null
      })
    }
  }
}

service[fnName] = wrap('cookiemonster', options, fn)

We're in the process of refactoring the tracer and all errors will use a single tag once that's done, so this could be simplified in the next few weeks with a single span.setTag('error', false).

As a workaround while this is not yet implemented, it should be possible to skip the error with something like this:

const options = { resource: `service.cookiemonster.${fnName}`, tags }

service[fnName] = function (...args) {
  // assuming it returns a promise, but callbacks and sync calls are also supported
  const promise = fn.apply(this, ...args)

  trace('cookiemonster', options, () => {
    return promise.catch(() => {}) // swallow the error but only for the tracer
  })

  return promise
}
benjohnson commented 2 years ago

Yep, I do think that this would work for us. Is it better to wait until you've had a chance to make the changes you want to make to the tracer before putting together a PR?

rochdev commented 2 years ago

@benjohnson It really depends how this is implemented, but if it would be with a hook I think it would be fine either way since only the body of the hook would change, not the hook itself. However, it might be better to do this through diagnostics_channel which we're moving a lot of things to right now, and if that's used it might be better to wait for the new code to land since it will standardize the naming conventions for the channel events.

yarinvak commented 1 year ago

bumping it, we will also appreciate this feature