DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
310 stars 375 forks source link

The Redis integration can't distinguish final errors from retried errors #3820

Open byroot opened 4 months ago

byroot commented 4 months ago

Ref: https://github.com/redis-rb/redis-client/issues/119

There is a bit of an ongoing problem with users of dd-trace-rb, the instrumentation of the Redis gem has a fairly different behavior whether you are using the 4.x version or 5.x version.

More detail at https://github.com/redis-rb/redis-client/issues/119#issuecomment-1829703792, but in short with the 4.x version the instrumentation is hooked above the code in charge of retrying on error, while with 5.x it's below.

Because of this users upgrading from 4.x to 5.x see a stream of errors that aren't new at all, just weren't reported, and think the 5.x version is bugged.

This is no fault of dd-trace-rb, it just use the official hook points declared by redis-client. As the maintainer of redis-client I'd like to find a solution to this, and totally willing to extend or evolve the hook points to allow dd-trace-rb and similar projects to be able to ignore retried errors.

But to help with that I'd first like to know if there is a precedent to this, is this done for other integrations? Would dd-trace-rb need to just not see errors, or need some kind of boolean telling whether the error is final, etc. As this would help me drive what the API change would look like.

@ivoanjo @TonyCTHsu please let me know your thougths.

ivoanjo commented 4 months ago

Thanks for letting us know, definitely sucks that we're breaking folks and adding more work for you :sweat_smile:

Let me sync with the other folks and we'll get back to you asap.

byroot commented 4 months ago

definitely sucks that we're breaking folks and adding more work for you

It really isn't that bad, it's mostly confusion I think.

marcotc commented 3 months ago

Just an FYW that we are still looking into it; just trying to get the right people together with all the summer PTOs.

byroot commented 3 months ago

No worries. That's much appreciated.

marcotc commented 3 months ago

Hey @byroot, I had a chat internally in Hughes what we came up with:

  1. We think that visibility at a level that abstracts away any automatic retries is the desirable default. We had this behavior when we instrumented version 4.x, but we made a mistake of changing that behavior for 5.X.
  2. Inspecting the middleware call site today, we think that it makes sense that it is invoked on each individual request, even on retried request, because arbitrary request modifications can be necessary on an individual request level. For observability, we are not interested in that signal, but we see the value in theory.

What we re thinking of doing is going back to the original monkey-patching for 5.x, that same that we do for 4.x.

We don't think that is necessarily required for the redis-client gem to support callbacks at the level we desire which today would translate to "one high-level Redis API call to one instrumentation signal".

We are happy to expand a conversation as well and hear any feedback.

byroot commented 3 months ago

What we re thinking of doing is going back to the original monkey-patching for 5.x, that same that we do for 4.x.

So you wouldn't instrument raw redis-client usage at all?

marcotc commented 1 month ago

So you wouldn't instrument raw redis-client usage at all?

This is correct. Given we rather report Redis requests at a high-level, looking at the code in redis and redis-client, we wouldn't instrument at the redis-client level.

byroot commented 1 month ago

Note that redis-client can be used standalone, it's not exclusively a dependency of redis.