DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
308 stars 375 forks source link

ActiveRecord SQL spans don't have the same service name as Instantiation spans #3359

Open dannyrife opened 10 months ago

dannyrife commented 10 months ago

Current behaviour ActiveRecord SQL calls reflect the configured service name, but instantiations do not

The difference is clear comparing the contrib code: sql gets the settings, instantiation does not: https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/tracing/contrib/active_record/events/sql.rb#L34 vs https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/tracing/contrib/active_record/events/instantiation.rb#L32

Expected behaviour Since this is part of the active record instrumentations, the service name should match the other instrumentation for active record spans.

Steps to reproduce Instrument active record and create an instantiation something like below should make something that creates instantiations

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

Environment

benoittgt commented 9 months ago

Hello @dannyrife

I took a look at your issue but there is two comments in specs that question myself. I am wondering if it was intended.

https://github.com/DataDog/dd-trace-rb/blob/a6f2af005202ba806e52448c86bf2e28c37f04e9/spec/datadog/tracing/contrib/rails/database_spec.rb#L85

and

https://github.com/DataDog/dd-trace-rb/blob/a6f2af005202ba806e52448c86bf2e28c37f04e9/spec/datadog/tracing/contrib/rails/database_spec.rb#L113

But the comment could also be outdated when I see the span.service value at the time.

https://github.com/DataDog/dd-trace-rb/commit/f64a90c88369dad6bcf6109524f52a10361033fc#diff-480dabeac4cb83aeebfd219880fbbaafff3c0b3a83cef7b2fa0e3819e3b516a5R60-R61

Here is a draft proposal: https://github.com/DataDog/dd-trace-rb/pull/3383

TonyCTHsu commented 9 months ago

👋 @dannyrife, @benoittgt , I believe it is behaving as expected, sinceactive_record.instantiation span is considered to belong to the application service, which mean it is the same service defined for your application.

https://docs.datadoghq.com/tracing/glossary/#services

dannyrife commented 9 months ago

@TonyCTHsu So these would be active record style objects that have no way to instrument a custom service name? When we instrumented the control code for active record contrib code why is the expectation that everything in that object isn't mapped the same? This seems pretty inconsistent, enough that it was brought to our attention by devs noticing it looked wrong.

p-datadog commented 9 months ago

Hi @dannyrife,

I have been investigating this issue and the related #3184 and #3383.

To give a little more detail on @TonyCTHsu’s comment, the service name is intended to describe the application that runs the code in question, not a particular component of the application. The component should be described by the span tags of “component” and “operation”, which are “active_record” and “instantiation”, respectively, for the AR instantiation instrumentation.

Specifying another service name for instantiation operation would result in a new service being created (as shown e.g. by the Service Catalog) which wouldn’t have much content, and conversely, the instrumentation of the instantiation would not completely appear under the service which is the application itself.

Attached is a screenshot of a Rails request showing both an SQL query (sqlite.query specifically) and an active_record.instantiation:

inst

The corresponding configuration is as follows, with ‘foo’ as custom service name for the SQL query:

Datadog.configure do |c|
    c.service = 'hello-world-oleg'

  c.tracing.instrument :active_record, service_name: 'foo'
end

The SQL query span represents time spent in the database. It is not 100% accurate because the database client library is instrumented instead of the database server, so part of the “database” time is the time it took for the client to send the query to the database and then to receive and parse the results.

The instantiation operation, in contrast, is completely performed by the application, and therefore should have the application’s service name. For this reason, the service name that is overridden for the SQL query operation is not propagated to the instantiation operation. The instantiation operation is not actually a part of the SQL query (and you can see that in the flamegraph attached), it just so happens that both operations are instrumented by the “ActiveRecord integration”.

Does this make sense?

ozydingo commented 9 months ago

I brought this up in #3184 and can add an additional perspective. In our case:

In other words, for a custom service name "foo", we had spans of the following operations, included via rails instrumentation, showing up as expected under the service name foo

and then one operation showing up under, in our case, the service name threeplaymedia_app3

We are currently working around this limitation with the monkey-patch to Datadog::Tracing::Contrib::ActiveRecord::Events

        def process(span, event, _id, payload)
          span.service = "foo"
          super
        end
ozydingo commented 9 months ago

@TonyCTHsu thanks for the look! To address in your specific comment

I believe it is behaving as expected, sinceactive_record.instantiation span is considered to belong to the application service, which mean it is the same service defined for your application.

I think this is only right if you are using the default instrumentation configuration. However, if you are using the supported and documented feature to use a custom service name that is not your application's name, what you get is not the same service as defined for the rest of the application. This is the problem that #3184 attempts to fix. I think #3383 actually misses the mark a bit as it ties the service name to the database service which as you point out it not really correct, as opposed to the configured service name. Let me know if this makes sense!

ozydingo commented 9 months ago

Also, for completeness, we initially attempted to see if using the service_name configuration on the active_record, as opposed to rails, instrumentation option would be respected:

c.tracing.instrument :active_record, service_name: "foo"

vs

c.tracing.instrument :rails, service_name: "foo"

however this has no different effect than adding service_name to the rails instrument, as can be seen in the linked PRs.

dannyrife commented 9 months ago

@ozydingo Thanks for sharing your use case. It's very similar to ours and we're having users asking why the active-record spans are mis-labeled. It took some digging to see it was because of this difference in the instantiation logic.

DD Support staff recommended we use the custom service names since without that all components get mapped to a singular service in DD and this doesn't work well with the granularity allowed in the error tracking service (for instance you can't mute errors for only one services active-record component)

The issue is so many of the DD APM logic only works with service, version and env tags. So the extra granularity in the span tags isn't always helpful if you can't control the service name. This issue breaks the contract with manual service name instrumentation.

marcotc commented 9 months ago

@ozydingo in your use case, would it be desirable that active_record.instantiation span's service name matches your global c.service?

delner commented 9 months ago

I think this is only right if you are using the default instrumentation configuration. However, if you are using the supported and documented feature to use a custom service name that is not your application's name, what you get is not the same service as defined for the rest of the application. This is the problem that https://github.com/DataDog/dd-trace-rb/pull/3184 attempts to fix. I think https://github.com/DataDog/dd-trace-rb/pull/3383 actually misses the mark a bit as it ties the service name to the database service which as you point out it not really correct, as opposed to the configured service name. Let me know if this makes sense!

@ozydingo to clarify, when c.service = 'my-service' is set and :rails, service_name is not, then the behavior is correct. But when :rails, service_name: 'custom-service', then behavior isn't correct? You expectinstantiationspans to becustom-service`?

If this is the case, then I think there's a misunderstanding in the documentation. For rails, service_name at present it states:

Service name used when tracing application requests (on the rack level)

This was meaning to communicate that it effectively changes c.service = 'custom-service', not all the underlying instrumentation. I can understand how that can be confusing from how its written, and perhaps in should be rewritten. This seemingly redundant behavior is an artifact of service name configuration from over 6 years ago, before c.service was available.

Our intent now is that all spans within the same Ruby application reflect the same service value, so that our data model (and UX) can accurately group all the activity that occurred within the same Ruby application. Without clean, consistent data, it's very difficult for us to do this, and leads to other undesirable side effects (such as service map fragmentation.)

DD Support staff recommended we use the custom service names

@dannyrife I acknowledge that this has often been recommended in the past. However, doing so will pollute service data, and for this reason we have made a deliberate decision to remove this behavior, so as to not encourage the practice. This is not what service was made for.

I also acknowledge that without customized service names, the UX lacks the capability of sub-grouping spans within an application, and some features that are tied explicitly to service may not function as desired. We hear you loud and clear, and share exactly the same concern. We are working hard right now to change the UX to provide first-class support for alternative fields that allow for equivalent subgrouping/targeting, as a replacement for customized service names, that is free of side effects and even more robust.

This change will take some time to fully realize within the Ruby library. In the interim, if the trade-offs are worthwhile, you may continue to customize service names by modifying the underlying spans, including through the use of our "Processing Pipeline", until an improved UX is in place.

Hopefully this helps clarify things, and I'm happy to take more feedback/questions/suggestions as well.

ozydingo commented 9 months ago

would it be desirable that active_record.instantiation span's service name matches your global c.service?

That would be fine; but stated perhaps at a higher level / user story: I would like to be able to customize the service name for active_record.instantiation to be something other than my rails application name. It may be that I can do this with the global service configuration. I will look at this anew.

But let me demo a simpler example that I deployed to illustrate, since I'd love to clarify if this fits with the designed intent as stated.

Configuration

  Datadog.configure do |c|
    # ...
    c.tracing.instrument :rails, service_name: "foo"
    c.tracing.instrument :active_record, service_name: "foo"
    c.tracing.instrument :mysql2
    # ...
  end

From this, I have the following trace flamegraph and corresponding services

Screen Shot 2024-01-30 at 1 09 32 PM

zoomed in a little more

Screen Shot 2024-01-30 at 1 18 08 PM

What we see is that active_record instrumentation splits, as I believe you say is intended, into service foo for mysql.query queries and threeplay (the configured default service name) for active_record.instantiation.

On the other hand, all operations that are instrumented via rails, including rails.action_controller and rails.render_partial, as well as rack.request, are all respecting the service_name configuration for the rails instrumentation.

On the other hand, graphql instrumentation in the same ruby application gets a separate service name, graphql by default.

So if by "all the underlying instrumentation" you mean rails-unrelated instrumentation, then I understand what you said, but it seems that using :rails, service_name; "foo" does change all of its contained / underlying instrumentations (rack, action_controller, etc). So from what I can see, every other service being instrumented supports configuring the service name for all of its instrumented operations with the lone exception of active_record.instantiation. I can look at doing this via the global configuration, but I must first resurrect the reason why we explicitly set the global service configuration to some other specific name (threeplay as seen in my screenshot).

But do we agree that the behavior of active_record.instantiation differs from the other instrumentables?

p-datadog commented 8 months ago

@ozydingo In your last comment, can you clarify where "threeplay" is configured? I am not seeing it in the configuration you specified.

ozydingo commented 8 months ago

@ozydingo In your last comment, can you clarify where "threeplay" is configured? I am not seeing it in the configuration you specified.

It's set using the DD_SEEVICE env var on the app containers. There might also be something set in our agent (we're using a k8s setup) but I think the pod env takes preference IIRC.

p-datadog commented 8 months ago

Thank you for that @ozydingo , I have done some more research and testing and I think that AR instantation instrumentation is indeed different in that it instruments two different services but doesn't permit specifying two service names.

If we look at https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/tracing/contrib/rails/framework.rb, for some Rails components we forward the service name configured for the rails integration, and for some we don't. For example, ActiveSupport is a component that doesn't get the rails service name forwarded to it, and it gets its own service name "active_support-cache": sn1

To override this, one needs to specify cache_service option to active_support instrumentation:

sn2

Back to AR - "ActiveRecord" instrumentation instruments two different services:

Currently, service name specified to ActiveRecord instrumentation is used for the SQL queries, which is probably what most users would want and expect, but there isn't a way to specify the service name for the instantiation instrumentation, and therefore the instantiation uses the global service name (either from Datadog.configure block or from DD_SERVICE environment variable). Unfortunately, both of these operations could be considered to exist under or within ActiveRecord, which makes the situation confusing.

How could this be fixed? My naive idea is to specify, for example, the following options to ActiveRecord instrumentation:

However, given that the SQL query instrumentation is probably the most common use case, and given that it's already affected by the service_name parameter, perhaps it's the in-process Ruby code that needs a prefixed service name option for it, simply out of practical considerations, though this also could be somewhat confusing.

Alternatively, perhaps AR SQL and AR instantiation could be covered by different instrumentations, at which point they could each have an option called service_name.

dannyrife commented 8 months ago

@p-datadog I think not changing the way it is automatically done right now is important (in case people have added monitoring specifically for the existing behavior) So I'd suggest something like: service_name -> affects sql service calls (current behavior) instantiation_service_name -> affects instantiations and if it's not set, defaults to the current behavior (so we don't break things for existing users)

p-datadog commented 8 months ago

@ozydingo @dannyrife Could you also clarify why you wish to provide service name to rails instrumentation instead of specifying it globally, either via DD_SERVICE or

Datadog.configure do |c|
  c.service = '...'

?

dannyrife commented 8 months ago

So we do set the c.service settings but since parts of the DD UI don't allow granularity beyond tags service, env, & version (like muting errors in error tracking) we are prepending all the the instrumented parts with the service name. Ideally we want to use default services and have the various parts of the UI work use all the extra tags that are added. Configuration looks something like this

Datadog.configure do |c|
  c.env        = ENV['DD_ENV']
  c.service    = ENV['DD_SERVICE']
  ...

  Datadog::Tracing::Contrib::REGISTRY.map(&:name).each do |integration|
    c.tracing.instrument integration, service_name: "#{c.service}-#{integration}"
  end
  c.tracing.instrument :active_support, cache_service: "#{c.service}-cache"
p-datadog commented 8 months ago

Thank you very much, I think this is great product feedback:

Ideally we want to use default services and have the various parts of the UI work use all the extra tags that are added.

ozydingo commented 8 months ago

@p-datadog

@ozydingo @dannyrife Could you also clarify why you wish to provide service name to rails instrumentation instead of specifying it globally,

Actually I think in our case that would work now. Originally we left the global config as a separate name as we were empirically discovering how service names were generated, as a catch for other misc service names (probably because we were confused by the extra service name from AR, as well as discrepancy between AR's MySQL instrumentation vs the direct mysql2 instrumentation, both of which we have active!)

Now that we're stable and have our datadog workflow ironed out, I think using a single globally configured app service name actually does make sense, as long as we can get sensible information between AR, AR's mysql, and mysql2 instrumentation 😄

MickeyMcc commented 7 months ago

Came across this thread after my team upgraded from ddtrace v0.53 to v1.20 and was surprised by the active_record.instantiation spans being lumped into our top level rails service.

Back to AR - "ActiveRecord" instrumentation instruments two different services: SQL queries are a proxy for the database instantiation is entirely within the running Ruby application

From this old issue as well as mentioned the upgrade guide, there used to be a option called :orm_service_name on the active_record config block which I think might have covered the distinction between the above mentioned two services within this instrumentation. So if an interface for a second name was made, that might be a reasonable thing to call it.

I'm always curious when a feature like that is removed between major versions what the driving reasoning is especially when there end up being these kinds of feature/bug requests to create similar functionality. The change wasn't a dealbreaker on the upgrade for us, but it did seem like a weird thing to have lost the ability to do.