elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0
383 stars 114 forks source link

Proposal: add span.id to transactions, rename parent.id #413

Open axw opened 3 years ago

axw commented 3 years ago

https://github.com/elastic/ecs/issues/998 highlights some issues with the fields we use in APM. In particular:

For Elastic APM agents, spans always have an associated transaction.id. This is not the case when translating OpenTelemetry data to Elastic APM, as OpenTelemetry does not have a concept of "entry-point spans". As a result, not all spans have an associated transaction.id, and if OpenTelemetry data is to be a first-class citizen in Elastic APM, the UI cannot make any assumptions about transaction.id being present in spans.

We already consider and treat transactions as a special sort of span (entry-point spans). Transactions and spans share a significant amount of overlap: trace IDs, timestamp, duration, outcome. I propose a minor change to our data model, moving towards unifying transactions and spans:

Thus in terms of IDs, transactions are effectively spans. There are still other important differences that we would later need to change if we want to completely unify transactions and spans, such as transaction.type, transaction.name, etc.

Question: what remaining benefits are there to including transaction.id in span documents, or in transaction documents for that matter?

simitt commented 3 years ago

add a span.id to transaction documents. As an implementation detail, this could just be an alias to transaction.id

So span.id would always refer up never down? One transaction can have one parent span, but can be the root span for multiple spans.

rename parent.id to span.parent_id; this now always refers to a span.id, never transaction.id.

It is not clear to me why span.parent_id would always refers to a span.id? If we were using parent.span.id then that would be clearly indicated. (not saying we should use that one, just trying to clarify).

transaction.id is used only for identifying spans that are part of the same local sub-tree. This field becomes optional.

Is there no need for retrieving the same local sub-tree or are you saying that could be done differently with the proposal?

Have you thought about what should happen to the parent_id in error documents?

axw commented 3 years ago

add a span.id to transaction documents. As an implementation detail, this could just be an alias to transaction.id

So span.id would always refer up never down? One transaction can have one parent span, but can be the root span for multiple spans.

I don't know what you mean by refer up/down. The core of the proposal above is essentially to copy transaction.id to span.id in transaction docs. That way parent.id (or whatever we call it) is always referring to a span.id.

rename parent.id to span.parent_id; this now always refers to a span.id, never transaction.id.

It is not clear to me why span.parent_id would always refers to a span.id? If we were using parent.span.id then that would be clearly indicated. (not saying we should use that one, just trying to clarify).

Fair point. The reason I said span.parent_id is to have a common prefix for these closely related fields, but I agree it's still not clear from the name that the parent is a span.

transaction.id is used only for identifying spans that are part of the same local sub-tree. This field becomes optional.

Is there no need for retrieving the same local sub-tree or are you saying that could be done differently with the proposal?

I'm not sure if or where it's needed -- see my question in the description. In the UI we filter the tree client-side AFAIK.

Have you thought about what should happen to the parent_id in error documents?

Good point. If we do rename parent.id, then it should also apply to errors.

simitt commented 3 years ago

I don't know what you mean by refer up/down. The core of the proposal above is essentially to copy transaction.id to span.id in transaction docs. That way parent.id (or whatever we call it) is always referring to a span.id.

Thanks for clarifying. I completely mis-interpreted that. The transaction.id of a span document means the span belongs_to this transaction. So I assumed setting a span.id on a transaction would also mean it belongs to the referenced span, rather than it represents this span.

felixbarny commented 3 years ago

Question: what remaining benefits are there to including transaction.id in span documents, or in transaction documents for that matter?

For log correlation, it can be useful to just inject the trace.id and transaction.id into the logs. That helps to avoid the allocation of a string for the span.id for every span in a transaction. That can make a significant difference if there are hundreds of spans in every transaction.

But I guess we could remove transaction.id from spans and add a span.id to transactions as long as a transaction will still have a transaction.id.

axw commented 3 years ago

For log correlation, it can be useful to just inject the trace.id and transaction.id into the logs. That helps to avoid the allocation of a string for the span.id for every span in a transaction. That can make a significant difference if there are hundreds of spans in every transaction.

IIANM, you could switch the Java agent's log correlation code to inject span.id for just the transaction's ID. i.e. just change this constant:

https://github.com/elastic/apm-agent-java/blob/453004a190fa27dda983b65cd56fc01a79123b2b/apm-agent-plugins/apm-log-correlation-plugin/src/main/java/co/elastic/apm/agent/mdc/MdcActivationListener.java#L50

But I don't think we necessarily need to take transaction.id away -- as long we're clear that it's optional on spans.

For Elastic agents, we can continue to add transaction.id to both transactions and spans. By injecting transaction.id into logs we enable efficiently correlating with all spans in the associated transaction.

For log correlation in OpenTelemetry, the best we could do is correlate logs with a specific span. If we wanted to show all associated local sub-tree spans we would need to query the trace and reconstruct the graph, like in the APM app.

I think this is a good reason to keep transaction.id around.

felixbarny commented 3 years ago

IIANM, you could switch the Java agent's log correlation code to inject span.id for just the transaction's ID. i.e. just change this constant:

It might be confusing to correlate a log with a span.id that's not the actual span the log belongs to. Example: consider a Transaction t with a Span s. During the executing of s, the application emits a log. If we inject a span.id into the log, it should be s.id, not t.id.

But I don't think we necessarily need to take transaction.id away -- as long we're clear that it's optional on spans.

👍 We probably need to update the log correlation feature in the UI then. Unless it currently just uses the trace.id.