elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0
384 stars 114 forks source link

Current span naming status as a step towards name alignment #446

Open eyalkoren opened 3 years ago

eyalkoren commented 3 years ago

Aligning span names

This is the initial step towards aligning span names across agents. Naturally, it only deals with frameworks that are not language-specific. We have specs for those, but in some cases, these specs were added in retrospect, reflecting the naming-chaos at the time, so we may want to revisit.

@elastic/apm-agent-devs Please take the time to fill some examples in the tables and vote wherever there is a poll-topic-xxl sign - either add πŸ‘ to an existing option, or add a new one with your πŸ‘ on it. After I gather your feedback, I will make the spec PRs accordingly wherever required.

OpenTelemetry general Span guidelines for reference.

HTTP

The naming scheme for transactions is not rigid in the existing spec. It is more so for spans, where names should have the format <method> <host>.

OpenTelemetry HTTP spec for reference.

Existing state: Agent Transaction Span
Java GET unknown route
ServletClassName#doGet
GET /testWithPathMethod/{id}
GET 127.0.0.1
GET [::1]
GET localhost
.NET
Go GET unknown route
GET /hello/:name
GET /hello/{name}
GET host:port
Node.js GET unknown route (vanilla HTTP server)
GET /hello/:name (typical with frameworks)
static file (if using express.static())
CORS preflight (CORS preflight req with hapi framework)
POST localhost:9200/myIndex*/_search (full URL usage is https://github.com/elastic/apm-agent-nodejs/issues/2067)
GET http://localhost:52394/sub (http2, same issue)
PHP GET /users/{id} POST example.com
Python GET /users/{id}
(only for frameworks that expose the necessary routing info )
POST example.com
Ruby PostsController#index (most likely)
GET /users/:id
POST /publish
Rack (last resort)
POST example.com
GET elastic.co

poll-topic-xxl Choose one of the followings:

DB

The spec defines a convention for each DB type.

OpenTelemetry DB spec for reference.

Existing state: Agent SQL DB Redis MongoDB GraphQL Elasticsearch DynamoDB S3
Java SELECT FROM table SET db.collection.drop NA Elasticsearch: GET /index/_search NA NA
.NET
Go SELECT FROM table SET
(flush pipeline)
${collection}.${command}
aggregate
NA Elasticsearch: GET /index/_search DynamoDB GetItem Movies S3 PutObject bucket
Node.js SELECT FROM table_name
UPDATE table_name
SET db.collection.command e.g. elasticapm.test.insert, system.$cmd.ismaster Transaction name: [opName] queryNames, ... (/path)
Span name: GraphQL: [opName] queryNames, ...
(see note below)
Elasticsearch: ${method} ${path} e.g. Elasticsearch: POST /myIndex*/_search Not yet implemented Not yet implemented
PHP SELECT FROM table Not supported yet Not supported yet Not supported yet Not supported yet Not supported yet Not supported yet
Python SELECT FROM table SET db.collection.drop NA ES GET /myindex/_search DynamoDB Query tableName S3 PutObject bucket
Ruby SELECT FROM table SET db.collection.drop NA (outgoing) GET _search DynamoDB Query tableName S3 CreateBucket bucket-name

The spec includes prefixed names like:

poll-topic-xxl Choose one of the followings:

Messaging

The spec defines naming convention that contains lots of prefixing. Kafka is not represented in the spec and in Java it is really bad (100% my fault πŸ€¦β€β™‚οΈ ) - a combination of API-based for send spans and prefix-based for receive spans. This is because we don't really have an API in the Kafka client for a retrieval of one record, there is only API for reading a batch, which is not a good fit for transaction name that is based on a single record.

OpenTelemetry messaging spec for reference.

Existing state: Agent Kafka RabbitMQ SQS SNS Other
Java Send spans:
KafkaProducer#send to MyTopic
Transactions:
Kafka record from MyTopic
RabbitMQ RECEIVE from MyQueue NA NA JMS SEND to MyQueue
.NET
Go NA NA SQS SEND to queue SNS PUBLISH myTopic NA
Node.js Not supported Not supported SQS SEND to queue_name
SQS RECEIVE from queue_name
SQS DELETE from queue_name
Not supported NA
PHP Not supported yet Not supported yet Not supported yet Not supported yet Not supported yet
Python
Ruby NA NA SQS SEND to my-queue SNS PUBLISH to MyTopic NA

poll-topic-xxl Choose one of the followings:

* Note: it is common in messaging clients that the receive API is designed for batches, in which case the API cannot be used as is for the transaction name. So if you choose this option, propose a name for such transactions.

gRPC

The spec defines to used the method as the name, eg: /helloworld.Greeter/SayHello. Since this is a fairly new spec that was approved lately, I am assuming that everybody is happy with it.

trentm commented 3 years ago

@eyalkoren Should GraphQL be put in a separate section? While the Node.js graphql instrumentation does use type=db https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-graphql.md is separate and quite different from https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-db.md

Notes on Node.js APM agent's graphql transaction and span naming:

trentm commented 3 years ago

Some thoughts on the current votes for DB span names. Current voting is:

DB

Lose prefixes altogether πŸ‘ πŸ‘ πŸ‘

I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering). If we lost prefixes altogether then:

    UpdateItem frobnitz
    GetObject schnizel
    GET /ladida/_search

can you easily tell those are DynamoDB, S3, and Elasticsearch, respectively? While not specified at all, it is somewhat notable that otel suggests a fallback HTTP span name of HTTP $METHOD, i.e. with an "HTTP " prefix.

The subtype field (always?) has this info (albeit lowercased). Does anyone know if there has been discussion with getting the waterfall in APM UI to show the subtype value?

eyalkoren commented 3 years ago

Should GraphQL be put in a separate section?

It can definitely go into a different section, especially since it has its own spec. The table only shows examples and I felt that the poll would essentially be the same for GraphQL - should we or shouldn't we use prefixes, with or without colon. If you think we would benefit from having a different section with different decisions for it, please add it (there is nothing I can contribute to such section).

I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering).

I can't agree more! This is the reason the prefixes are there in the first place. The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense. This is exactly why we have this discussion, and it is perfectly fine if we end it with the decision that we keep prefixes, but then at least let's align - colon/no-colon, letter case etc. If we do decide to remove them, our next step would be to decide if and how we couple this with a proper replacement. Very valuable feedback, thanks!

trentm commented 3 years ago

The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense.

Okay, cool. I changed my vote to "Keep prefix where already exist, lose colon in ES"... unless it is coupled (as much as is possible with the separate release schedules for Kibana and the agents) with the subtype being rendered in APM UI. Other thoughts:

GraphQL

Yah, I think a different section might help highlight that both the spec and Node.js APM's current behaviour have different naming for GraphQL transactions vs spans. However, unless Python APM supports GraphQL, it looks like Node.js is the only one? (I'm not sure what Ruby APM means by NA (outgoing).)

eyalkoren commented 3 years ago

I changed my vote to "Keep prefix where already exist, lose colon in ES"

Great, please just clarify - do you think we should remove colon from ES spans and keep it for GraphQL spans? This question still fits in the DB poll/section, but if you do end up separating GraphQL to another section, please mention this option (keep prefix, no colon). If you don't separate them, please clarify your comment regarding how we should treat GraphQL spans. Reverting to - "lose colon on whatever span they exist" is good enough too πŸ™‚

Icons are of limited use, IMO: you can't search-in-page for them.

I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text. I'd further argue that if you need to do a Ctrl+F on a waterfall view, it's because it is not good enough and we should do better. The initiatives to aggregate similar spans and the ability to collapse subtrees in this view address this problem (and disable such HTML searches). So, while we should consider this usage, if I had to choose only one, I would personally go for icon. I am not talking about Discover, where we have the subtype field for such searches.

trentm commented 3 years ago

If you don't separate them, please clarify ... Reverting to - "lose colon on whatever span they exist" is good enough too

Done. :) I voted for a newly added Keep prefix, lose colon in all cases:

I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text. ...

Okay, fair argument. I'll keep my vote on the dissenting side, but if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote.

eyalkoren commented 3 years ago

Done. :) I voted for a newly added Keep prefix, lose colon in all cases:

Thanks, no need for both, I thought the original was not mentioning ES specifically. Removed the unnecessary one.

if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote

If you already hover over the span in the waterfall view, and you cannot recognise the icon, clicking it will discover the subtype in the span details.

eyalkoren commented 3 years ago

@formgeist your input can be very useful at this point.

TLDR: we initiated this cross-agent alignment in span naming and one of the things that came up is the idea of losing technology prefixes from span names, where they exist, for example:

The down side is losing the ability to easily identify what technology the span is related to, as well as the ability to do a text-search on large waterfall views.

Therefore, we thought of coupling this change with a proper UI replacement, like icons, and/or anything else that makes sense. This will of course won't guarantee waterfall views without both when old agent is used with new Kibana, but at least we know we deliver them together.

WDYT?

formgeist commented 3 years ago

@eyalkoren I think cross-aligning is good and the representation with icons (like we do with service maps and service inventory) makes sense if done subtly. I’m not sure how open the UI team is to do a lot of work in the timeline component today, as we’re planning to move to the shared Synthetics waterfall chart soon-ish, but I can open a design issue to take a stab at doing a mock of what this will look like and open a enhancement issue in Kibana, then we can propose it to the UI team in the near future. When do you expect this alignment to be in place?

formgeist commented 3 years ago

I've opened https://github.com/elastic/observability-design/issues/66 to track the design proposal πŸ‘

eyalkoren commented 3 years ago

When do you expect this alignment to be in place?

If we decide to block the span naming change until this is addressed, it will have to depend on UI prioritisation. I don't think there is a pressing need to do this change promptly.

formgeist commented 3 years ago

@eyalkoren I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map. For the other types, we're not currently indicating by icon, we'd need to add that. We have a shared function for determining the icon by type, so I think most of the work is already available - it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead. I've opened a Kibana issue and discuss it with the UI team.

Span type visuals - ES

eyalkoren commented 3 years ago

it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead.

It means changing the algorithm to first look at type and possibly then on subtype. For example:

if type == http
  use httpIcon
else if type == db
  if subtype == mysql
    use mysqlIcon
  else if subtype == elasticsearch
    use elasticsearchIcon
  ...
else if type == messaging
  if subtype == kafka
    use kafkaIcon
  else if subtype == rabbitMq
    use rabbitMqIcon
  ...
...

(I made it long on purpose, to make the point that subtypes are important for messaging spans as well πŸ™‚ )

trentm commented 3 years ago

I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map.

@formgeist It is a minor thing, but does/could that icon include a hover tip with the type/subtype or some differentiating string?

formgeist commented 3 years ago

@trentm That's a good suggestion - it could look something like this when hovering the icon.

Span type visuals - ES - Tooltip on hover

felixbarny commented 3 years ago

If there's no icon for a given type/subtype, I think there should be a fallback to show either only the subtype or type.subtype in a badge.

formgeist commented 3 years ago

If there's no icon for a given type/subtype, I think there should be a fallback to show either only the subtype or type.subtype in a badge.

Yeah, makes sense πŸ‘