eyalkoren commented 3 years ago

Aligning span names

This is the initial step towards aligning span names across agents. Naturally, it only deals with frameworks that are not language-specific. We have specs for those, but in some cases, these specs were added in retrospect, reflecting the naming-chaos at the time, so we may want to revisit.

@elastic/apm-agent-devs Please take the time to fill some examples in the tables and vote wherever there is a poll-topic-xxl sign - either add 👍 to an existing option, or add a new one with your 👍 on it. After I gather your feedback, I will make the spec PRs accordingly wherever required.

OpenTelemetry general Span guidelines for reference.

HTTP

The naming scheme for transactions is not rigid in the existing spec. It is more so for spans, where names should have the format <method> <host>.

OpenTelemetry HTTP spec for reference.

Existing state:	Agent	Transaction
Java	`GET unknown route` `ServletClassName#doGet` `GET /testWithPathMethod/{id}`	`GET 127.0.0.1` `GET [::1]` `GET localhost`
.NET
Go	`GET unknown route` `GET /hello/:name` `GET /hello/{name}`	`GET host:port`
Node.js	`GET unknown route` (vanilla HTTP server) `GET /hello/:name` (typical with frameworks) `static file` (if using express.static()) `CORS preflight` (CORS preflight req with hapi framework)	`POST localhost:9200/myIndex*/_search` (full URL usage is https://github.com/elastic/apm-agent-nodejs/issues/2067) `GET http://localhost:52394/sub` (http2, same issue)
PHP	`GET /users/{id}`	`POST example.com`
Python	`GET /users/{id}` (only for frameworks that expose the necessary routing info )	`POST example.com`
Ruby	`PostsController#index` (most likely) `GET /users/:id` `POST /publish` `Rack` (last resort)	`POST example.com` `GET elastic.co`

poll-topic-xxl Choose one of the followings:

Spec is good enough as is 👍 👍 👍 👍 👍
Spec should change (replace with your suggestion and add a +1)

DB

The spec defines a convention for each DB type.

OpenTelemetry DB spec for reference.

Existing state:	Agent	SQL DB	Redis	MongoDB	GraphQL	Elasticsearch	DynamoDB
Java	`SELECT FROM table`	`SET`	`db.collection.drop`	NA	`Elasticsearch: GET /index/_search`	NA	NA
.NET
Go	`SELECT FROM table`	`SET` `(flush pipeline)`	`${collection}.${command}` `aggregate`	NA	`Elasticsearch: GET /index/_search`	`DynamoDB GetItem Movies`	`S3 PutObject bucket`
Node.js	`SELECT FROM table_name` `UPDATE table_name`	`SET`	`db.collection.command` e.g. `elasticapm.test.insert`, `system.$cmd.ismaster`	Transaction name: `[opName] queryNames, ... (/path)` Span name: `GraphQL: [opName] queryNames, ...` (see note below)	`Elasticsearch: ${method} ${path}` e.g. `Elasticsearch: POST /myIndex*/_search`	Not yet implemented	Not yet implemented
PHP	`SELECT FROM table`	Not supported yet	Not supported yet	Not supported yet	Not supported yet	Not supported yet	Not supported yet
Python	`SELECT FROM table`	`SET`	`db.collection.drop`	NA	`ES GET /myindex/_search`	`DynamoDB Query tableName`	`S3 PutObject bucket`
Ruby	`SELECT FROM table`	`SET`	`db.collection.drop`	NA (outgoing)	`GET _search`	`DynamoDB Query tableName`	`S3 CreateBucket bucket-name`

The spec includes prefixed names like:

DynamoDB UpdateItem my_table
S3 GetObject my-bucket
Elasticsearch: GET /index/_search

poll-topic-xxl Choose one of the followings:

Lose prefixes altogether: 👍 👍 👍
Keep prefix, lose colon in all cases: (👍 or render subtype in waterfall)
Keep prefix with colon:

Messaging

The spec defines naming convention that contains lots of prefixing. Kafka is not represented in the spec and in Java it is really bad (100% my fault 🤦‍♂️ ) - a combination of API-based for send spans and prefix-based for receive spans. This is because we don't really have an API in the Kafka client for a retrieval of one record, there is only API for reading a batch, which is not a good fit for transaction name that is based on a single record.

OpenTelemetry messaging spec for reference.

Existing state:	Agent	Kafka	RabbitMQ	SQS	SNS
Java	Send spans: `KafkaProducer#send to MyTopic` Transactions: `Kafka record from MyTopic`	`RabbitMQ RECEIVE from MyQueue`	NA	NA	`JMS SEND to MyQueue`
.NET
Go	NA	NA	`SQS SEND to queue`	`SNS PUBLISH myTopic`	NA
Node.js	Not supported	Not supported	`SQS SEND to queue_name` `SQS RECEIVE from queue_name` `SQS DELETE from queue_name`	Not supported	NA
PHP	Not supported yet	Not supported yet	Not supported yet	Not supported yet	Not supported yet
Python
Ruby	NA	NA	`SQS SEND to my-queue`	`SNS PUBLISH to MyTopic`	NA

poll-topic-xxl Choose one of the followings:

With prefix, e.g. Kafka send to MyTopic
General form, human-readable, e.g. Record send to MyTopic or Message received from MyQueue
General form, OTel scheme: <destination_name> <operation_name>, e.g. MyTopic send or MyQueue receive 👍 👍
API-based names, e.g. MeassgeProducer#send *
Other (replace with your suggestion and add a +1)

* Note: it is common in messaging clients that the receive API is designed for batches, in which case the API cannot be used as is for the transaction name. So if you choose this option, propose a name for such transactions.

gRPC

The spec defines to used the method as the name, eg: /helloworld.Greeter/SayHello. Since this is a fairly new spec that was approved lately, I am assuming that everybody is happy with it.

trentm commented 3 years ago

@eyalkoren Should GraphQL be put in a separate section? While the Node.js graphql instrumentation does use type=db https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-graphql.md is separate and quite different from https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-db.md

Notes on Node.js APM agent's graphql transaction and span naming:

Span name examples: GraphQL: Unknown Query GraphQL: queryName GraphQL: queryName1, queryName2, ... GraphQL: opName1 queryName1, queryName2, ...
Transaction:
- trans.type is set to graphql
- trans.name is set to ${operationName?} ${queryNames?.join(', ') || 'Unknown GraphQL query'} ($routeName?)}. Notably this is quite different than the spec. Examples: hello (/graphql), HelloQuery hello (/), hello, life (/)

trentm commented 3 years ago

Some thoughts on the current votes for DB span names. Current voting is:

DB

Lose prefixes altogether 👍 👍 👍

I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering). If we lost prefixes altogether then:

    UpdateItem frobnitz
    GetObject schnizel
    GET /ladida/_search

can you easily tell those are DynamoDB, S3, and Elasticsearch, respectively? While not specified at all, it is somewhat notable that otel suggests a fallback HTTP span name of HTTP $METHOD, i.e. with an "HTTP " prefix.

The subtype field (always?) has this info (albeit lowercased). Does anyone know if there has been discussion with getting the waterfall in APM UI to show the subtype value?

eyalkoren commented 3 years ago

Should GraphQL be put in a separate section?

It can definitely go into a different section, especially since it has its own spec. The table only shows examples and I felt that the poll would essentially be the same for GraphQL - should we or shouldn't we use prefixes, with or without colon. If you think we would benefit from having a different section with different decisions for it, please add it (there is nothing I can contribute to such section).

I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering).

I can't agree more! This is the reason the prefixes are there in the first place. The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense. This is exactly why we have this discussion, and it is perfectly fine if we end it with the decision that we keep prefixes, but then at least let's align - colon/no-colon, letter case etc. If we do decide to remove them, our next step would be to decide if and how we couple this with a proper replacement. Very valuable feedback, thanks!

trentm commented 3 years ago

The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense.

Okay, cool. I changed my vote to "Keep prefix where already exist, lose colon in ES"... unless it is coupled (as much as is possible with the separate release schedules for Kibana and the agents) with the subtype being rendered in APM UI. Other thoughts:

If the votes for "lose the prefix" are related to "Elasticsearch" being long, then we could consider "ES" instead.
"icons and/or subtypes" Icons are of limited use, IMO: you can't search-in-page for them.

GraphQL

Yah, I think a different section might help highlight that both the spec and Node.js APM's current behaviour have different naming for GraphQL transactions vs spans. However, unless Python APM supports GraphQL, it looks like Node.js is the only one? (I'm not sure what Ruby APM means by NA (outgoing).)

eyalkoren commented 3 years ago

I changed my vote to "Keep prefix where already exist, lose colon in ES"

Great, please just clarify - do you think we should remove colon from ES spans and keep it for GraphQL spans? This question still fits in the DB poll/section, but if you do end up separating GraphQL to another section, please mention this option (keep prefix, no colon). If you don't separate them, please clarify your comment regarding how we should treat GraphQL spans. Reverting to - "lose colon on whatever span they exist" is good enough too 🙂

Icons are of limited use, IMO: you can't search-in-page for them.

I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text. I'd further argue that if you need to do a Ctrl+F on a waterfall view, it's because it is not good enough and we should do better. The initiatives to aggregate similar spans and the ability to collapse subtrees in this view address this problem (and disable such HTML searches). So, while we should consider this usage, if I had to choose only one, I would personally go for icon. I am not talking about Discover, where we have the subtype field for such searches.

trentm commented 3 years ago

If you don't separate them, please clarify ... Reverting to - "lose colon on whatever span they exist" is good enough too

Done. :) I voted for a newly added Keep prefix, lose colon in all cases:

I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text. ...

Okay, fair argument. I'll keep my vote on the dissenting side, but if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote.

eyalkoren commented 3 years ago

Done. :) I voted for a newly added Keep prefix, lose colon in all cases:

Thanks, no need for both, I thought the original was not mentioning ES specifically. Removed the unnecessary one.

if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote

If you already hover over the span in the waterfall view, and you cannot recognise the icon, clicking it will discover the subtype in the span details.

eyalkoren commented 3 years ago

@formgeist your input can be very useful at this point.

TLDR: we initiated this cross-agent alignment in span naming and one of the things that came up is the idea of losing technology prefixes from span names, where they exist, for example:

RabbitMQ RECEIVE from MyQueue
Elasticsearch: GET /index/_search
DynamoDB Query tableName
GraphQL: queryName

The down side is losing the ability to easily identify what technology the span is related to, as well as the ability to do a text-search on large waterfall views.

Therefore, we thought of coupling this change with a proper UI replacement, like icons, and/or anything else that makes sense. This will of course won't guarantee waterfall views without both when old agent is used with new Kibana, but at least we know we deliver them together.

WDYT?

formgeist commented 3 years ago

@eyalkoren I think cross-aligning is good and the representation with icons (like we do with service maps and service inventory) makes sense if done subtly. I’m not sure how open the UI team is to do a lot of work in the timeline component today, as we’re planning to move to the shared Synthetics waterfall chart soon-ish, but I can open a design issue to take a stab at doing a mock of what this will look like and open a enhancement issue in Kibana, then we can propose it to the UI team in the near future. When do you expect this alignment to be in place?

formgeist commented 3 years ago

I've opened https://github.com/elastic/observability-design/issues/66 to track the design proposal 👍

eyalkoren commented 3 years ago

When do you expect this alignment to be in place?

If we decide to block the span naming change until this is addressed, it will have to depend on UI prioritisation. I don't think there is a pressing need to do this change promptly.

formgeist commented 3 years ago

@eyalkoren I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map. For the other types, we're not currently indicating by icon, we'd need to add that. We have a shared function for determining the icon by type, so I think most of the work is already available - it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead. I've opened a Kibana issue and discuss it with the UI team.

Span type visuals - ES

eyalkoren commented 3 years ago

it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead.

It means changing the algorithm to first look at type and possibly then on subtype. For example:

if type == http
  use httpIcon
else if type == db
  if subtype == mysql
    use mysqlIcon
  else if subtype == elasticsearch
    use elasticsearchIcon
  ...
else if type == messaging
  if subtype == kafka
    use kafkaIcon
  else if subtype == rabbitMq
    use rabbitMqIcon
  ...
...

(I made it long on purpose, to make the point that subtypes are important for messaging spans as well 🙂 )

trentm commented 3 years ago

I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map.

@formgeist It is a minor thing, but does/could that icon include a hover tip with the type/subtype or some differentiating string?

formgeist commented 3 years ago

@trentm That's a good suggestion - it could look something like this when hovering the icon.

Span type visuals - ES - Tooltip on hover

felixbarny commented 3 years ago

If there's no icon for a given type/subtype, I think there should be a fallback to show either only the subtype or type.subtype in a badge.

formgeist commented 3 years ago

If there's no icon for a given type/subtype, I think there should be a fallback to show either only the subtype or type.subtype in a badge.

Yeah, makes sense 👍

elastic / apm

Current span naming status as a step towards name alignment #446

Aligning span names

HTTP

DB

Messaging

gRPC

DB