Open eyalkoren opened 3 years ago
@eyalkoren Should GraphQL be put in a separate section? While the Node.js graphql instrumentation does use type=db
https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-graphql.md is separate and quite different from https://github.com/elastic/apm/blob/master/specs/agents/tracing-instrumentation-db.md
Notes on Node.js APM agent's graphql transaction and span naming:
GraphQL: Unknown Query
GraphQL: queryName
GraphQL: queryName1, queryName2, ...
GraphQL: opName1 queryName1, queryName2, ...
trans.type
is set to graphql
trans.name
is set to ${operationName?} ${queryNames?.join(', ') || 'Unknown GraphQL query'} ($routeName?)}
.
Notably this is quite different than the spec.
Examples: hello (/graphql)
, HelloQuery hello (/)
, hello, life (/)
Some thoughts on the current votes for DB span names. Current voting is:
DB
Lose prefixes altogether π π π
I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering). If we lost prefixes altogether then:
UpdateItem frobnitz
GetObject schnizel
GET /ladida/_search
can you easily tell those are DynamoDB, S3, and Elasticsearch, respectively? While not specified at all, it is somewhat notable that otel suggests a fallback HTTP span name of HTTP $METHOD
, i.e. with an "HTTP " prefix.
The subtype
field (always?) has this info (albeit lowercased). Does anyone know if there has been discussion with getting the waterfall in APM UI to show the subtype
value?
Should GraphQL be put in a separate section?
It can definitely go into a different section, especially since it has its own spec. The table only shows examples and I felt that the poll would essentially be the same for GraphQL - should we or shouldn't we use prefixes, with or without colon. If you think we would benefit from having a different section with different decisions for it, please add it (there is nothing I can contribute to such section).
I think having a string with the name of the technology is super useful to be (a) self-explanatory when looking at trace data (whether in the waterfall, or Discover, or raw data) and (b) for text searching rendered data (for example doing search-in-page on a waterfall rendering).
I can't agree more! This is the reason the prefixes are there in the first place. The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense. This is exactly why we have this discussion, and it is perfectly fine if we end it with the decision that we keep prefixes, but then at least let's align - colon/no-colon, letter case etc. If we do decide to remove them, our next step would be to decide if and how we couple this with a proper replacement. Very valuable feedback, thanks!
The preliminary discussion we had suggested that the removal of prefixes will be coupled with adding icons and/or subtypes to wherever makes sense.
Okay, cool. I changed my vote to "Keep prefix where already exist, lose colon in ES"... unless it is coupled (as much as is possible with the separate release schedules for Kibana and the agents) with the subtype being rendered in APM UI. Other thoughts:
GraphQL
Yah, I think a different section might help highlight that both the spec and Node.js APM's current behaviour have different naming for GraphQL transactions vs spans. However, unless Python APM supports GraphQL, it looks like Node.js is the only one? (I'm not sure what Ruby APM means by NA (outgoing)
.)
I changed my vote to "Keep prefix where already exist, lose colon in ES"
Great, please just clarify - do you think we should remove colon from ES spans and keep it for GraphQL spans? This question still fits in the DB poll/section, but if you do end up separating GraphQL to another section, please mention this option (keep prefix, no colon). If you don't separate them, please clarify your comment regarding how we should treat GraphQL spans. Reverting to - "lose colon on whatever span they exist" is good enough too π
Icons are of limited use, IMO: you can't search-in-page for them.
I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text.
I'd further argue that if you need to do a Ctrl+F
on a waterfall view, it's because it is not good enough and we should do better. The initiatives to aggregate similar spans and the ability to collapse subtrees in this view address this problem (and disable such HTML searches). So, while we should consider this usage, if I had to choose only one, I would personally go for icon.
I am not talking about Discover, where we have the subtype
field for such searches.
If you don't separate them, please clarify ... Reverting to - "lose colon on whatever span they exist" is good enough too
Done. :) I voted for a newly added Keep prefix, lose colon in all cases:
I'd argue that since the waterfall is a graphical view of method-calls, icons have higher value than text. ...
Okay, fair argument. I'll keep my vote on the dissenting side, but if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote.
Done. :) I voted for a newly added Keep prefix, lose colon in all cases:
Thanks, no need for both, I thought the original was not mentioning ES specifically. Removed the unnecessary one.
if icons are done well (and have a hovertip/tooltip that provides the subtype text for self-documentation), then I'm fine losing this vote
If you already hover over the span in the waterfall view, and you cannot recognise the icon, clicking it will discover the subtype in the span details.
@formgeist your input can be very useful at this point.
TLDR: we initiated this cross-agent alignment in span naming and one of the things that came up is the idea of losing technology prefixes from span names, where they exist, for example:
RabbitMQ RECEIVE from MyQueue
Elasticsearch: GET /index/_search
DynamoDB Query tableName
GraphQL: queryName
The down side is losing the ability to easily identify what technology the span is related to, as well as the ability to do a text-search on large waterfall views.
Therefore, we thought of coupling this change with a proper UI replacement, like icons, and/or anything else that makes sense. This will of course won't guarantee waterfall views without both when old agent is used with new Kibana, but at least we know we deliver them together.
WDYT?
@eyalkoren I think cross-aligning is good and the representation with icons (like we do with service maps and service inventory) makes sense if done subtly. Iβm not sure how open the UI team is to do a lot of work in the timeline component today, as weβre planning to move to the shared Synthetics waterfall chart soon-ish, but I can open a design issue to take a stab at doing a mock of what this will look like and open a enhancement issue in Kibana, then we can propose it to the UI team in the near future. When do you expect this alignment to be in place?
I've opened https://github.com/elastic/observability-design/issues/66 to track the design proposal π
When do you expect this alignment to be in place?
If we decide to block the span naming change until this is addressed, it will have to depend on UI prioritisation. I don't think there is a pressing need to do this change promptly.
@eyalkoren I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map. For the other types, we're not currently indicating by icon, we'd need to add that. We have a shared function for determining the icon by type, so I think most of the work is already available - it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead. I've opened a Kibana issue and discuss it with the UI team.
it's "just" changing the icon from the DB general icon (which we'd keep as a fallback) and showing the framework icon instead.
It means changing the algorithm to first look at type
and possibly then on subtype
.
For example:
if type == http
use httpIcon
else if type == db
if subtype == mysql
use mysqlIcon
else if subtype == elasticsearch
use elasticsearchIcon
...
else if type == messaging
if subtype == kafka
use kafkaIcon
else if subtype == rabbitMq
use rabbitMqIcon
...
...
(I made it long on purpose, to make the point that subtypes are important for messaging
spans as well π )
I think in its simplest form we're just replacing the DB icon with the icons we're using in the Service map.
@formgeist It is a minor thing, but does/could that icon include a hover tip with the type/subtype or some differentiating string?
@trentm That's a good suggestion - it could look something like this when hovering the icon.
If there's no icon for a given type/subtype, I think there should be a fallback to show either only the subtype
or type.subtype
in a badge.
If there's no icon for a given type/subtype, I think there should be a fallback to show either only the
subtype
ortype.subtype
in a badge.
Yeah, makes sense π
Aligning span names
This is the initial step towards aligning span names across agents. Naturally, it only deals with frameworks that are not language-specific. We have specs for those, but in some cases, these specs were added in retrospect, reflecting the naming-chaos at the time, so we may want to revisit.
@elastic/apm-agent-devs Please take the time to fill some examples in the tables and vote wherever there is a sign - either add π to an existing option, or add a new one with your π on it. After I gather your feedback, I will make the spec PRs accordingly wherever required.
OpenTelemetry general Span guidelines for reference.
HTTP
The naming scheme for transactions is not rigid in the existing spec. It is more so for spans, where names should have the format
<method> <host>
.OpenTelemetry HTTP spec for reference.
GET unknown route
ServletClassName#doGet
GET /testWithPathMethod/{id}
GET 127.0.0.1
GET [::1]
GET localhost
GET unknown route
GET /hello/:name
GET /hello/{name}
GET host:port
GET unknown route
(vanilla HTTP server)GET /hello/:name
(typical with frameworks)static file
(if using express.static())CORS preflight
(CORS preflight req with hapi framework)POST localhost:9200/myIndex*/_search
(full URL usage is https://github.com/elastic/apm-agent-nodejs/issues/2067)GET http://localhost:52394/sub
(http2, same issue)GET /users/{id}
POST example.com
GET /users/{id}
(only for frameworks that expose the necessary routing info )
POST example.com
PostsController#index
(most likely)GET /users/:id
POST /publish
Rack
(last resort)POST example.com
GET elastic.co
Choose one of the followings:
DB
The spec defines a convention for each DB type.
OpenTelemetry DB spec for reference.
SELECT FROM table
SET
db.collection.drop
Elasticsearch: GET /index/_search
SELECT FROM table
SET
(flush pipeline)
${collection}.${command}
aggregate
Elasticsearch: GET /index/_search
DynamoDB GetItem Movies
S3 PutObject bucket
SELECT FROM table_name
UPDATE table_name
SET
db.collection.command
e.g.elasticapm.test.insert
,system.$cmd.ismaster
[opName] queryNames, ... (/path)
Span name:
GraphQL: [opName] queryNames, ...
(see note below)
Elasticsearch: ${method} ${path}
e.g.Elasticsearch: POST /myIndex*/_search
SELECT FROM table
SELECT FROM table
SET
db.collection.drop
ES GET /myindex/_search
DynamoDB Query tableName
S3 PutObject bucket
SELECT FROM table
SET
db.collection.drop
GET _search
DynamoDB Query tableName
S3 CreateBucket bucket-name
The spec includes prefixed names like:
DynamoDB UpdateItem my_table
S3 GetObject my-bucket
Elasticsearch: GET /index/_search
Choose one of the followings:
subtype
in waterfall)Messaging
The spec defines naming convention that contains lots of prefixing. Kafka is not represented in the spec and in Java it is really bad (100% my fault π€¦ββοΈ ) - a combination of API-based for send spans and prefix-based for receive spans. This is because we don't really have an API in the Kafka client for a retrieval of one record, there is only API for reading a batch, which is not a good fit for transaction name that is based on a single record.
OpenTelemetry messaging spec for reference.
KafkaProducer#send to MyTopic
Transactions:
Kafka record from MyTopic
RabbitMQ RECEIVE from MyQueue
JMS SEND to MyQueue
SQS SEND to queue
SNS PUBLISH myTopic
SQS SEND to queue_name
SQS RECEIVE from queue_name
SQS DELETE from queue_name
SQS SEND to my-queue
SNS PUBLISH to MyTopic
Choose one of the followings:
Kafka send to MyTopic
Record send to MyTopic
orMessage received from MyQueue
<destination_name> <operation_name>
, e.g.MyTopic send
orMyQueue receive
π πMeassgeProducer#send
** Note: it is common in messaging clients that the receive API is designed for batches, in which case the API cannot be used as is for the transaction name. So if you choose this option, propose a name for such transactions.
gRPC
The spec defines to used the method as the name, eg:
/helloworld.Greeter/SayHello
. Since this is a fairly new spec that was approved lately, I am assuming that everybody is happy with it.