elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0
384 stars 114 forks source link

Removing `span.context.message.body` from long-field spec #769

Closed eyalkoren closed 1 year ago

eyalkoren commented 1 year ago

In our messaging spec we only talk about capturing message bodies for transactions and not spans.

apmmachine commented 1 year ago

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

#### Build stats * Start Time: 2023-03-20T04:38:00.029+0000 * Duration: 3 min 2 sec

eyalkoren commented 1 year ago

@trentm in addition I just noticed that the spec assumes that we treat error.exception.message as a long field, however at least in the Java agent this is not the case. I wonder why this needs to be such. Are such long exception messages something you are familiar with? Do you know if other agents treat that as a long field as well? Same question goes to error.log.message, which the Java agent doesn't use atm.

trentm commented 1 year ago

In our messaging spec we only talk about capturing message bodies for transactions and not spans.

In the PR that added long_field_max_length I included span.context.message.body because it is part of the intake v2 schema. Wouldn't it be odd to have long_field_max_length act on transaction.context.message.body but not span.context.message.body?
This is a theoretical/philosophical argument, however.

The Node.js APM agent doesn't currently automatically capture .context.message.body at all -- not for transactions or spans. I could see having a feature request for the Node.js APM agent to capture the body of incoming SQS/SNS messages, however -- perhaps similar to the existing capture_body config var for capturing incoming HTTP request bodies. In that case, as a user, I think I'd like the ability to use long_field_max_length to control the truncation on that field.

This is perhaps a bit of a stretch, but https://github.com/elastic/apm/blob/main/tests/agents/json-specs/service_resource_inference.json#L238-L250 includes a few examples of messaging spans with a span.context.message.body.

Taking a step back, my view of long_field_max_length was to cover fields that:

in addition I just noticed that the spec assumes that we treat error.exception.message as a long field ...

I have a similar answer for the error. fields, as well. They are strings (as opposed to name or identifier fields) from user code that could get really long -- so the argument is that APM agents should have some upper bound on them.

Oh, is your argument that these fields shouldn't be "long"? I.e. that they should be truncated at something shorter than the default 10k characters? Is there anything in the spec or intake schema that talks about otherwise truncating these fields? I didn't think so.

eyalkoren commented 1 year ago

In the https://github.com/elastic/apm/pull/493 I included span.context.message.body because it is part of the intake v2 schema

I noticed that long ago but probably made the mistake of placing it in a comment for something else and not following up. It shouldn't be in the schema.

Wouldn't it be odd to have long_field_max_length act on transaction.context.message.body but not span.context.message.body? This is a theoretical/philosophical argument, however.

I think it's not purely theoretical, I think that having it in the spec implies that it is a valid field. I don't have a strong opinion on whether it makes sense to add it in the spec or not, but a bit stronger opinion whether is should be included in the configuration documentation. I didn't add it in the Java docs, I think it is misleading.

I think I'd like the ability to use long_field_max_length to control the truncation on that field

Definitely, I see this as equivalent to HTTP request bodies. We DO capture message bodies in the Java agent and I did applied this config for that. I was only referring to the span body, which should not be captured.

Oh, is your argument that these fields shouldn't be "long"? I.e. that they should be truncated at something shorter than the default 10k characters? Is there anything in the spec or intake schema that talks about otherwise truncating these fields? I didn't think so.

We probably never made this part of the spec clear enough. The way the Java agent treat it is - always trim to 1024, unless this is a long field. I think it is a good way to define this, but I am biased 🙂 . One example of why it is preferable, is that without defining it this way, it is hard to explain why an error message needs that high of a character limit.

trentm commented 1 year ago

It shouldn't be in the schema.

Should we start an issue/PR to deprecate that field? That would help move my opinion to drop it from long_field_max_length.

A brief digression: You don't think we'd want to entertain a possible feature to capture the message body for outgoing messages on a span?

We probably never made this part of the spec clear enough. The way the Java agent treat it is - always trim to 1024, unless this is a long field.

I wrote that. :) My intent had been to point out that the schema already does have "maxLength": 1024 for some string fields. The schema isn't very thorough at doing so, however. Yes perhaps you are right that the basic intent is that all strings (except a specific few) are limited to 1024 chars.

I think it would be worthwhile to make that language more explicit wiht something like "APM agents MUST [or SHOULD?] truncate all strings at 1024 chars except those listed below ...".

@trentm in addition I just noticed that the spec assumes that we treat error.exception.message as a long field, however at least in the Java agent this is not the case. I wonder why this needs to be such. Are such long exception messages something you are familiar with?

Back to this question. Is error.exception.message truncated at 1024 chars in the Java agent, then?

When I added long_field_max_length, the Node.js APM agent already had a errorMessageMaxLength separate config var. errorMessageMaxLength was originally added to change from not truncating that field at all, to truncating it at 2048 chars. It was added because of this use case where one could get an error message that was megabytes in length.

I decided to deprecate errorMessageMaxLength and just have the single long_field_max_length that handled that case as well. This allows users to control that truncation length if they want -- at the possible limitation that there is only one setting for setting the max length of multiple fields. Too bad if you want short context.db.statement but long error.exception.message.

I don't have user data right now on whether having error.exception.message values longer than 1024 chars by default is useful/helpful.

eyalkoren commented 1 year ago

Should we start an issue/PR to deprecate that field? That would help move my opinion to drop it from long_field_max_length.

I wouldn't. It doesn't bother me enough to spend more time on this. I do think it is misleading in docs, so I did not include it when added this config to Java.

I wrote that....

I know. I based my addition on your PRs here and in the Node agent. I linked so that anyone stumbling into this issue can see the current state. My intention was that you couldn't find any such explicit intention anywhere because we didn't do a good job declaring it.

I think it would be worthwhile to make that language more explicit wiht something like "APM agents MUST [or SHOULD?] truncate all strings at 1024 chars except those listed below ...".

Agree. I wouldn't use MUST, as it may imply more effort is required from other agents, without anyone complaining on the current state. SHOULD should (or must 🤔 ) be good.

Is error.exception.message truncated at 1024 chars in the Java agent, then?

Yes.

I don't have user data right now on whether having error.exception.message values longer than 1024 chars by default is useful/helpful.

We know it was never required in other agents. But this is not a problem- the spec can also use MAY for non-obvious fields like that.

Trying to summarize the discussion into an action item, this is what I propose:

Let me know what you want and I'll propose the changes accordingly.

trentm commented 1 year ago

this is what I propose:

That sounds good to me. Thanks for summarizing, Eyal.

  • funny [...] software people

Can we get "kibi-char" into the spec? :)

eyalkoren commented 1 year ago

Updated the spec as agreed above. In addition, I also specified the 10K length as the preferred size limit for long fields where configuration is not supported. I also specified truncating instructions - replacing the last character with "…" to provide an indication that the value was truncated.

eyalkoren commented 1 year ago

Good enough for me, we can reiterate if needed.