Add ability to capture non-indexed large content for spans

eyalkoren commented 1 year ago

Is your feature request related to a problem? Please describe.

We currently distinguish between large custom context data that is not intended for indexing and data that is intended for indexing. The former can be added through the agents' API, but only to transactions and errors. The latter, labels, can be applied to spans as well, but are limited in size. There are cases where users want to add large textual data to a span's context without having it indexed, for example, as described in this forum request.

Describe the solution you'd like

We should consider adding the custom context API to spans as well. It would have the same limitations as the existing one, both in maximum allowed size and in not being indexed. To offer this capability through the OTel API as well, we may decide on a specific attribute that is mapped to the custom context field, which has the proper ES mapping.
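To make the OTel part of the proposal concrete, here is a minimal Python sketch of how an intake could route one reserved attribute into a non-indexed custom context while every other attribute stays a label. The prefix `elastic.custom.` and the `SpanDoc` shape are hypothetical: no attribute name has been decided, and real span documents carry far more fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical reserved attribute prefix; no attribute name has been chosen yet.
CUSTOM_CONTEXT_PREFIX = "elastic.custom."


@dataclass
class SpanDoc:
    """Simplified span document: indexed labels plus non-indexed custom context."""
    labels: Dict[str, Any] = field(default_factory=dict)
    custom: Dict[str, Any] = field(default_factory=dict)


def map_otel_attributes(attributes: Dict[str, Any]) -> SpanDoc:
    """Route attributes carrying the reserved prefix into the custom context
    (prefix stripped); every other attribute becomes a regular label."""
    doc = SpanDoc()
    for key, value in attributes.items():
        if key.startswith(CUSTOM_CONTEXT_PREFIX):
            doc.custom[key[len(CUSTOM_CONTEXT_PREFIX):]] = value
        else:
            doc.labels[key] = value
    return doc


doc = map_otel_attributes({"http.method": "GET", "elastic.custom.payload": "x" * 5000})
print(doc.labels)                  # {'http.method': 'GET'}
print(len(doc.custom["payload"]))  # 5000
```

The appeal of a single reserved attribute is that OTel users would not need any Elastic-specific API; the cost is one more naming convention to document.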

For now, this issue is only for 👍 and 👎 or any further discussion. A spec PR will follow if we decide that it is useful and important enough.

Describe alternatives you've considered

Another alternative is to use the labels API for custom context data as well. However, this requires changing the current ECS definition for this field's mapping by adding ignore_above: 1024 to it, which would let Elasticsearch decide automatically, based on value size, whether a value gets indexed. But since sending very large labels to older APM indices may be problematic, agents would need to be aware of the actual backend mapping, which would probably be overkill. If it depended only on the APM Server version, that would be OK, as agents are already aware of that (it is required for numeric and boolean labels), but this would also require a change in APM Server to override the ECS mapping for this field.

trentm commented 1 year ago

I added my :+1:, but have some questions.

... it will require agents to be aware of the actual backend mapping, which would probably be an overkill. If it was only dependent on the APM Server version, that would be OK, ..., but this means also a change in APM Server to override the ECS mapping for this field.

I am missing something. Couldn't we have a newer APM server version that guaranteed ignore_above: 1024 on all label values? If so, then agents could check for that APM server version and allow longer label values.

eyalkoren commented 1 year ago

I am missing something. Couldn't we have a newer APM server version that guaranteed ignore_above: 1024 on all label values? If so, then agents could check for that APM server version and allow longer label values.

The mappings for these fields are not set by the APM Server, which defers to ECS for this field's mapping, and ECS does not set ignore_above. I opened this issue thinking that was the end of the story (i.e., that agents would need to know more than the APM Server version), but then the plot took a turn. It turns out that APM Server nowadays relies exclusively on Fleet to set up its index templates, even when running in standalone mode, and Fleet does set ignore_above: 1024, so we can use labels without depending on ECS changes and versions. The APM Server would still currently reject labels with values > 1024, but that is simpler to address: we can change it and only make the agents aware of the APM Server version.

Because this was all too clear, it had to be complicated a bit more: we call these labels in the agents' dialect, tags in intake V2, and labels again in our ES indices, although ECS also has tags, for which ignore_above: 1024 IS set. I hope that was fun reading.
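The naming hop can be illustrated with a minimal Python sketch that follows one string label from the agent API to intake V2 and then to the stored document. It is heavily simplified and assumes string-only labels: real intake V2 traffic is NDJSON with a metadata line first, and numeric and boolean labels, key sanitization, and all other span fields are left out. The function names are hypothetical.

```python
import json
from typing import Dict


def to_intake_v2_span(name: str, labels: Dict[str, str]) -> str:
    """Agent side: labels set via the agent API are serialized as
    span.context.tags in an intake V2 event (heavily simplified)."""
    event = {"span": {"name": name, "context": {"tags": labels}}}
    return json.dumps(event)


def to_es_document(line: str) -> Dict[str, object]:
    """Server side (simplified): context.tags becomes the top-level
    labels object of the stored document."""
    span = json.loads(line)["span"]
    return {"span": {"name": span["name"]}, "labels": span["context"]["tags"]}


line = to_intake_v2_span("SELECT FROM users", {"tenant": "acme"})
print(line)
print(to_es_document(line)["labels"])  # {'tenant': 'acme'}
```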

Bottom line: apparently we CAN use labels for non-indexed large custom data. It will require the following changes:

- APM Server: increase the labels size limit (to 10K?)
- APM Agents:
  - if APM Server supports it, allow setting labels with values > 1024
  - change the docs about how labels are treated (size and indexing)
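For the agent-side item, here is a minimal Python sketch of version-gated truncation, assuming agents already know the APM Server version they talk to. The 10,000-character limit reflects the "10K?" floated above, and the minimum server version is a pure placeholder, since neither has been decided. Limits are counted in characters here as a simplification.

```python
from typing import Tuple

DEFAULT_LABEL_LIMIT = 1024   # today's label value limit
LARGE_LABEL_LIMIT = 10_000   # "10K?" as floated above; not decided
# Placeholder: the first APM Server version that would accept large labels
# has not been decided, so this value is purely illustrative.
LARGE_LABELS_MIN_SERVER = (9, 9, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain "major.minor.patch" string (no pre-release suffixes)."""
    return tuple(int(part) for part in version.split("."))


def label_limit(server_version: str) -> int:
    """Pick the label value limit based on the connected APM Server version."""
    if parse_version(server_version) >= LARGE_LABELS_MIN_SERVER:
        return LARGE_LABEL_LIMIT
    return DEFAULT_LABEL_LIMIT


def truncate_label(value: str, server_version: str) -> str:
    """Truncate a label value to the limit the server is known to accept."""
    return value[: label_limit(server_version)]


print(len(truncate_label("x" * 20_000, "8.4.0")))  # 1024
print(len(truncate_label("x" * 20_000, "9.9.1")))  # 10000
```

Keeping the gate on the server version alone is what makes this option cheap: agents already track that version for numeric and boolean labels.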

So now the only question is whether we want to use labels for this type of data or whether labels are not the proper field. @felixbarny @gregkalapos @trentm and anyone else who read all the way here: what would you prefer?

I don't have a strong opinion either way.

simitt commented 1 year ago

Ignoring the technicalities and details of the apm-server and index mapping for a bit: if we relied purely on the label fields with ignore_above: 1024, then instead of a given field being consistently indexed or not indexed, indexing would depend on the field's value. For some documents a specific field would be indexed, while for others it wouldn't. Is that what we want?
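The concern can be seen in a toy model. The minimal Python sketch below builds a tiny inverted index for one keyword field and skips values longer than 1024 characters, mirroring Elasticsearch's keyword `ignore_above` behavior (skipped values are not searchable but stay in the source document). It is an illustration of the semantics, not how Elasticsearch is implemented.

```python
from typing import Dict, List

IGNORE_ABOVE = 1024  # the value Fleet sets on label fields, per the discussion


def build_keyword_index(docs: List[Dict[str, str]], field: str) -> Dict[str, List[int]]:
    """Tiny inverted index (value -> doc ids) for one keyword field.

    Mirrors keyword ignore_above semantics: values longer than IGNORE_ABOVE
    characters are skipped by the index but remain in the original documents.
    """
    index: Dict[str, List[int]] = {}
    for doc_id, doc in enumerate(docs):
        value = doc.get(field)
        if value is None or len(value) > IGNORE_ABOVE:
            continue
        index.setdefault(value, []).append(doc_id)
    return index


docs = [{"payload": "short"}, {"payload": "y" * 2000}, {"payload": "short"}]
print(build_keyword_index(docs, "payload"))  # {'short': [0, 2]}
```

Document 1 still has its `payload` label, but no query on that field will ever match it, which is exactly the per-value inconsistency raised here.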

trentm commented 1 year ago

I also don't have a strong opinion either way.

A nice thing about the "make labels >1024 bytes work" option (which I think Felix pointed out separately) is that it would work for an OTel agent and OTLP intake. The OTel spec's default attribute value limit is Infinity. Currently APM Server's OTLP intake truncates labels at 1024, except db.statement, so that would need to change.

If a user frequently uses long labels, does that unnecessarily blow up the index size, and hence cost? Perhaps that doesn't need to be much of a concern, since stashing lots of data on spans isn't really a core or intended feature anyway?
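A minimal Python model of the truncation rule described here: string values are cut at 1024 characters unless the key is exempt (only `db.statement`, per this comment). It is a sketch of the described behavior, not APM Server's actual code; the function name and the `limit` parameter are hypothetical, and the parameter stands in for "that would need to change".

```python
from typing import Any, Dict

OTLP_LABEL_LIMIT = 1024
# Per the comment above, db.statement is exempt from truncation.
UNTRUNCATED_ATTRIBUTES = {"db.statement"}


def truncate_attributes(attributes: Dict[str, Any], limit: int = OTLP_LABEL_LIMIT) -> Dict[str, Any]:
    """Truncate string attribute values to `limit` characters, except exempt keys.
    Non-string values (numbers, booleans) pass through unchanged."""
    truncated: Dict[str, Any] = {}
    for key, value in attributes.items():
        if isinstance(value, str) and key not in UNTRUNCATED_ATTRIBUTES:
            value = value[:limit]
        truncated[key] = value
    return truncated


attrs = {"db.statement": "s" * 3000, "note": "n" * 3000, "retries": 3}
print(len(truncate_attributes(attrs)["note"]))                # 1024
print(len(truncate_attributes(attrs, limit=10_000)["note"]))  # 3000
```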

eyalkoren commented 1 year ago

Ignoring the technicalities...

Please don't ignore them: if any of what I wrote is inaccurate, let us know, as we will base our decision partly on this information.

it would work for an OTel agent and OTLP intake. The OTel spec's default attribute value limit is Infinity. Currently APM server's OTLP intake truncates labels at 1024, except db.statement, so that would need to change.

This is not specific to labels, though, is it? We map all sorts of attributes to our fields, so this applies to any limitation we have in agents, the server, and ES mappings for all mapped fields, IIANM.

simitt commented 1 year ago

Ignoring the technicalities...

Please don't ignore them: if any of what I wrote is inaccurate, let us know, as we will base our decision partly on this information.

I just meant, for the time being, to focus the conversation on whether this is the path forward. Btw, a while back I created https://github.com/elastic/apm-server/issues/8766, which is relevant to this conversation.

elastic / apm

Add ability to capture non-indexed large content for spans #752