elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Obs > APM > Settings > Create Agent Configuration: only allows setting two of many central config vars if no service name info #196958

Open trentm opened 10 hours ago

trentm commented 10 hours ago

Kibana version: v8.15.1

Elasticsearch version: v8.15.1

Server OS version: Elastic cloud deployment in GCP us-west-1

Browser version: Firefox 131.0.3 (aarch64)

Browser OS version: macOS

Original install method (e.g. download page, yum, from source, etc.): cloud deployment

Describe the bug:

If I attempt to create an APM Agent Configuration with Service name: All (and Environment: All), then the Kibana UI only offers a way to set two of the settings (transaction_max_spans and transaction_sample_rate, https://github.com/elastic/kibana/blob/e955cb044fcad83fd9e1c6631eddd95aa7357ad7/x-pack/plugins/observability_solution/apm/common/agent_configuration/setting_definitions/general_settings.ts#L452-L480). These are just two of the many possible agent configuration settings that may or may not apply to a given service, depending on the language of the requesting APM agent.

Image

The same thing happens if I manually enter a Service name: ... value that isn't in the populated menu list of known service names with recent data.

Image

I did have a couple services with recent data, so this wasn't a completely empty deployment.

Expected behavior:

I would expect to be able to see some (all?) of the other configuration vars. I understand that these agent_configuration settings have excludeAgents and includeAgents fields, which are used to limit the presented config settings to those relevant to the language agent for the given service. However, if the target is "All" service names, or an unknown one, then offering only this small subset of the config settings is too limiting.
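To make the includeAgents / excludeAgents idea concrete, here is a minimal sketch of how such a filter could work for a service whose agent is known. This is an assumption about the semantics, not Kibana's actual implementation: the `SettingDefinition` shape is simplified and `isApplicable` is a hypothetical helper. Only the `span_frames_min_duration` exclude list is taken from the setting definition quoted in this issue.

```typescript
// Simplified, hypothetical shape; the real setting definitions carry more fields.
interface SettingDefinition {
  key: string;
  includeAgents?: string[];
  excludeAgents?: string[];
}

// Hypothetical helper: would this setting be offered for a known agent name?
// Assumed semantics: an include list is an allow-list, an exclude list is a deny-list,
// and a setting with neither applies to every agent.
function isApplicable(setting: SettingDefinition, agentName: string): boolean {
  if (setting.includeAgents) {
    return setting.includeAgents.includes(agentName);
  }
  if (setting.excludeAgents) {
    return !setting.excludeAgents.includes(agentName);
  }
  return true;
}

// Exclude list copied from the span_frames_min_duration definition in this issue.
const spanFramesMinDuration: SettingDefinition = {
  key: 'span_frames_min_duration',
  excludeAgents: ['js-base', 'rum-js', 'nodejs', 'php', 'android/java', 'iOS/swift'],
};

const shownForNode = isApplicable(spanFramesMinDuration, 'nodejs');
const shownForJava = isApplicable(spanFramesMinDuration, 'java');
```

Under these assumed semantics the setting is hidden for `nodejs` and shown for `java`. What the sketch deliberately leaves open is the case this issue is about: there is no agent name to pass in when the service is "All" or unknown.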

The motivating case for my reporting this issue is a user that was not getting APM data for a service at all (or at least not for a long while). One theory for not receiving transaction data was that the sampling flag in incoming HTTP request traceparent headers was resulting in all transactions being discarded. A possible solution to this would be to use the trace_continuation_strategy config var. However, because of this issue one cannot create an Agent Configuration that has a value for trace_continuation_strategy.

Errors in browser console (if relevant): I did not see any, and I don't think it is relevant.

Any additional context:

One guess as to why those particular two settings are always shown is that they use excludeAgents rather than includeAgents. However, there is a 3rd setting that also uses excludeAgents and it is not included:

  {
    key: 'span_frames_min_duration',
    type: 'duration',
    min: '-1ms',
    defaultValue: '5ms',
    label: i18n.translate('xpack.apm.agentConfig.spanFramesMinDuration.label', {
      defaultMessage: 'Span frames minimum duration',
    }),
    description: i18n.translate('xpack.apm.agentConfig.spanFramesMinDuration.description', {
      defaultMessage:
        '(Deprecated, use `span_stack_trace_min_duration` instead!) In its default settings, the APM agent will collect a stack trace with every recorded span.\nWhile this is very helpful to find the exact place in your code that causes the span, collecting this stack trace does have some overhead. \nWhen setting this option to a negative value, like `-1ms`, stack traces will be collected for all spans. Setting it to a positive value, e.g. `5ms`, will limit stack trace collection to spans with durations equal to or longer than the given value, e.g. 5 milliseconds.\n\nTo disable stack trace collection for spans completely, set the value to `0ms`.',
    }),
    excludeAgents: ['js-base', 'rum-js', 'nodejs', 'php', 'android/java', 'iOS/swift'],
  },

If Kibana's behaviour were changed to show all possible config vars when the target APM agent language is unknown, it might be nice for the UX to also show the excludeAgents and includeAgents values in some form, to give the user at least a start at knowing which config settings are applicable. Yes, this might open a bit of a can of worms with users expecting a config setting to work for a language that doesn't support it.
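As a rough illustration of that suggestion, the sketch below lists every setting when the agent is unknown and attaches a human-readable applicability hint derived from includeAgents / excludeAgents. All names here (`AgentScopedSetting`, `describeApplicability`, `settingsForUnknownAgent`) are hypothetical and do not exist in Kibana; the hint wording is only a placeholder for whatever the UI would actually render.

```typescript
// Hypothetical, simplified setting shape for this sketch only.
interface AgentScopedSetting {
  key: string;
  includeAgents?: string[];
  excludeAgents?: string[];
}

// Hypothetical helper: turn the include/exclude lists into a hint for the user.
function describeApplicability(setting: AgentScopedSetting): string {
  if (setting.includeAgents) {
    return `only: ${setting.includeAgents.join(', ')}`;
  }
  if (setting.excludeAgents) {
    return `all except: ${setting.excludeAgents.join(', ')}`;
  }
  return 'all agents';
}

// Hypothetical fallback for "All" or unknown service names:
// show every setting, each annotated with the agents it applies to.
function settingsForUnknownAgent(
  settings: AgentScopedSetting[]
): Array<{ key: string; appliesTo: string }> {
  return settings.map((setting) => ({
    key: setting.key,
    appliesTo: describeApplicability(setting),
  }));
}

const rows = settingsForUnknownAgent([
  // Exclude list copied from the definition quoted in this issue.
  {
    key: 'span_frames_min_duration',
    excludeAgents: ['js-base', 'rum-js', 'nodejs', 'php', 'android/java', 'iOS/swift'],
  },
  // Illustrative entry with no lists at all.
  { key: 'example_setting_without_lists' },
]);
```

The design choice here is to never hide a setting when the agent is unknown, and to push the "does this apply to my agent?" question to the user via the hint.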

If there is a concern that passing a config setting to a language agent that doesn't support it could cause harm: the APM agents spec requires that APM agents ignore central config settings they don't know. https://github.com/elastic/apm/blob/main/specs/agents/configuration.md#dealing-with-errors

If the agent receives a known but invalid config attribute, it should log a warning such as: Central config failure. Invalid value for transactionSampleRate: 1.2 (out of range [0,1.0]) Failure to process one config attribute should not affect processing of others.
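The agent-side behaviour described above (ignore unknown settings, warn on invalid values, keep processing the rest) can be sketched as follows. This is not code from any Elastic APM agent: the `knownSettings` table, the validators, and `applyCentralConfig` are hypothetical, and the imaginary agent here is assumed to know only two settings.

```typescript
// A validator returns an error description, or undefined if the value is valid.
type Validator = (raw: string) => string | undefined;

// Hypothetical table of settings this imaginary agent understands.
const knownSettings = new Map<string, Validator>([
  [
    'transaction_sample_rate',
    (raw) => {
      const n = Number(raw);
      const ok = raw.trim() !== '' && Number.isFinite(n) && n >= 0 && n <= 1;
      return ok ? undefined : `${raw} (out of range [0,1.0])`;
    },
  ],
  [
    'transaction_max_spans',
    (raw) => (/^\d+$/.test(raw) ? undefined : `${raw} (not a non-negative integer)`),
  ],
]);

// Hypothetical handler for a central config payload.
function applyCentralConfig(
  remote: Record<string, string>,
  warn: (message: string) => void = console.warn
): Record<string, string> {
  const applied: Record<string, string> = {};
  for (const key of Object.keys(remote)) {
    const raw = remote[key];
    const validate = knownSettings.get(key);
    if (validate === undefined) {
      continue; // unknown setting: silently ignored
    }
    const problem = validate(raw);
    if (problem !== undefined) {
      // Known but invalid: warn, skip this one, keep processing the others.
      warn(`Central config failure. Invalid value for ${key}: ${problem}`);
      continue;
    }
    applied[key] = raw;
  }
  return applied;
}

const warnings: string[] = [];
const applied = applyCentralConfig(
  {
    transaction_sample_rate: '1.2', // known, invalid
    trace_continuation_strategy: 'restart', // unknown to this imaginary agent
    transaction_max_spans: '100', // known, valid
  },
  (message) => warnings.push(message)
);
```

In this run only `transaction_max_spans` is applied, one warning is recorded for `transaction_sample_rate`, and the unknown `trace_continuation_strategy` is dropped without affecting the other settings.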

elasticmachine commented 10 hours ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)