elastic / apm

Elastic Application Performance Monitoring - resources and general issue tracking for Elastic APM.
https://www.elastic.co/apm
Apache License 2.0

Enable Remote Config for all dynamic parameters. #213

Closed lreuven closed 4 years ago

lreuven commented 4 years ago

Description of the issue

Up to version 7.6, we exposed three agent config options: CAPTURE_BODY, IGNORE_URLS and TRANSACTION_MAX_SPANS (see here). From a UI implementation perspective, those options were static in the UI, and adding each new config option requires UI development.

We would like to move to the next phase and expose most of the config options in the APM UI. Here is the list of configuration options we plan to support (a unified, cross-agent list).

| Name | Type | Default |
|---|---|---|
| ACTIVE | Boolean | TRUE |
| API_REQUEST_SIZE | Bytes | 768kb |
| API_REQUEST_TIME | Duration | 10s |
| CAPTURE_BODY | String | FALSE |
| CAPTURE_HEADERS | Boolean | TRUE |
| ENABLE_LOG_CORRELATION | Boolean | FALSE |
| FILTER_EXCEPTION_TYPES | List | Empty |
| IGNORE_URLS | List | Empty |
| LOG_LEVEL | String | Info |
| SANITIZE_FIELD_NAMES | List | |
| SERVER_TIMEOUT | Duration | |
| SPAN_FRAMES_MIN_DURATION | Duration | |
| STACK_TRACE_LIMIT | Integer | 50 |
| TRANSACTION_MAX_SPANS | Integer | 500 |
| TRANSACTION_SAMPLE_RATE | Float | 1.0 |
| URL_GROUPS | List | |
| ENVIRONMENT | String | "production" |
| TRACE_METHODS_DURATION_THRESHOLD | Integer | |

There are a few phases to this issue:

1. Align config options in agents - we still have a few gaps between the agents, as can be seen here.
2. Expose the config metadata of each agent via .yaml file
3. Send the file to an endpoint.
4. Consume the file and build the UI based on it.

felixbarny commented 4 years ago

2.Expose the config metadata of each agent via .yaml file

Agents should generate or hard-code a JSON definition of which options are supported.

The schema would be like this:

[
  {
    "key": "the configuration key, without the elastic_apm prefix. For example `active`.",
    "type": "String|URL|Boolean|Double|Integer|List|Enum|TimeDuration|ByteValue",
    "enum": ["a list of values allowed for this option, for example", "TRACE", "DEBUG", "INFO", "WARN", "ERROR"],
    "category": "A string describing which category this option belongs to, allows the UI to group options. For example Core, Reporter, Stacktrace, Logging, HTTP, Messaging",
    "default": "The default value of this option as a string",
    "tags": ["An optional array of tags for this option. For example:", "performance", "security"],
    "since": "The version of the agent when this option was introduced. For example 1.2.3",
    "description": "A text describing the semantics of this option",
    "validation": {
      "min": "the min value for options such as transaction_sample_rate or profiling_sampling_interval (inclusive). May be numerical or a string, for example for time durations. Examples: `0`, `\"1ms\"`",
      "max": "the max value (inclusive)",
      "negativeMatch": false, // when set to true, the option must not be within the range (optional, default false)
      "regex": "a regex pattern the option has to validate against"
    }
  },
  ...
]

When the agents send up this information on startup the UI can be rendered generically, based on the options each agent supports. As the UI doesn't hard-code the options, it can display new ones without having to update Kibana.
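To illustrate how a generic consumer could use such a definition, here is a minimal Python sketch that checks a user-entered value against the `enum` and `validation` fields of the schema above. The function name `validate_option` and the error messages are hypothetical. The sketch also assumes that range checks only apply to `Integer` and `Double` options with numeric bounds; string bounds such as `"1ms"` and case-insensitive enum matching are not handled.

```python
import re
from typing import Optional


def validate_option(definition: dict, value: str) -> Optional[str]:
    """Return an error message, or None if the value passes all rules.

    `definition` is one entry of the options array described above.
    This helper is illustrative only and not part of any agent or Kibana.
    """
    key = definition["key"]
    rules = definition.get("validation", {})

    # Enum options: the value must be one of the listed values (exact match).
    allowed = definition.get("enum")
    if allowed is not None and value not in allowed:
        return f"{key}: expected one of {', '.join(allowed)}"

    # Regex rule: the whole value has to match the pattern.
    pattern = rules.get("regex")
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return f"{key}: does not match {pattern}"

    # Range rule (assumption: numeric types with numeric bounds only).
    if definition.get("type") in ("Integer", "Double"):
        try:
            number = float(value)
        except ValueError:
            return f"{key}: not a number"
        low = rules.get("min", float("-inf"))
        high = rules.get("max", float("inf"))
        in_range = low <= number <= high
        # negativeMatch=false: value must be in range; true: it must not be.
        if in_range == rules.get("negativeMatch", False):
            return f"{key}: value is outside the allowed range"

    return None
```

With the `transaction_sample_rate` definition from this comment (`min` 0, `max` 1), `validate_option(definition, "0.5")` returns `None`, while `"1.5"` yields an error message.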

Open questions

Generated options from the Java agent

Click to expand ``` [ { "key": "active", "type": "Boolean", "category": "Core", "default": "true", "since": "1.0.0", "description": "A boolean specifying if the agent should be active or not.\nWhen active, the agent instruments incoming HTTP requests, tracks errors and collects and sends metrics.\nWhen inactive, the agent works as a noop, not collecting data and not communicating with the APM sever.\nAs this is a reversible switch, agent threads are not being killed when inactivated, but they will be \nmostly idle in this state, so the overhead should be negligible.\n\nYou can use this setting to dynamically disable Elastic APM at runtime." }, { "key": "transaction_sample_rate", "type": "Double", "category": "Core", "default": "1.0", "tags": ["performance"], "since": "1.0.0", "validation": { "min": 0, "max": 1, "negativeMatch": false }, "description": "By default, the agent will sample every transaction (e.g. request to your service). To reduce overhead and storage requirements, you can set the sample rate to a value between 0.0 and 1.0. We still record overall time and the result for unsampled transactions, but no context information, labels, or spans." }, { "key": "transaction_max_spans", "type": "Integer", "category": "Core", "default": "500", "tags": ["performance"], "since": "1.0.0", "description": "Limits the amount of spans that are recorded per transaction.\n\nThis is helpful in cases where a transaction creates a very high amount of spans (e.g. thousands of SQL queries).\n\nSetting an upper limit will prevent overloading the agent and the APM server with too much work for such edge cases.\n\nA message will be logged when the max number of spans has been exceeded but only at a rate of once every 5 minutes to ensure performance is not impacted." 
}, { "key": "sanitize_field_names", "type": "List", "category": "Core", "default": "password,passwd,pwd,secret,*key,*token*,*session*,*credit*,*card*,authorization,set-cookie", "tags": ["security"], "since": "1.0.0", "description": "Sometimes it is necessary to sanitize the data sent to Elastic APM,\ne.g. remove sensitive data.\n\nConfigure a list of wildcard patterns of field names which should be sanitized.\nThese apply for example to HTTP headers and `application/x-www-form-urlencoded` data.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive.\n\nNOTE: Data in the query string is considered non-sensitive,\nas sensitive information should not be sent in the query string.\nSee https://www.owasp.org/index.php/Information_exposure_through_query_strings_in_url for more information\n\nNOTE: Review the data captured by Elastic APM carefully to make sure it does not capture sensitive information.\nIf you do find sensitive data in the Elasticsearch index,\nyou should add an additional entry to this list (make sure to also include the default entries)." }, { "key": "unnest_exceptions", "type": "List", "category": "Core", "default": "(?-i)*Nested*Exception", "since": "1.0.0", "description": "When reporting exceptions,\nun-nests the exceptions matching the wildcard pattern.\nThis can come in handy for Spring's `org.springframework.web.util.NestedServletException`,\nfor example.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." 
}, { "key": "ignore_exceptions", "type": "List", "category": "Core", "default": "", "tags": [], "since": "1.11.0", "description": "A list of exceptions that should be ignored and not reported as errors.\nThis allows to ignore exceptions thrown in regular control flow that are not actual errors\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive.\n\nExamples:\n\n - `com.mycompany.ExceptionToIgnore`: using fully qualified name\n - `*ExceptionToIgnore`: using wildcard to avoid package name\n - `*exceptiontoignore`: case-insensitive by default\n\nNOTE: Exception inheritance is not supported, thus you have to explicitly list all the thrown exception types" }, { "key": "capture_body", "type": "Enum", "category": "Core", "default": "OFF", "tags": ["performance"], "enum": ["OFF", "ERRORS", "TRANSACTIONS", "ALL"], "since": "1.0.0", "description": "For transactions that are HTTP requests, the Java agent can optionally capture the request body (e.g. POST \nvariables). For transactions that are initiated by receiving a JMS text message, the agent can capture the \ntextual message body.\n\nIf the HTTP request or the JMS message has a body and this setting is disabled, the body will be shown as [REDACTED].\n\nThis option is case-insensitive.\n\nNOTE: Currently, only UTF-8 encoded plain text HTTP content types are supported.\nThe option <> determines which content types are captured.\n\nWARNING: Request bodies often contain sensitive values like passwords, credit card numbers etc.\nIf your service handles data like this, we advise to only enable this feature with care.\nTurning on body capturing can also significantly increase the overhead in terms of heap usage,\nnetwork utilisation and Elasticsearch index size." 
}, { "key": "capture_headers", "type": "Boolean", "category": "Core", "default": "true", "tags": ["performance"], "since": "1.0.0", "description": "If set to `true`, the agent will capture request and response headers, including cookies.\n\nNOTE: Setting this to `false` reduces network bandwidth, disk space and object allocations." }, { "key": "central_config", "type": "Boolean", "category": "Core", "default": "true", "tags": [], "since": "1.8.0", "description": "When enabled, the agent will make periodic requests to the APM Server to fetch updated configuration." }, { "key": "use_elastic_traceparent_header", "type": "Boolean", "category": "Core", "default": "true", "tags": [], "since": "1.14.0", "description": "To enable {apm-overview-ref-v}/distributed-tracing.html[distributed tracing], the agent\nadds trace context headers to outgoing requests (like HTTP requests, Kafka records, gRPC requests etc.).\nThese headers (`traceparent` and `tracestate`) are defined in the\nhttps://www.w3.org/TR/trace-context-1/[W3C Trace Context] specification.\n\nWhen this setting is `true`, the agent will also add the header `elastic-apm-traceparent`\nfor backwards compatibility with older versions of Elastic APM agents." 
}, { "key": "server_urls", "type": "List", "category": "Reporter", "default": "http://localhost:8200", "since": "1.0.0", "description": "The URLs must be fully qualified, including protocol (http or https) and port.\n\nFails over to the next APM Server URL in the event of connection errors.\nAchieves load-balancing by shuffling the list of configured URLs.\nWhen multiple agents are active, they'll tend towards spreading evenly across the set of servers due to randomization.\n\nIf outgoing HTTP traffic has to go through a proxy,you can use the Java system properties `http.proxyHost` and `http.proxyPort` to set that up.\nSee also [Java's proxy documentation](https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html) for more information.\n\nNOTE: This configuration can only be reloaded dynamically as of 1.8.0" }, { "key": "server_timeout", "type": "TimeDuration", "category": "Reporter", "default": "5s", "since": "1.0.0", "description": "If a request to the APM server takes longer than the configured timeout,\nthe request is cancelled and the event (exception or transaction) is discarded.\nSet to 0 to disable timeouts.\n\nWARNING: If timeouts are disabled or set to a high value, your app could experience memory issues if the APM server times out." }, { "key": "max_queue_size", "type": "Integer", "category": "Reporter", "default": "512", "since": "1.0.0", "description": "The maximum size of buffered events.\n\nEvents like transactions and spans are buffered when the agent can't keep up with sending them to the APM Server or if the APM server is down.\n\nIf the queue is full, events are rejected which means you will lose transactions and spans in that case.\nThis guards the application from crashing in case the APM server is unavailable for a longer period of time.\n\nA lower value will decrease the heap overhead of the agent,\nwhile a higher value makes it less likely to lose events in case of a temporary spike in throughput." 
}, { "key": "api_request_time", "type": "TimeDuration", "category": "Reporter", "default": "10s", "since": "1.0.0", "description": "Maximum time to keep an HTTP request to the APM Server open for.\n\nNOTE: This value has to be lower than the APM Server's `read_timeout` setting." }, { "key": "api_request_size", "type": "ByteValue", "category": "Reporter", "default": "768kb", "since": "1.0.0", "description": "The maximum total compressed size of the request body which is sent to the APM server intake api via a chunked encoding (HTTP streaming).\nNote that a small overshoot is possible.\n\nAllowed byte units are `b`, `kb` and `mb`. `1kb` is equal to `1024b`." }, { "key": "application_packages", "type": "Collection", "category": "Stacktrace", "default": "", "since": "1.0.0", "description": "Used to determine whether a stack trace frame is an 'in-app frame' or a 'library frame'.\nThis allows the APM app to collapse the stack frames of library code,\nand highlight the stack frames that originate from your application.\nMultiple root packages can be set as a comma-separated list;\nthere's no need to configure sub-packages.\nBecause this setting helps determine which classes to scan on startup,\nsetting this option can also improve startup time.\n\nYou must set this option in order to use the API annotations `@CaptureTransaction` and `@CaptureSpan`.\n\n**Example**\n\nMost Java projects have a root package, e.g. `com.myproject`. You can set the application package using Java system properties:\n`-Delastic.apm.application_packages=com.myproject`\n\nIf you are only interested in specific subpackages, you can separate them with commas:\n`-Delastic.apm.application_packages=com.myproject.api,com.myproject.impl`" }, { "key": "stack_trace_limit", "type": "Integer", "category": "Stacktrace", "default": "50", "tags": ["performance"], "since": "1.0.0", "description": "Setting it to 0 will disable stack trace collection. 
Any positive integer value will be used as the maximum number of frames to collect. Setting it -1 means that all frames will be collected." }, { "key": "span_frames_min_duration", "type": "TimeDuration", "category": "Stacktrace", "default": "5ms", "tags": ["performance"], "since": "1.0.0", "description": "In its default settings, the APM agent will collect a stack trace with every recorded span.\nWhile this is very helpful to find the exact place in your code that causes the span, collecting this stack trace does have some overhead. \nWhen setting this option to a negative value, like `-1ms`, stack traces will be collected for all spans. Setting it to a positive value, e.g. `5ms`, will limit stack trace collection to spans with durations equal to or longer than the given value, e.g. 5 milliseconds.\n\nTo disable stack trace collection for spans completely, set the value to `0ms`." }, { "key": "log_level", "type": "Enum", "category": "Logging", "default": "INFO", "enum": ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"], "since": "1.0.0", "description": "Sets the logging level for the agent.\n\nThis option is case-insensitive." }, { "key": "enable_log_correlation", "type": "Boolean", "category": "Logging", "default": "false", "since": "1.0.0", "description": "A boolean specifying if the agent should integrate into SLF4J's https://www.slf4j.org/api/org/slf4j/MDC.html[MDC] to enable trace-log correlation.\nIf set to `true`, the agent will set the `trace.id` and `transaction.id` for the currently active spans and transactions to the MDC.\nSee <> for more details.\n\nNOTE: While it's allowed to enable this setting at runtime, you can't disable it without a restart." 
}, { "key": "capture_body_content_types", "type": "List", "category": "HTTP", "default": "application/x-www-form-urlencoded*,text/*,application/json*,application/xml*", "tags": ["performance"], "since": "1.5.0", "description": "Configures which content types should be recorded.\n\nThe defaults end with a wildcard so that content types like `text/plain; charset=utf-8` are captured as well.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." }, { "key": "ignore_urls", "type": "List", "category": "HTTP", "default": "\/VAADIN/*,/heartbeat*,/favicon.ico,*.js,*.css,*.jpg,*.jpeg,*.png,*.gif,*.webp,*.svg,*.woff,*.woff2", "since": "1.0.0", "description": "Used to restrict requests to certain URLs from being instrumented.\n\nThis property should be set to an array containing one or more strings.\nWhen an incoming HTTP request is detected, its URL will be tested against each element in this list.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive.\n\nNOTE: All errors that are captured during a request to an ignored URL are still sent to the APM Server regardless of this setting." 
}, { "key": "ignore_user_agents", "type": "List", "category": "HTTP", "default": "", "since": "1.0.0", "description": "Used to restrict requests from certain User-Agents from being instrumented.\n\nWhen an incoming HTTP request is detected,\nthe User-Agent from the request headers will be tested against each element in this list.\nExample: `curl/*`, `*pingdom*`\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive.\n\nNOTE: All errors that are captured during a request by an ignored user agent are still sent to the APM Server regardless of this setting." }, { "key": "url_groups", "type": "List", "category": "HTTP", "default": "", "since": "1.0.0", "description": "This option is only considered, when `use_path_as_transaction_name` is active.\n\nWith this option, you can group several URL paths together by using a wildcard expression like `/user/*`.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." }, { "key": "ignore_message_queues", "type": "List", "category": "Messaging", "default": "", "since": "1.0.0", "description": "Used to filter out specific messaging queues/topics from being traced. \n\nThis property should be set to an array containing one or more strings.\nWhen set, sends-to and receives-from the specified queues/topic will be ignored.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." 
}, { "key": "capture_jmx_metrics", "type": "List", "category": "JMX", "default": "", "tags": [], "since": "1.11.0", "description": "Report metrics from JMX to the APM Server\n\nCan contain multiple comma separated JMX metric definitions:\n\n----\nobject_name[] attribute[:metric_name=]\n----\n\n* `object_name`:\n+\nFor more information about the JMX object name pattern syntax,\nsee the https://docs.oracle.com/javase/7/docs/api/javax/management/ObjectName.html[`ObjectName` Javadocs].\n* `attribute`:\n+\nThe name of the JMX attribute.\nThe JMX value has to be either a `Number` or a composite where the composite items are numbers.\nThis element can be defined multiple times.\nAn attribute can contain optional properties.\nThe syntax for that is the same as for https://docs.oracle.com/javase/7/docs/api/javax/management/ObjectName.html[`ObjectName`].\n+\n** `metric_name`:\n+\nA property within `attribute`.\nThis is the name under which the metric will be stored.\nSetting this is optional and will be the same as the `attribute` if not set.\nNote that all JMX metric names will be prefixed with `jvm.jmx.` by the agent.\n\nThe agent creates `labels` for each link:https://docs.oracle.com/javase/7/docs/api/javax/management/ObjectName.html#getKeyPropertyList()[JMX key property] such as `type` and `name`.\n\nThe link:https://docs.oracle.com/javase/7/docs/api/javax/management/ObjectName.html[JMX object name pattern] supports wildcards.\nIn this example, the agent will create a metricset for each memory pool `name` (such as `G1 Old Generation` and `G1 Young Generation`)\n\n----\nobject_name[java.lang:type=GarbageCollector,name=*] attribute[CollectionCount:metric_name=collection_count] attribute[CollectionTime]\n----\n\nThe resulting documents in Elasticsearch look similar to these (metadata omitted for brevity):\n\n[source,json]\n----\n{\n \"@timestamp\": \"2019-08-20T16:51:07.512Z\",\n \"jvm\": {\n \"jmx\": {\n \"collection_count\": 0,\n \"CollectionTime\": 0\n }\n },\n 
\"labels\": {\n \"type\": \"GarbageCollector\",\n \"name\": \"G1 Old Generation\"\n }\n}\n----\n\n[source,json]\n----\n{\n \"@timestamp\": \"2019-08-20T16:51:07.512Z\",\n \"jvm\": {\n \"jmx\": {\n \"collection_count\": 2,\n \"CollectionTime\": 11\n }\n },\n \"labels\": {\n \"type\": \"GarbageCollector\",\n \"name\": \"G1 Young Generation\"\n }\n}\n----\n\n\nThe agent also supports composite values for the attribute value.\nIn this example, `HeapMemoryUsage` is a composite value, consisting of `committed`, `init`, `used` and `max`.\n----\nobject_name[java.lang:type=Memory] attribute[HeapMemoryUsage:metric_name=heap] \n----\n\nThe resulting documents in Elasticsearch look similar to this:\n\n[source,json]\n----\n{\n \"@timestamp\": \"2019-08-20T16:51:07.512Z\",\n \"jvm\": {\n \"jmx\": {\n \"heap\": {\n \"max\": 4294967296,\n \"init\": 268435456,\n \"committed\": 268435456,\n \"used\": 22404496\n }\n }\n },\n \"labels\": {\n \"type\": \"Memory\"\n }\n}\n----\n" }, { "key": "profiling_spans_enabled", "type": "Boolean", "category": "Profiling", "default": "false", "tags": [], "since": "1.14.0", "description": "Set to `true` to make the agent create spans for method executions based on\nhttps://github.com/jvm-profiling-tools/async-profiler[async-profiler], a sampling aka statistical profiler.\n\nIf this is enabled, the agent will start a profiling session every\n<> which lasts for <>.\nIf a transaction happens within a profiling session,\nthe agent creates spans for slow methods.\n\nDue to the nature of how sampling profilers work,\nthe duration of the inferred spans are not exact, but only estimations.\nThe <> lets you fine tune the trade-off between accuracy and overhead.\n\nThe inferred spans are created after a profiling session has ended.\nThis means there is a delay between the regular and the inferred spans being visible in the UI.\n\nNOTE: This feature is not available on Windows" }, { "key": "profiling_sampling_interval", "type": "TimeDuration", "category": 
"Profiling", "default": "20ms", "tags": [], "since": "1.14.0", "validation": { "min": "1ms", "max": "1s", "negativeMatch": false }, "description": "The frequency at which stack traces are gathered within a profiling session.\nThe lower you set it, the more accurate the durations will be.\nThis comes at the expense of higher overhead and more spans for potentially irrelevant operations.\nThe minimal duration of a profiling-inferred span is the same as the value of this setting." }, { "key": "profiling_spans_min_duration", "type": "TimeDuration", "category": "Profiling", "default": "0ms", "tags": [], "since": "1.14.0", "validation": { "min": "0ms", "negativeMatch": false }, "description": "The minimum duration of an inferred span.\nNote that the min duration is also implicitly set by the sampling interval.\nHowever, increasing the sampling interval also decreases the accuracy of the duration of inferred spans." }, { "key": "profiling_included_classes", "type": "List", "category": "Profiling", "default": "*", "tags": [], "since": "1.14.0", "description": "If set, the agent will only create inferred spans for methods which match this list.\nSetting a value may slightly increase performance and can reduce clutter by only creating spans for the classes you are interested in.\nExample: `org.example.myapp.*`\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." 
}, { "key": "profiling_excluded_classes", "type": "List", "category": "Profiling", "default": "(?-i)java.*,(?-i)javax.*,(?-i)sun.*,(?-i)com.sun.*,(?-i)jdk.*,(?-i)org.apache.tomcat.*,(?-i)org.apache.catalina.*,(?-i)org.apache.coyote.*,(?-i)org.jboss.as.*,(?-i)org.glassfish.*,(?-i)org.eclipse.jetty.*,(?-i)com.ibm.websphere.*", "tags": [], "since": "1.14.0", "description": "Excludes classes for which no profiler-inferred spans should be created.\n\nThis option supports the wildcard `*`, which matches zero or more characters.\nExamples: `/foo/*/bar/*/baz*`, `*foo*`.\nMatching is case insensitive by default.\nPrepending an element with `(?-i)` makes the matching case sensitive." }, { "key": "profiling_interval", "type": "TimeDuration", "category": "Profiling", "default": "61s", "tags": [], "since": "1.14.0", "validation": { "min": "0ms", "negativeMatch": false }, "description": "The interval at which profiling sessions should be started." }, { "key": "profiling_duration", "type": "TimeDuration", "category": "Profiling", "default": "10s", "tags": [], "since": "1.14.0", "validation": { "min": "1s", "negativeMatch": false }, "description": "The duration of a profiling session.\nFor sampled transactions which fall within a profiling session (they start after and end before the session),\nso-called inferred spans will be created.\nThey appear in the trace waterfall view like regular spans.\n\nNOTE: It is not recommended to set much higher durations as it may fill the activation events file and async-profiler's frame buffer.\nWarnings will be logged if the activation events file is full.\nIf you want to have more profiling coverage, try decreasing <>." } ] ```

@elastic/apm-agent-devs WDYT? Anything missing from here? @dgieselaar is this definition of config options something you can work with? I guess all options would be indexed as individual documents and extended with the usual metadata (like ephemeral id and agent name).

To get a list of applicable options, you can do this:

felixbarny commented 4 years ago

WIP Java agent PR: https://github.com/elastic/apm-agent-java/pull/1046

beniwohli commented 4 years ago

Great stuff! Just a couple of questions/comments

felixbarny commented 4 years ago
beniwohli commented 4 years ago

Regarding since, the Python agent, and I assume most other agents, didn't really keep track of which version introduced which config option. git blame to the rescue, but it's tedious work. That's why I was wondering if we'd have an immediate benefit from it. Maybe it's also enough to set since to the version this feature will be released with for all existing config options, and then take it from there?

jalvz commented 4 years ago

apm-server might need to add support for this (didn't look in detail), is there any target release?

One thing for sure we need is to know the subset of settings that RUM is going to apply.

felixbarny commented 4 years ago

apm-server might need to add support for this

Yes, I guess we need a separate endpoint for that. Agents would send their supported options on startup.

is there any target release?

It's scheduled for 7.7

One thing for sure we need is to know the subset of settings that RUM is going to apply.

IIRC, RUM is excluded for now.

hmdhk commented 4 years ago

@jalvz For RUM we don't have any plans to add more config options, so it should still be limited to what we support today (i.e. transactionSampleRate)

graphaelli commented 4 years ago

It's scheduled for 7.7

The 7.7 target was additional predefined settings. I think we should separate the agent provided configuration settings from that expansion and find the right target for that, likely post 7.7.

dgieselaar commented 4 years ago

@felixbarny I was actually thinking the agent would just send up one document, with options as an array. I can imagine one document rather than many documents helps simplify things (for instance, we can do upserts on some kind of serialized id, and users can manage their data more easily). But I might be missing a good reason to send up one document per option. Here's what I was thinking:

```
PUT apm-agent-configuration-options

PUT apm-agent-configuration-options/_mapping
{
  "properties": {
    "agent": {
      "properties": {
        "name": { "type": "keyword" },
        "version": { "type": "keyword" },
        "configuration": {
          "properties": {
            "options": {
              "properties": {
                "key": { "type": "keyword" },
                "type": { "type": "keyword" },
                "category": { "type": "keyword" },
                "default": { "type": "keyword" },
                "tags": { "type": "keyword" },
                "validation": {
                  "properties": {
                    "min": { "type": "float" },
                    "max": { "type": "float" },
                    "negativeMatch": { "type": "boolean" },
                    "regex": { "type": "keyword" }
                  }
                },
                "description": { "type": "text" }
              }
            }
          }
        }
      }
    },
    "service": {
      "properties": {
        "name": { "type": "keyword" },
        "environment": { "type": "keyword" },
        "node": {
          "properties": {
            "name": { "type": "keyword" }
          }
        }
      }
    },
    "@timestamp": { "type": "date" }
  }
}

POST apm-agent-configuration-options/_doc
{
  "agent": {
    "name": "opbeans-java",
    "version": "1.0",
    "configuration": {
      "options": [
        {
          "key": "active",
          "type": "Boolean",
          "category": "Core",
          "default": "true",
          "description": "A boolean specifying if the agent should be active or not.\nWhen active, the agent instruments incoming HTTP requests, tracks errors and collects and sends metrics.\nWhen inactive, the agent works as a noop, not collecting data and not communicating with the APM sever.\nAs this is a reversible switch, agent threads are not being killed when inactivated, but they will be \nmostly idle in this state, so the overhead should be negligible.\n\nYou can use this setting to dynamically disable Elastic APM at runtime."
        },
        {
          "key": "transaction_sample_rate",
          "type": "Double",
          "category": "Core",
          "default": "1.0",
          "tags": [ "performance" ],
          "validation": { "min": 0, "max": 1, "negativeMatch": false },
          "description": "By default, the agent will sample every transaction (e.g. request to your service). To reduce overhead and storage requirements, you can set the sample rate to a value between 0.0 and 1.0. We still record overall time and the result for unsampled transactions, but no context information, labels, or spans."
        }
      ]
    }
  },
  "@timestamp": "2020-02-23T20:44:44.264Z",
  "service": {
    "node": { "name": "opbeans-java-2" },
    "name": "opbeans-java",
    "environment": "production"
  }
}
```
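As a rough illustration of the "serialized id" idea for upserts, a deterministic document id could be derived from the identifying fields of the document. This is only a sketch: the function name `options_document_id` and the choice of fields (service name, environment, agent name, agent version) are assumptions, not something settled in this thread.

```python
import hashlib
import json


def options_document_id(service_name: str, environment: str,
                        agent_name: str, agent_version: str) -> str:
    """Build a stable id so that re-sending the same options overwrites
    the previous document instead of creating a new one.

    Which fields identify a document is an assumption of this sketch.
    """
    # Serialize as a JSON array so that field boundaries stay unambiguous.
    serialized = json.dumps(
        [service_name, environment, agent_name, agent_version],
        separators=(",", ":"),
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

The same inputs always produce the same id, so indexing with that id (rather than `POST .../_doc` with an auto-generated one) would replace the earlier document for that service and agent version.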

We can then use a terms aggregation on agent.configuration.options.key, with a top_hits sub-aggregation, sorted by @timestamp:

GET apm-agent-configuration-options/_search
{
  "size": 0,
  "aggs": {
    "options": {
      "terms": {
        "field": "agent.configuration.options.key",
        "size": 100      
      },
      "aggs": {
        "sample_documents": {
          "top_hits": {
            "size": 1,
            "sort": {
              "@timestamp": "desc"
            }
          }
        }
      }
    }
  }
}

which would return:

Optionally we add service.name, service.version and agent.name terms queries to limit the result set. We can also use a more specific sorting algorithm for the top_hits aggregation, e.g. score documents that match more filters higher (similar to what we do when fetching agent configurations).
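A small Python sketch of how the request body above could be assembled with those optional filters. The helper name `build_options_query` is hypothetical, and it uses single-value `term` filters inside a `bool` query as one possible way to limit the result set. Note that `service.version` is not part of the mapping proposed above, so filtering on it assumes that field is added.

```python
from typing import Optional


def build_options_query(service_name: Optional[str] = None,
                        service_version: Optional[str] = None,
                        agent_name: Optional[str] = None,
                        max_options: int = 100) -> dict:
    """Build the search body: one bucket per option key, each with the
    most recent document that mentions it."""
    # Only add a filter clause for the values that were actually given.
    candidates = (
        ("service.name", service_name),
        ("service.version", service_version),  # assumes the field is mapped
        ("agent.name", agent_name),
    )
    filters = [{"term": {field: value}}
               for field, value in candidates if value is not None]

    body = {
        "size": 0,
        "aggs": {
            "options": {
                "terms": {
                    "field": "agent.configuration.options.key",
                    "size": max_options,
                },
                "aggs": {
                    "sample_documents": {
                        "top_hits": {
                            "size": 1,
                            "sort": {"@timestamp": "desc"},
                        }
                    }
                },
            }
        },
    }
    if filters:
        body["query"] = {"bool": {"filter": filters}}
    return body
```

Calling `build_options_query()` without arguments produces the same body as the request above; passing `service_name="opbeans-java"` adds a single `term` filter on `service.name`.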

cc @elastic/apm-ui for anyone who has additional ideas/thoughts.

sorenlouv commented 4 years ago

Perhaps we should specify the types for each setting according to their mapping in ES?

Btw. CAPTURE_BODY default value is "off", not false afaik.

sorenlouv commented 4 years ago

Since this PR is scheduled for 7.7 I assume it refers to hardcoding the settings in the UI. Making it possible for agents to specify settings on the fly is 7.8+ and perhaps belongs in another issue.

mikker commented 4 years ago

Both API_ values are strings, correct. Here's the Ruby agent docs: https://www.elastic.co/guide/en/apm/agent/ruby/current/configuration.html#config-api-request-size

The agents are aligned on the format.

mikker commented 4 years ago

Capture_body docs: https://www.elastic.co/guide/en/apm/agent/ruby/current/configuration.html#config-capture-body (the table is missing, weirdly?)

Given that most options can be set via ENV vars, they can be given as strings and will be converted (speaking only for the Ruby agent, but I expect the same from other agents?)

Because of this, I think you can type them however it would make the most sense to get the UI to work right.

sorenlouv commented 4 years ago

> Because of this, I think you can type them however it would make the most sense to get the UI to work right.

Would you prefer receiving API_REQUEST_TIME as 3600000 or "1h" ? Similarly with API_REQUEST_SIZE: 3072 vs "3kb"?

mikker commented 4 years ago

I think the safe option is to always include the unit, but it's fine to always use the same scale, e.g. 3600000ms. The agents could potentially react differently to plain numbers, and the string format is what we have aligned on.

felixbarny commented 4 years ago

> Perhaps we should specify the types for each setting according to their mapping in ES?

The problem is that this doesn't specify whether something is a list or a scalar. If the UI knows something is a list, it might render the input differently. Also, as discussed, users should be able to enter time durations or byte values like 10s or 1mb, and ideally validation should reject values such as 1h (hours are not supported, only ms, s and m; the regex is ^(-)?(\d+)(ms|s|m)$), 1 ms (a space before the unit is disallowed) or 1mib (the byte-value regex is ^(\d+)(b|kb|mb|gb)$). We could send up the regex in the validation rules, but I think the UI should eventually know about these data types. The validation messages could then be nicer than just stating that a value doesn't match a given regex, and we could build special inputs, for example a number input combined with a dropdown listing all the available units.
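To make the validation rules concrete, here is a minimal sketch that applies the two regexes quoted above. The function names are hypothetical, and the use of `re.ASCII` and `fullmatch` is my own choice to keep matching strict; it is not taken from any agent.

```python
import re

# Regexes as quoted in the comment above.
DURATION_RE = re.compile(r"^(-)?(\d+)(ms|s|m)$", re.ASCII)
BYTE_VALUE_RE = re.compile(r"^(\d+)(b|kb|mb|gb)$", re.ASCII)


def is_valid_duration(value: str) -> bool:
    """True for values like '10s' or '-5ms'; hours and spaces are rejected."""
    return DURATION_RE.fullmatch(value) is not None


def is_valid_byte_value(value: str) -> bool:
    """True for values like '768kb' or '1mb'; '1mib' is rejected."""
    return BYTE_VALUE_RE.fullmatch(value) is not None
```

With these, `is_valid_duration("1h")` and `is_valid_duration("1 ms")` are both false, while `is_valid_byte_value("1mb")` is true.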

As for when which part of that should be targeted for which version, I leave that up to @sqren @graphaelli and @nehaduggal.

> I think the safe option is to always include the unit

Not specifying the unit is possible with some agents but only for backwards-compatibility reasons. So always specifying the unit is the way to go to avoid ambiguities.
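A small sketch of what "always specify the unit" could look like on the sending side, assuming the value is held internally as integer milliseconds. `format_duration` is a hypothetical helper; it only emits the ms, s and m units mentioned earlier, so one hour comes out as 60m.

```python
def format_duration(milliseconds: int) -> str:
    """Render a duration with an explicit unit, using only ms, s and m."""
    if milliseconds != 0 and milliseconds % 60_000 == 0:
        return f"{milliseconds // 60_000}m"
    if milliseconds != 0 and milliseconds % 1_000 == 0:
        return f"{milliseconds // 1_000}s"
    # Fall back to milliseconds so no precision is lost.
    return f"{milliseconds}ms"
```

Sending `format_duration(3600000)` rather than the bare number avoids any ambiguity about the scale.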

> I was actually thinking the agent would just send up one document, with options as an array.

I seem to recall something around problems with aggregations on arrays. But I might be wrong or it may not be a problem in this case.

sorenlouv commented 4 years ago

> As for when which part of that should be targeted for which version, I leave that up to @sqren @graphaelli and @nehaduggal.

The plan is to add all of the above-mentioned settings for 7.7 (still hardcoded in the UI). The dynamic approach may be a while out (most likely not 7.8).

beniwohli commented 4 years ago

If the settings will be hardcoded in 7.7, we probably should remove or at least somehow mark the options that are specific to the Java agent.

sorenlouv commented 4 years ago

> we probably should remove or at least somehow mark the options that are specific to the Java agent.

The java settings will only be displayed where applicable. Meaning: only if the user is creating a config for a java service or has selected the "All" option (in this case all settings will be displayed).

sorenlouv commented 4 years ago

btw. I agree: would be very nice if the above table noted which of the settings are java specific

felixbarny commented 4 years ago

Is it possible to write directly into the configuration index or does the APM Server white-list certain options or expect them to be in a non-string data type?

felixbarny commented 4 years ago

Allowing that^ (in case it's currently not possible) might be enough for 7.7 (wdyt @nehaduggal ?). Then we can direct all eng resources towards making the UI dynamic for 7.8/7.9 without the "throwaway" work to statically support some more java specific options.

sorenlouv commented 4 years ago

@felixbarny

> Is it possible to write directly into the configuration index or does the APM Server white-list certain options or expect them to be in a non-string data type?

Yes. Previously there was a whitelist but now I've opened up the API on the Kibana-side so any string-based key/value pair is allowed:

{
   "settings": {
     "my_custom_java_setting": "true"
   }
}

sorenlouv commented 4 years ago

@jalvz Will this require changes on the APM Server side for the agents to be able to consume custom (non-whitelisted) options?

jalvz commented 4 years ago

From apm-server perspective it should be fine

sorenlouv commented 4 years ago

@lreuven I'm currently working on adding the new options to the UI, and have a few questions/favours to ask:

Thanks!

felixbarny commented 4 years ago

> Yes. Previously there was a whitelist but now I've opened up the API on the Kibana-side so any string-based key/value pair is allowed:

Cool! What does the endpoint of that API look like? Is that a Kibana API or is it something to be written directly in the ES index? That's probably disallowed because it's a system index, right?

sorenlouv commented 4 years ago

> Cool! What does the endpoint of that API look like? Is that a Kibana API or is it something to be written directly in the ES index? That's probably disallowed because it's a system index, right?

Yes, it's a Kibana API and documented here. The data is written to a system index, yes. So superusers can access it directly but normal users will have to use the API.
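Since the API accepts any string-based key/value pair, a client would need to stringify its values before sending them. The sketch below only builds the `settings` object shown earlier in the thread; the helper names are hypothetical, and encoding lists as comma-separated strings is an assumption on my part, not something confirmed here.

```python
def to_setting_string(value) -> str:
    """Convert a Python value to the string form used in the settings object."""
    if isinstance(value, bool):
        # Check bool before anything else so True becomes "true", not "True".
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # Assumption: lists are sent as comma-separated strings.
        return ",".join(str(item) for item in value)
    return str(value)


def build_settings_payload(options: dict) -> dict:
    """Wrap stringified options in the {"settings": {...}} shape."""
    return {
        "settings": {key: to_setting_string(value) for key, value in options.items()}
    }
```

For example, `build_settings_payload({"my_custom_java_setting": True})` produces the same body as the example quoted above.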

estolfo commented 4 years ago

Is the expected behavior for the ACTIVE config option that the agent stops itself when ACTIVE is changed from true to false in the central config?

SergeyKleyman commented 4 years ago

IIRC, in https://github.com/elastic/apm/issues/92#issuecomment-519752096 we agreed to deprecate active config and introduce enabled and recording as its replacement.

SergeyKleyman commented 4 years ago

@lreuven Having an ENVIRONMENT setting via remote configuration might be somewhat of a "chicken and egg" problem, because in the remote configuration protocol the backend uses the ENVIRONMENT assigned to the agent to decide which configuration to return to the agent.

sorenlouv commented 4 years ago

> Having an ENVIRONMENT setting via remote configuration might be somewhat of a "chicken and egg" problem, because in the remote configuration protocol the backend uses the ENVIRONMENT assigned to the agent to decide which configuration to return to the agent. @SergeyKleyman

Agree, I assumed this was a mistake but thanks for bringing it up. From the UI (and APM Server) perspective service.name and environment are conditions that are used to target a particular agent. They cannot be changed via remote configuration.
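To illustrate why service.name and environment act as targeting conditions rather than settings, here is a hedged sketch of how a backend might pick the configuration for an agent. The data shape and the "most matching conditions wins" rule are assumptions for illustration, loosely based on the scoring idea mentioned earlier in the thread; this is not the actual APM Server or Kibana logic.

```python
from typing import Optional


def match_score(config: dict, service_name: str, environment: Optional[str]) -> int:
    """Return -1 if the config does not apply, else the number of matched conditions."""
    target = config.get("service", {})
    score = 0
    for field, actual in (("name", service_name), ("environment", environment)):
        expected = target.get(field)
        if expected is None:
            continue  # Condition not set: the config applies to all values.
        if expected != actual:
            return -1
        score += 1
    return score


def select_config(configs: list, service_name: str, environment: Optional[str] = None):
    """Pick the applicable config with the most specific (highest-scoring) target."""
    best, best_score = None, -1
    for config in configs:
        score = match_score(config, service_name, environment)
        if score > best_score:
            best, best_score = config, score
    return best
```

Because the agent's own name and environment are the inputs to this lookup, a remote setting that changed them would alter which configuration the agent receives next.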

felixbarny commented 4 years ago

I've talked to agent and Kibana devs and we still have a mismatch between what's in Kibana remote config for the agents and what the agents support.

Go

Java

RUM ✅

Node.js

In other words, only these should be included:

Python All set after this PR is merged (scheduled for 7.7): https://github.com/elastic/apm-agent-python/pull/778

.NET All set after a few config options are made dynamic (scheduled for 7.7) https://github.com/elastic/apm-agent-dotnet/issues/794

Ruby All set after this PR is merged (scheduled for 7.7): https://github.com/elastic/apm-agent-ruby/pull/741

felixbarny commented 4 years ago

@elastic/apm-agent-devs please prioritize manually testing central config via Kibana. As we are already post feature freeze, please do your tests by end of next week (April 3rd).

formgeist commented 4 years ago

@felixbarny I've created a follow up issue for removing those options for the specific agents https://github.com/elastic/kibana/issues/61821

basepi commented 4 years ago

Python support is merged. @beniwohli will do the manual testing this week.

axw commented 4 years ago

Tested with the Go agent, recording works. I realised I'm missing dynamic reloading support for a couple of other config attributes (span_frames_min_duration and stack_trace_limit), but I'll add them.

Also there's an issue with the unit selection for duration-type config: https://github.com/elastic/kibana/issues/62110

estolfo commented 4 years ago

I've done manual testing with the Ruby agent for all options except for recording. We still have to implement recording/enabled, tracked in this issue: https://github.com/elastic/apm-agent-ruby/issues/623

basepi commented 4 years ago

@estolfo recording/enabled are not required for 7.7 so I think you can check off Ruby above.

felixbarny commented 4 years ago

The recording flag is required for 7.7. Enabled can land later as it's not a dynamic config and thus not available for central config.

basepi commented 4 years ago

Well, I think we're mostly there. I remember seeing a note about only Java being ready with recording for 7.7 but now I'm not sure where that note is....maybe I made it up? Anyway, this is almost done in Python as well.

graphaelli commented 4 years ago

Closing this out as it has outlived its purpose. Let's take any outstanding tasks to new issues.