If you are reading this on the develop branch, some of the features may not be present in the released versions yet.
Java agent to scrape and expose MBeans to Prometheus. Formerly, java-prometheus-metrics-agent.
- `java.*`
- `java.lang:type=OperatingSystem|Threading`
- `java.lang:type=GarbageCollector:LastGcInfo` matches regardless of the `name=` key property.
- `foo:type=A,name=B` is equivalent to `foo:name=B,type=A`.
I needed something that can scrape many MBeans with a small number of rules. Writing a regex for each set of key properties (the key=value,... part of an MBean name) was impossibly hard, especially because key properties don't always have a consistent order, depending on how the ObjectName is constructed (it's just a hash table).
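This ordering quirk is standard JMX behavior. As a small demo using the JDK's own `javax.management.ObjectName`, two differently ordered names denote the same MBean, because equality is defined on the canonical form (keys sorted lexically):

```java
import javax.management.ObjectName;

public class CanonicalNameDemo {
    // Helper that wraps the checked MalformedObjectNameException.
    static String canonical(String name) {
        try {
            // Key properties are a hash table; the canonical form sorts the keys.
            return new ObjectName(name).getCanonicalName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("foo:type=A,name=B")); // foo:name=B,type=A
        System.out.println(canonical("foo:name=B,type=A")); // foo:name=B,type=A
    }
}
```

This is why the pattern syntax below matches on key properties by name rather than by position.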
If you don't want to run the agent now, download scriptable-jmx-exporter-1.0.0-alpha5.jar and skip to Usage.
You can quickly try out this exporter by copy-and-pasting the following snippet to your shell (or by manually running one by one).
This will download the agent jar and a default configuration file, and then start the exporter using the -javaagent option.
# Download the agent jar and a default configuration file.
curl -LO https://repo1.maven.org/maven2/net/thisptr/scriptable-jmx-exporter/1.0.0-alpha5/scriptable-jmx-exporter-1.0.0-alpha5.jar
curl -LO https://raw.githubusercontent.com/eiiches/scriptable-jmx-exporter/v1.0.0-alpha5/src/main/resources/scriptable-jmx-exporter.yaml
# Finally, run JVM with the exporter enabled.
java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar=@scriptable-jmx-exporter.yaml net.thisptr.jmx.exporter.tools.Pause
Now, open http://localhost:9639/metrics in your browser to see the exposed metrics.
The next step is to replace the Pause program with the application you want to monitor. Continue reading, or check out the real-world examples, to learn how to customize the exporter.
Add the -javaagent option to the JVM arguments.
# This starts an exporter without an explicit configuration file.
# The default configuration from src/main/resources/scriptable-jmx-exporter.yaml is used.
java -javaagent:<PATH_TO_AGENT_JAR> ...
Configurations can be passed as a javaagent argument. See Configuration section for details.
# Set configurations in JSON directly on command line (YAML is not supported here)
java -javaagent:<PATH_TO_AGENT_JAR>=<CONFIG_JSON> ...
# e.g.
# java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar='{"rules":[{"pattern":["com.sun.management:type=HotSpotDiagnostic:DiagnosticOptions","java.lang:type=Threading:AllThreadIds","jdk.management.jfr"],"skip":true},{"transform":"!java V1.transform(in, out, \"type\")"}]}' ...
# ---
# Load configurations from PATH_TO_CONFIG_YAML file
java -javaagent:<PATH_TO_AGENT_JAR>=@<PATH_TO_CONFIG_YAML> ...
# e.g.
# java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar=@/etc/foo.yaml ...
# java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar=@foo.yaml ...
# java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar=@classpath:foo.yaml ...
If multiple comma-separated configurations are specified, earlier configurations are overridden by (or merged with) later ones.
java -javaagent:<PATH_TO_AGENT_JAR>=@<PATH_TO_CONFIG_YAML>,<CONFIG_JSON> ...
# e.g.
# java -javaagent:scriptable-jmx-exporter-1.0.0-alpha5.jar=@/etc/foo.yaml,'{"server":{"bind_address":":19639"}}' ...
This section requires a basic grasp of the data models used in Java Management Extensions (JMX). If you are new to this area and don't understand what an ObjectName or MBean is, I strongly recommend reading Java Management Extensions (JMX) Best Practices first.
Configurations are automatically reloaded whenever the file (<PATH_TO_CONFIG_YAML> in the description above) is modified. This behavior cannot be turned off (at least for now).
So, whenever you need to write a new configuration, it's easiest to start from a simple one (e.g. scriptable-jmx-exporter.yaml, the default configuration used when none is provided on the command line) and edit the configuration file incrementally while your software is actually running.
If the exporter fails to load a new configuration, most likely due to a configuration error, it will continue to use the previous configuration. In contrast, application startup will fail if the initial configuration has any errors. It's generally safe (in the sense that it will not interrupt running workloads) to reconfigure the exporter on a production cluster while it is running.
# You can omit `server` and `options` if you are happy with the default values.
server:
  bind_address: '0.0.0.0:9639' # default
options:
  include_timestamp: true # Include a scraping timestamp for each metric (default).
  include_type: true # Enable TYPE comments (default).
  include_help: true # Enable HELP comments (default).
declarations: |
  public static void foo() {
    log("foo");
  }
rules:
- pattern:
  # Drop less useful attributes the JVM exposes.
  - 'com\\.sun\\.management:type=HotSpotDiagnostic:DiagnosticOptions'
  - 'java\\.lang:type=Threading:AllThreadIds'
  - 'jdk\\.management\\.jfr'
  # Some instrumentation libraries (such as Dropwizard Metrics) expose pre-calculated rate statistics.
  # Since Prometheus can calculate these values by itself, we don't need them. Skip.
  - '::.*MinuteRate'
  - '::MeanRate'
  skip: true
# Rule for known MBeans.
- pattern: 'java\\.lang|java\\.nio|jboss\\.threads|net\\.thisptr\\.jmx\\.exporter\\.agent.*'
  transform: |
    V1.transform(in, out, "type");
# Default rule to cover the rest.
- transform: |
    V1.transform(in, out, "type");
This YAML is mapped to the Config class using Jackson data-binding and validated by Hibernate Validator.
See examples directory for real-world examples.
Key | Default | Description |
---|---|---|
server.bind_address | 0.0.0.0:9639 | IP and port to listen on and serve metrics from. |
Key | Default | Description |
---|---|---|
options.include_timestamp | true | Specifies whether the /metrics response should include the timestamp at which each metric was scraped. |
options.include_help | true | Enables HELP comments. |
options.include_type | true | Enables TYPE comments. |
options.minimum_response_time | 0 | A minimum time in milliseconds that every /metrics request should take. This is used to avoid CPU spikes when there are thousands of metrics. When set, options.include_timestamp should not be disabled, because the time at which a response completes differs from the time at which the metrics are scraped. |
These options can be overridden by URL parameters, e.g. /metrics?minimum_response_time=1000.
You can define static classes and methods for use in transform scripts, condition expressions, etc. They are automatically imported and available, so you don't have to write import statements manually.

Make sure to add public static in the declarations; otherwise, the classes and methods won't be accessible.
declarations: |
  import java.util.Map;
  public static void foo() {
    log("foo");
  }
  public static class Foo {
    // ...
  }
Rules are searched in order and the first match is used for each attribute.
Key | Default | Description |
---|---|---|
rules[].pattern | null | A pattern used to match the MBean attributes this rule applies to. A rule with a null pattern applies to any attribute. See Pattern Matching for syntax details. |
rules[].condition | true | If an expression is set, this rule is used only when the expression evaluates to true. This is useful if you want to match an MBean attribute by something other than its name, such as its class name. See Condition Expression for details. |
rules[].skip | false | If true, skips exposition of the attribute to Prometheus. |
rules[].transform | V1.transform(in, out, "type") | A script to convert an MBean attribute to Prometheus metrics. See Scripting for details. |
You can use pattern matching to efficiently filter MBeans and their attributes. The general syntax is as follows:
DOMAIN_REGEX:KEY_REGEX_1=VALUE_REGEX_1,...:ATTRIBUTE_REGEX

The last two parts, separated by `:`, can be omitted.
Examples
- `jdk\.management\.jfr` matches all MBeans within the jdk.management.jfr domain.
- `kafka:type=kafka\.Log4jController` matches MBeans within the kafka domain that have a type=kafka.Log4jController key property. Additional key properties are simply ignored and don't affect the match result.
- `kafka.*::MeanRate|.*MinuteRate` matches if the attribute name matches MeanRate|.*MinuteRate and the domain name matches kafka.*.
- `java.lang:type=Threading:AllThreadIds` matches the AllThreadIds attribute of the java.lang:type=Threading MBean.
Notes

- `:`, `,`, and `=` inside a regex need to be escaped with `\`. They are special characters that construct the pattern.
- `domain:name=foo` matches an ObjectName `domain:name="foo"`.

Just like normal regexes, named capturing groups can be used to extract substrings from the ObjectName and attribute names.
The captured groups are made available as the `match` object (`Map<String, String>`) inside transform scripts.
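For example, the following sketch (the `myapp:type=Worker,name=...` MBean and the `worker` group name are hypothetical) extracts the name= key property into a Prometheus label via `match`:

```yaml
- pattern: 'myapp:type=Worker,name=(?<worker>.*)'
  transform: |
    !java
    import java.util.HashMap;
    MetricValue m = new MetricValue();
    m.name = "myapp_worker_" + in.attributeName;
    m.labels = new HashMap<>();
    m.labels.put("worker", match.get("worker")); // named group captured by the pattern
    m.value = ((Number) in.value).doubleValue(); // assumes a numeric attribute
    m.timestamp = in.timestamp;
    out.emit(m);
```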
A condition expression, if set, further narrows down the MBean attributes that the rule applies to, in addition to pattern. If the condition evaluates to false, the MBean attribute is handled by one of the subsequent rules (or the default rule if there is none).
The following variables are accessible from a condition expression.
Variable Name | Type | Description |
---|---|---|
mbeanInfo | javax.management.MBeanInfo | MBean information |
attributeInfo | javax.management.MBeanAttributeInfo | MBean attribute information |
mbeanInfo.getClassName().endsWith("JmxReporter$Timer")
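As a sketch, such an expression can be wired into a rule like this (whether Dropwizard's JmxReporter$Timer class is present depends on your application):

```yaml
- condition: 'mbeanInfo.getClassName().endsWith("JmxReporter$Timer")'
  transform: |
    !java
    V1.transform(in, out, "type");
```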
In this section, we mainly talk about transform scripts for use in rules[].transform.

Scripts can explicitly specify which scripting engine to use by starting the script with a !<NAME> directive. Currently, !java is the default (and only) engine, and hence the directive can be omitted. There used to be a !jq engine, but it has been removed.
Java scripting is powered by Janino, which is a super-small, super-fast Java compiler.
- transform: |
    !java
    V1.transform(in, out, "type");
Two variables, in (type: AttributeValue) and out (type: MetricValueOutput), are provided. What the script has to do is transform in, which holds the value (and metadata) of an MBean attribute, into MetricValue objects and call out.emit(...) with each metric object.
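As a minimal sketch of that flow (assuming the attribute value is numeric; the field names follow the examples later in this document):

```yaml
- transform: |
    !java
    MetricValue m = new MetricValue();
    // Derive a metric name from the MBean domain and attribute name.
    m.name = in.domain.replace('.', '_') + "_" + in.attributeName;
    m.value = ((Number) in.value).doubleValue();
    m.timestamp = in.timestamp;
    out.emit(m);
```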
Implementing the transformation from scratch is not easy. So, we provide V1, a set of generic helper functions. In most cases, doing one of the following is sufficient to achieve the desired output.
- Just call V1.transform(...).
- Modify in before calling V1.transform(...).
- Wrap out with an anonymous inner class to modify the V1.transform(...) output.

NOTE: We DO NOT recommend any case-style conversions.
While [Cc]amelCase with _ in between looks somewhat unpleasant, it conveys more information from the original ObjectName, probably making it easier to trace a Prometheus metric back to the corresponding MBean attribute later when debugging. You can convert the case style of metric names using V1.snakeCase() or V1.lowerCase().
For the java.lang:type=ClassLoading:LoadedClassCount attribute:

Transform Script | Prometheus Metric Example |
---|---|
V1.transform(in, out, "type") | java_lang_ClassLoading_LoadedClassCount |
V1.transform(in, out, "type", V1.snakeCase()) | java_lang_class_loading_loaded_class_count |
V1.transform(in, out, "type", V1.lowerCase()) | java_lang_classloading_loadedclasscount |
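To illustrate what such a snake_case conversion does, here is a rough re-implementation (this is NOT the exporter's actual V1.snakeCase() code, just a sketch of the idea):

```java
public class SnakeCaseDemo {
    // Insert an underscore at lower-to-upper case boundaries, then lowercase.
    static String snakeCase(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0 && Character.isLowerCase(name.charAt(i - 1)))
                    sb.append('_');
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(snakeCase("LoadedClassCount")); // loaded_class_count
    }
}
```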
For most applications, this single rule covers most of the MBean attributes.
- transform: |
    !java
    V1.transform(in, out, "type");
java_nio_BufferPool_Count{name="direct",} 8 1596881052752
java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_value_committed{name="G1 Young Generation",key="CodeHeap 'profiled nmethods'",} 2752512 1596881052753
java_lang_Memory_HeapMemoryUsage_committed 1061158912 1596881052757
...
- pattern:
  - java.lang:type=Runtime:VmVersion
  - java.lang:type=OperatingSystem:Version
  transform: |
    !java
    import java.util.HashMap;
    MetricValue m = new MetricValue();
    m.name = in.domain + "_" + in.keyProperties.get("type") + "_" + in.attributeName + "_info";
    m.labels = new HashMap<>();
    m.labels.put("version", (String) in.value);
    m.value = 1.0;
    m.timestamp = in.timestamp;
    out.emit(m);
java_lang_Runtime_VmVersion_info{version="14.0.1+7",} 1.0 1595167009825
java_lang_OperatingSystem_Version_info{version="5.7.4-arch1-1",} 1.0 1595167009828
Reference: Exposing the software version to Prometheus
- pattern: java.lang:type=Threading:AllThreadIds
  transform: |
    !java
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.HashMap;
    Thread.State[] states = Thread.State.values();
    int[] counts = new int[states.length];
    long timestamp = System.currentTimeMillis();
    ThreadInfo[] threads = ManagementFactory.getThreadMXBean().getThreadInfo((long[]) in.value, 0);
    for (ThreadInfo thread : threads) {
      if (thread == null)
        continue;
      ++counts[thread.getThreadState().ordinal()];
    }
    for (int i = 0; i < states.length; ++i) {
      MetricValue m = new MetricValue();
      m.name = "java_lang_threading_state_count";
      m.labels = new HashMap<>();
      m.labels.put("state", states[i].name());
      m.value = counts[i];
      m.timestamp = timestamp;
      m.type = "gauge";
      m.help = "The number of threads in the state. This value is calculated from ThreadMXBean#getThreadInfo(java.lang:type=Threading:AllThreadIds).";
      out.emit(m);
    }
# HELP java_lang_threading_state_count The number of threads in the state. This value is calculated from ThreadMXBean#getThreadInfo(java.lang:type=Threading:AllThreadIds).
# TYPE java_lang_threading_state_count gauge
java_lang_threading_state_count{state="NEW",} 0.0 1595170784228
java_lang_threading_state_count{state="RUNNABLE",} 11.0 1595170784228
java_lang_threading_state_count{state="BLOCKED",} 0.0 1595170784228
java_lang_threading_state_count{state="WAITING",} 1.0 1595170784228
java_lang_threading_state_count{state="TIMED_WAITING",} 3.0 1595170784228
java_lang_threading_state_count{state="TERMINATED",} 0.0 1595170784228
This is just for demonstration purposes and is highly discouraged in practice unless absolutely necessary, because this kind of metric computation makes it hard to trace a metric back to its source and to how the value was generated. In most cases, we don't have to do this at all, because Prometheus can perform complex queries, including arithmetic.
- pattern: 'java\\.lang:type=OperatingSystem:OpenFileDescriptorCount'
  transform: |
    !java
    import java.lang.management.ManagementFactory; // imports must come first
    import javax.management.ObjectName;
    V1.transform(in, out, "type", V1.gauge()); // emit the raw metric
    // modify the name and value, then emit the computed metric
    long max = (Long) ManagementFactory.getPlatformMBeanServer().getAttribute(new ObjectName("java.lang:type=OperatingSystem"), "MaxFileDescriptorCount");
    in.value = max - (Long) in.value;
    in.attributeName = "AvailableFileDescriptorCount";
    in.attributeDescription = "The number of file descriptors available to be opened in this JVM, which is calculated as java.lang:type=OperatingSystem:MaxFileDescriptorCount - java.lang:type=OperatingSystem:OpenFileDescriptorCount.";
    V1.transform(in, out, "type", V1.gauge());
# HELP java_lang_OperatingSystem_OpenFileDescriptorCount OpenFileDescriptorCount
# TYPE java_lang_OperatingSystem_OpenFileDescriptorCount gauge
java_lang_OperatingSystem_OpenFileDescriptorCount 29 1596880934872
# HELP java_lang_OperatingSystem_AvailableFileDescriptorCount The number of file descriptors available to be opened in this JVM, which is calculated as java.lang:type=OperatingSystem:MaxFileDescriptorCount - java.lang:type=OperatingSystem:OpenFileDescriptorCount.
# TYPE java_lang_OperatingSystem_AvailableFileDescriptorCount gauge
java_lang_OperatingSystem_AvailableFileDescriptorCount 1048547 1596880934872
While this exporter does not support OpenMetrics yet, you can prepare for the upcoming OpenMetrics support. The most notable difference between the Prometheus format and the OpenMetrics format is that OpenMetrics requires (not just recommends) a _total suffix for counter metrics. To ensure conformance to both formats in the future, set the total suffix on counter metrics. E.g.
MetricValue m = new MetricValue();
m.name = "<NAME>";
m.suffix = "total";
m.type = "counter";
m.value = 1.0;
out.emit(m);
This will produce the following responses in respective formats:
Prometheus (special-cased to append _total to the metric name in annotations when a counter has the total suffix)
# TYPE <NAME>_total counter
<NAME>_total 1.0
OpenMetrics
# TYPE <NAME> counter
<NAME>_total 1.0
All that said, if you prefer to leave the metrics untyped to keep configurations simple, that should also be fine.
- Prefer pattern (and condition) over if inside scripts. It's usually faster.
- Use static fields inside a method-local inner class to do things that need to be done only once, such as compiling a regex. Note that this should not be used to share mutable state, because transform scripts are executed concurrently.
transform: |
  import java.util.regex.Pattern;
  class Holder {
    public static final Pattern PATTERN = Pattern.compile(".*");
  }
  log(Holder.PATTERN.matcher("foo").matches());
Sometimes it's hard to debug complex transform scripts. Here are some tips and tricks for debugging them.
This exporter uses the JUL framework for logging. Errors caused by user configurations are logged at INFO level or higher; other errors are logged below INFO level. If you are encountering issues, consider setting the log level to FINEST to see detailed logs.
To change the log level, create a logging.properties file and set the java.util.logging.config.file system property to point to it.
$ cat logging.properties
handlers = java.util.logging.ConsoleHandler
.level = INFO
java.util.logging.ConsoleHandler.level = FINEST
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.SimpleFormatter.format = %1$tFT%1$tT.%1$tL %4$-5.5s %3$-80.80s : %5$s%6$s%n
net.thisptr.jmx.exporter.agent.level = FINEST
net.thisptr.jmx.exporter.agent.shade.level = INFO
$ java -Djava.util.logging.config.file=logging.properties ...
If you are using jul-to-slf4j or log4j-jul to redirect JUL logs to another backend (such as log4j2, logback, ...), this may not work. Please consult the relevant documentation for your logging framework.
If you are writing transform scripts in Java, you can use the log(fmt, ...) or log(obj) methods. The logs are recorded at INFO level.
- pattern: 'java.lang'
  transform: |
    !java
    log(in);
    log("hi");
    log("test: %s", in.value); // printf style; see the javadoc for String.format()
    V1.transform(in, out, "type");
Alternatively, you can use System.out.printf(...) or System.err.printf(...) as in any other program.
First, it's almost impossible to do a fair comparison: the responses are not the same, and even the number of metrics differs. Also keep in mind that performance is highly dependent on configuration, and these numbers are specific to the configurations used for this benchmark.
See examples/benchmark-kafka for the setup details. Here are the results:
Exporter | Config File (# of lines) | # of Metrics (*1) | Throughput [req/s] | Avg. Latency [ms] @ 10 [req/s] |
---|---|---|---|---|
scriptable-jmx-exporter | scriptable-jmx-exporter.yaml (54) | 3362 | 939.45 | TBD |
jmx_exporter 0.13.0 | kafka-2_0_0.yml (103) | 3157 | 12.14 | TBD |
(*) Benchmarked on Intel Core i5-9600K (with Turbo Boost disabled), Linux 5.7.4.
(*1) kafka-2_0_0.yml seems to be missing a number of metrics, such as kafka.server:type=socket-server-metrics. We excluded such metrics as well. The difference in the number of metrics mostly comes from how we treat JVM metrics.
NOTE: This benchmark result is quite old. I am aware of many performance improvement efforts in recent jmx_exporter and this benchmark has to be updated when I have time.
The MIT License.