Pinging @elastic/integrations-platforms (Team:Platforms)
I think the underlying problem is that we use a different library to parse metrics than Prometheus does, which seems to cause unexpected behavior when the source data doesn't strictly follow the format.
We may want to investigate using the same code paths that Prometheus uses to collect metrics.
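For context, here is a minimal sketch of that parsing path, assuming the module relies on the text parser from github.com/prometheus/common/expfmt (the package that produces the "second TYPE line for metric name" errors seen in this thread); the endpoint URL is just a placeholder:

package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func main() {
	// Placeholder endpoint; replace with the exporter being scraped.
	resp, err := http.Get("http://localhost:9090/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// expfmt's TextParser aborts the whole decode on the first malformed
	// family, e.g. a second "# TYPE" line for the same metric name, while
	// the Prometheus server's own scrape path is more tolerant.
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatalf("unable to decode response: %v", err)
	}
	for name, mf := range families {
		fmt.Println(name, len(mf.GetMetric()))
	}
}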
@ChrsMark I'm not sure this one is trivial, what's the approach you had in mind?
Hmm, yes, it might not be that easy. The code cannot even "unpack" the response, right? I had assumed the error occurred after the response was unpacked, so it could be post-processed to fix this kind of issue.
Is there any plan to fix this?
Hey, we plan to move to an improved parsing library, so that might fix this one too: https://github.com/elastic/beats/issues/24707
I too am getting this error on metricbeat version 7.13.1 (amd64), libbeat 7.13.1 [2d80f6e99f41b65a270d61706fa98d13cfbda18d]
module/wrapper.go:259 Error fetching data for metricset prometheus.collector: unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 45: second TYPE line for metric name "_err_null_node_blackholed_packets", or TYPE reported after samples
@xuoguoto do you have a case similar to what is described in this issue's description? If so, I'm afraid there is no quick fix for this at the moment, since this violates the Prometheus standard. As mentioned in a previous comment, these kinds of issues might be resolved when/if we finally move to a new parsing library (#24707).
@ChrsMark From the exporter, here is what I see when grepping for _err_null_node_blackholed_packets:
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="0"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="1"} 250319
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="2"} 1
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="3"} 140111
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="4"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="5"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="6"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="7"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="8"} 0
Is this a problem?
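For reference, the Prometheus text format expects a single TYPE line per metric family, with all of its samples grouped underneath it, so a conforming version of the output above would look roughly like this:
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="0"} 0
_err_null_node_blackholed_packets{thread="1"} 250319
_err_null_node_blackholed_packets{thread="2"} 1
_err_null_node_blackholed_packets{thread="3"} 140111
(and so on for the remaining threads)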
We hit this issue here too, with a slight variation.
unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 58: second TYPE line for metric name "jvm_classes_loaded", or TYPE reported after samples
sh-4.2# curl -s http://10.1.86.129:9779/metrics|cat -n |grep jvm_classes_loaded
54 # HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
55 # TYPE jvm_classes_loaded gauge
56 jvm_classes_loaded 28959.0
57 # HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
58 # TYPE jvm_classes_loaded_total counter
59 jvm_classes_loaded_total 29166.0
@hamelg, I encountered the same issue as you. Metrics are exposed via the Prometheus JMX Exporter. The weird thing is that Metricbeat behaves differently depending on the JMX Exporter version.
With JMX Exporter v0.14.0 everything works as expected and metrics are exported; with v0.16.1 I get the following error:
2022-04-05T17:33:07.769+0200 INFO module/wrapper.go:259 Error fetching data for metricset prometheus.collector: unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 4: second TYPE line for metric name "jvm_classes_loaded", or TYPE reported after samples
Output with JMX Exporter v0.14.0:
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 39039.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 39481.0
Output with JMX Exporter v0.16.1:
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 18998.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 18998.0
Hey @peterschrott! Could you also share the returned headers in both cases if you curl the endpoints?
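For example, something along these lines (host and port are placeholders) prints the response headers, including the Content-Type; an application/openmetrics-text content type would indicate the OpenMetrics format rather than the classic Prometheus text format:
curl -s -D - -o /dev/null http://localhost:9779/metrics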
A quick heads-up on this.
A Prometheus server is able to scrape metrics from an endpoint that exposes duplicated metrics; in that case both metrics are collected without any issue. I verified that the case reported in the issue's description can be handled without a problem by a Prometheus server.
So for an endpoint exposing the following:
# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS MarkSweep"} 4
# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS Scavenge"} 34
The Prometheus server will collect both metrics, for example:
base_gc_total_total{instance="containerd:1338", job="duplicate-types", name="PS MarkSweep"} 4
base_gc_total_total{instance="containerd:1338", job="duplicate-types", name="PS Scavenge"} 34
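For reference, a plain scrape configuration along these lines (target address taken from the labels above) is enough to reproduce that behavior:
scrape_configs:
  - job_name: "duplicate-types"
    static_configs:
      - targets: ["containerd:1338"]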
So in that case, with the current Metricbeat module, we are not able to provide the same experience. The library upgrade at https://github.com/elastic/beats/pull/33865 will solve this issue.
As far as the Java client exporters are concerned, I cannot say for sure what the issue was, but I suspect it has to do with https://github.com/prometheus/client_java/releases/tag/parent-0.10.0 or something similar, as reported at https://github.com/elastic/beats/issues/24554. In such cases the headers need to be verified, and if the endpoint serves OpenMetrics, users are advised to use the openmetrics module introduced with https://github.com/elastic/beats/pull/27269.
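A minimal configuration for that module would look roughly like the following (host, port and period are placeholders):
- module: openmetrics
  metricsets: ["collector"]
  period: 10s
  hosts: ["localhost:9779"]
  metrics_path: /metrics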
Describe the enhancement: Opening this issue for an enhancement on behalf of a user.
They are collecting MicroProfile Metrics from Payara in JSON format.
Unfortunately, Payara versions 5.193.1, 5.194 and 5.201 have a bug in their MicroProfile Metrics implementation, and the output contains repeated TYPE lines.
This violates the standard, and Metricbeat yields a decoding error.
The request is for Metricbeat to be able to ignore duplicate (identical) TYPE lines (converting the error to a warning) and process the data nevertheless.
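A minimal sketch of that idea, assuming a pre-processing step in front of the strict text parser (an illustration only, not how Metricbeat is implemented): identical duplicate TYPE/HELP lines are dropped before decoding, so the rest of the payload can still be processed.

package dedupe

import "bytes"

// dropDuplicateMetaLines removes repeated, identical "# TYPE" and "# HELP"
// lines so that a non-conforming exposition with repeated TYPE lines can
// still be handed to a strict text-format parser.
func dropDuplicateMetaLines(payload []byte) []byte {
	seen := make(map[string]bool)
	var out [][]byte
	for _, line := range bytes.Split(payload, []byte("\n")) {
		if bytes.HasPrefix(line, []byte("# TYPE ")) || bytes.HasPrefix(line, []byte("# HELP ")) {
			if seen[string(line)] {
				continue // identical duplicate: skip it instead of failing the whole decode
			}
			seen[string(line)] = true
		}
		out = append(out, line)
	}
	return bytes.Join(out, []byte("\n"))
}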
This bug is fixed in Payara 5.202RC1, but the upgrade is complex and lengthy due to the scale of usage.
Describe a specific use case for the enhancement or feature: Allow users on "bugged" versions of Payara to still use Metricbeat.
As per a private discussion with @exekias and @sorantis. Opening the case to keep a record of the demand.