apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Solr 8.11 with SolrMetrics produces duplicate samples with prometheus v2.52 #705

Closed (perosb closed 2 weeks ago)

perosb commented 1 month ago

Solr 8.11 seems to be producing duplicate metrics, which is flooding the logs and triggering alerts.

This happens only with the latest Prometheus, v2.52.0 (related: https://github.com/prometheus/prometheus/issues/14089). It doesn't seem to happen with Solr 9, but it still happens when running Solr 8.11 with the solr-exporter image tag 9.x.

This is the debug log with 1 collection and the related duplicates:

{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_requests_total{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_mean_rate{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_mean_rate{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_5minRate{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_1minRate{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p75_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p75_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_median_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_median_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_requests_total{category=\"QUERY\",handler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_requests_total{category=\"ADMIN\",handler=\"/admin/ping\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.909Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p99_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p99_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p95_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_query_p95_ms{category=\"QUERY\",searchHandler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_node_requests_total{category=\"ADMIN\",handler=\"/admin/info\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_node_requests_total{category=\"ADMIN\",handler=\"/admin/metrics\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_node_time_seconds_total{category=\"ADMIN\",handler=\"/admin/info\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_node_time_seconds_total{category=\"ADMIN\",handler=\"/admin/metrics\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_time_seconds_total{category=\"QUERY\",handler=\"/select\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1777","component":"scrape manager","level":"debug","msg":"Duplicate sample for timestamp","scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","series":"solr_metrics_core_time_seconds_total{category=\"ADMIN\",handler=\"/admin/ping\",internal=\"false\",core=\"kermit_shard1_replica_n1\",collection=\"kermit\",shard=\"shard1\",replica=\"replica_n1\",base_url=\"http://frog-solrcloud-1.frog-solrcloud-headless.solr-test:8983/solr\",cluster_id=\"f7122eee5d\"}","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.910Z"}
{"caller":"scrape.go:1738","component":"scrape manager","level":"warn","msg":"Error on ingesting samples with different value but same timestamp","num_dropped":21,"scrape_pool":"serviceMonitor/solr-test/solr-metrics/0","target":"http://172.23.50.53:8080/metrics","ts":"2024-05-15T08:32:26.911Z"}
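
For anyone triaging a similar log, here is a minimal sketch (not part of the original report) that counts the "Duplicate sample for timestamp" entries per metric name. It assumes Prometheus is running with JSON log format, as in the lines above; the function name `summarize_duplicates` and the shortened sample lines are made up for illustration.

```python
import json
from collections import Counter


def summarize_duplicates(log_lines):
    """Count 'Duplicate sample for timestamp' log entries per metric name."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "Duplicate sample for timestamp":
            continue
        series = entry.get("series", "")
        # The metric name is everything before the '{' that opens the label set.
        counts[series.split("{", 1)[0]] += 1
    return counts


# Shortened, illustrative log lines in the same shape as the ones above.
sample_log = [
    r'{"level":"debug","msg":"Duplicate sample for timestamp","series":"solr_metrics_core_query_p75_ms{category=\"QUERY\"}"}',
    r'{"level":"debug","msg":"Duplicate sample for timestamp","series":"solr_metrics_core_query_p75_ms{category=\"QUERY\"}"}',
    r'{"level":"debug","msg":"Duplicate sample for timestamp","series":"solr_metrics_node_requests_total{category=\"ADMIN\"}"}',
    r'{"level":"warn","msg":"Error on ingesting samples with different value but same timestamp","num_dropped":21}',
]

for name, n in summarize_duplicates(sample_log).most_common():
    print(n, name)
```

Grouping by metric name makes it easier to see which exporter rules are emitting the same series more than once.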

Disclaimer: this is probably a Solr bug rather than an operator bug?

HoustonPutman commented 3 weeks ago

Yeah, I doubt this is a Solr operator bug. But have you tried running the same version of Solr and the Solr Prometheus exporter?

whereismyjetpack commented 3 weeks ago

I believe this is an issue with the solr-prometheus-exporter. I've tried both the same version and the latest version. It seems to be duplicating, at least in my case, the /admin/ping handler.

solr_metrics_core_time_seconds_total{category="ADMIN",handler="/admin/ping",core="my-cool-core",collection="my-cool-collection",shard="shard1",replica="replica_n1",base_url="http://solr-qa-solrcloud-0.solr-qa-solrcloud-headless.solr-qa:8983/solr",} 6.4896774849E7
solr_metrics_core_time_seconds_total{category="ADMIN",handler="/admin/ping",core="my-cool-core",collection="my-cool-collection",shard="shard1",replica="replica_n1",base_url="http://solr-qa-solrcloud-0.solr-qa-solrcloud-headless.solr-qa:8983/solr",} 0.0
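
A quick way to confirm this kind of duplication is to scan the exporter's `/metrics` output for series that appear more than once. The sketch below is only an illustration under stated assumptions: samples carry no trailing timestamp (as in the two lines above), the function name `find_duplicate_series` is hypothetical, and the sample text uses shortened label sets plus a made-up `# TYPE` line and `/select` sample.

```python
from collections import defaultdict


def find_duplicate_series(metrics_text):
    """Return {series: [values]} for every series emitted more than once."""
    seen = defaultdict(list)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Series identity is the metric name plus its label set; the value
        # follows the last space (assumes no trailing timestamp).
        series, _, value = line.rpartition(" ")
        seen[series].append(value)
    return {s: v for s, v in seen.items() if len(v) > 1}


# Illustrative exposition text with shortened labels.
sample_metrics = """\
# TYPE solr_metrics_core_time_seconds_total counter
solr_metrics_core_time_seconds_total{category="ADMIN",handler="/admin/ping",core="my-cool-core",} 6.4896774849E7
solr_metrics_core_time_seconds_total{category="ADMIN",handler="/admin/ping",core="my-cool-core",} 0.0
solr_metrics_core_time_seconds_total{category="QUERY",handler="/select",core="my-cool-core",} 12.5
"""

for series, values in find_duplicate_series(sample_metrics).items():
    print(series, values)
```

Any series reported here has the same name and labels but more than one value in a single scrape, which is the condition Prometheus v2.52 flags.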

HoustonPutman commented 3 weeks ago

That is really strange. Maybe post something on the Solr users list and see if anyone can help?

mlbiscoc commented 3 weeks ago

Do you have a sample of the prometheus-exporter config file that is being used?

whereismyjetpack commented 3 weeks ago

cat /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<config>

  <!--
  Templates to help reduce jq boilerplate used by many metrics in this config;
  mainly intended for metrics that don't require a bunch of jq magic to work and are mostly boilerplate.

  A regex with named groups is used to match template references to template + vars using the basic pattern:

      $jq:<TEMPLATE>( <UNIQUE>, <KEYSELECTOR>, <METRIC>, <TYPE> )

  For instance,

      $jq:core(requests_total, endswith(".requestTimes"), count, COUNTER)

  TEMPLATE = core
  UNIQUE = requests_total (unique suffix for this metric, results in a metric named "solr_metrics_core_requests_total")
  KEYSELECTOR = endswith(".requestTimes") (filter to select the specific key for this metric)
  METRIC = count
  TYPE = COUNTER

  Some templates may have a default type, so you can omit that from your template reference, such as:

      $jq:core(requests_total, endswith(".requestTimes"), count)

  Uses the defaultType=COUNTER as many uses of the core template are counts.

  If a template reference omits the metric, then the unique suffix is used, for instance:

      $jq:core-query(1minRate, endswith(".distrib.requestTimes"))

  Creates a GAUGE metric (default type) named "solr_metrics_core_query_1minRate" using the 1minRate value from the selected JSON object.

  Add templates as needed, three metrics using the same structure feels about right as the threshold for creating a new template.
  -->
  <jq-templates>
    <template name="core-query" defaultType="GAUGE">
      .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
      $parent.key | split(".") as $parent_key_items |
      $parent_key_items | length as $parent_key_item_len |
      (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
      (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
      (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
      (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
      (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
      $parent.value | to_entries | .[] | {KEYSELECTOR} | select (.value | type == "object") as $object |
      $object.key | split(".")[0] as $category |
      $object.key | split(".")[1] as $handler |
      select($category | startswith("QUERY")) |
      select($handler | startswith("/")) |
      {METRIC} as $value |
      if $parent_key_item_len == 3 then
      {
      name: "solr_metrics_core_query_{UNIQUE}",
      type: "{TYPE}",
      help: "See: https://lucene.apache.org/solr/guide/performance-statistics-reference.html",
      label_names: ["category", "searchHandler", "core"],
      label_values: [$category, $handler, $core],
      value: $value
      }
      else
      {
      name: "solr_metrics_core_query_{UNIQUE}",
      type: "{TYPE}",
      help: "See: https://lucene.apache.org/solr/guide/performance-statistics-reference.html",
      label_names: ["category", "searchHandler", "core", "collection", "shard", "replica"],
      label_values: [$category, $handler, $core, $collection, $shard, $replica],
      value: $value
      }
      end
    </template>
    <template name="core" defaultType="COUNTER">
      .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
      $parent.key | split(".") as $parent_key_items |
      $parent_key_items | length as $parent_key_item_len |
      (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
      (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
      (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
      (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
      (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
      $parent.value | to_entries | .[] | {KEYSELECTOR} as $object |
      $object.key | split(".")[0] as $category |
      $object.key | split(".")[1] as $handler |
      select($handler | startswith("/")) |
      {METRIC} as $value |
      if $parent_key_item_len == 3 then
      {
      name: "solr_metrics_core_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "handler", "core"],
      label_values: [$category, $handler, $core],
      value: $value
      }
      else
      {
      name: "solr_metrics_core_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "handler", "core", "collection", "shard", "replica"],
      label_values: [$category, $handler, $core, $collection, $shard, $replica],
      value: $value
      }
      end
    </template>
    <template name="update-handler" defaultType="COUNTER">
      .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
      $parent.key | split(".") as $parent_key_items |
      $parent_key_items | length as $parent_key_item_len |
      (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
      (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
      (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
      (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
      (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
      $parent.value | to_entries | .[] | {KEYSELECTOR} as $object |
      $object.key | split(".")[0] as $category |
      $object.key | split(".")[1] as $handler |
      {METRIC} as $value |
      if $parent_key_item_len == 3 then
      {
      name: "solr_metrics_core_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "handler", "core"],
      label_values: [$category, $handler, $core],
      value: $value
      }
      else
      {
      name: "solr_metrics_core_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "handler", "core", "collection", "shard", "replica"],
      label_values: [$category, $handler, $core, $collection, $shard, $replica],
      value: $value
      }
      end
    </template>
    <template name="node" defaultType="COUNTER">
      .metrics["solr.node"] | to_entries | .[] | {KEYSELECTOR} as $object |
      $object.key | split(".")[0] as $category |
      $object.key | split(".")[1] as $handler |
      {METRIC} as $value |
      {
      name         : "solr_metrics_node_{UNIQUE}",
      type         : "{TYPE}",
      help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names  : ["category", "handler"],
      label_values : [$category, $handler],
      value        : $value
      }
    </template>
    <template name="cache-searcher" defaultType="GAUGE">
      .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
      $parent.key | split(".") as $parent_key_items |
      $parent_key_items | length as $parent_key_item_len |
      (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
      (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
      (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
      (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
      (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
      $parent.value | to_entries | .[] | select(.key | startswith("CACHE.searcher.")) | select (.key | endswith("documentCache") or endswith("fieldValueCache") or endswith("filterCache") or endswith("perSegFilter") or endswith("queryResultCache")) as $object |
      $object.key | split(".")[0] as $category |
      $object.key | split(".")[2] as $type |
      $object.value | to_entries | .[] | {KEYSELECTOR} as $target |
      $target.key as $item |
      {METRIC} as $value |
      if $parent_key_item_len == 3 then
      {
      name: "solr_metrics_core_searcher_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "core", "type", "item"],
      label_values: [$category, $core, $type, $item],
      value: $value
      }
      else
      {
      name: "solr_metrics_core_searcher_{UNIQUE}",
      type: "{TYPE}",
      help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names: ["category", "core", "collection", "shard", "replica", "type", "item"],
      label_values: [$category, $core, $collection, $shard, $replica, $type, $item],
      value: $value
      }
      end
    </template>
    <template name="node-thread-pool" defaultType="COUNTER">
      .metrics["solr.node"] | to_entries | .[] | select(.key | contains(".threadPool.")) | {KEYSELECTOR} as $object |
      $object.key | split(".") as $key_items |
      $key_items | length as $label_len |
      $key_items[0] as $category |
      (if $label_len >= 5 then $key_items[1] else "" end) as $handler |
      (if $label_len >= 5 then $key_items[3] else $key_items[2] end) as $executor |
      {METRIC} as $value |
      {
      name         : "solr_metrics_node_thread_pool_{UNIQUE}",
      type         : "{TYPE}",
      help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names  : ["category", "handler", "executor"],
      label_values : [$category, $handler, $executor],
      value        : $value
      }
    </template>
    <template name="jvm-item" defaultType="GAUGE">
      .metrics["solr.jvm"] | to_entries | .[] | {KEYSELECTOR} as $object |
      $object.key | split(".") | last as $item |
      {METRIC} as $value |
      {
      name         : "solr_metrics_jvm_{UNIQUE}",
      type         : "{TYPE}",
      help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
      label_names  : ["item"],
      label_values : [$item],
      value        : $value
      }
    </template>
  </jq-templates>

  <rules>

    <ping>
      <lst name="request">
        <lst name="query">
          <str name="path">/admin/ping</str>
        </lst>
        <arr name="jsonQueries">
          <str>
            . as $object | $object |
            (if $object.status == "OK" then 1.0 else 0.0 end) as $value |
            {
              name         : "solr_ping",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/ping.html",
              label_names  : [],
              label_values : [],
              value        : $value
            }
          </str>
        </arr>
      </lst>
    </ping>

    <metrics>
      <lst name="request">
        <lst name="query">
          <str name="path">/admin/metrics</str>
          <lst name="params">
            <!--
              trim some of these expressions as needed if you don't care about
              a particular group of metrics.
            -->
            <str name="expr">solr\.jetty:.*DefaultHandler.*</str>
            <str name="expr">solr\.jvm:.*</str>
            <str name="expr">solr\.node:.*</str>
            <str name="expr">solr\.overseer:.*</str>
            <str name="expr">solr\.core\..*:QUERY\..*</str>
            <str name="expr">solr\.core\..*:ADMIN\..*</str>
            <str name="expr">solr\.core\..*:CACHE\..*</str>
            <str name="expr">solr\.core\..*:UPDATE\.updateHandler\..*</str>
            <str name="expr">solr\.core\..*:CORE\.fs\..*</str>
            <str name="expr">solr\.core\..*:HIGHLIGHTER\..*</str>
            <str name="expr">solr\.core\..*:INDEX\..*</str>
            <str name="expr">solr\.core\..*:REPLICATION\.replication\..*</str>
            <str name="expr">solr\.core\..*:SEARCHER\.searcher\..*</str>

            <!-- Alternative expressions, which are much stricter but still provide
            enough data to populate the default dashboard.
            These expressions omit many unused properties of the complex metrics,
            and also skip whole groups of rarely used metrics: core ADMIN, REPLICATION,
            HIGHLIGHTER, and selects only the most common QUERY handlers.

            In order to use these expressions remove the default list of expressions
            above and the START / END lines below. -->

            <!-- === START ===

            <str name="expr">solr\.jetty:.*\.DefaultHandler\.(dispatches|.*-requests|.*xx-responses):count</str>

            <str name="expr">solr\.jvm:(buffers|gc).*</str>
            <str name="expr">solr\.jvm:memory\.(heap|non-heap|pools)\.*\.usage</str>
            <str name="expr">solr\.jvm:memory\.total</str>
            <str name="expr">solr\.jvm:os\..*(FileDescriptorCount|Load.*|Size|processCpuTime)</str>
            <str name="expr">solr\.jvm:threads\..*count</str>

            <str name="expr">solr\.node:CONTAINER\.(cores|fs).*</str>

            <str name="expr">solr\.core\..*:CORE\.fs\..*Space</str>
            <str name="expr">solr\.core\..*:INDEX\.sizeInBytes</str>
            <str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\..*requestTimes:(count|1minRate|5minRate|median_ms|meanRate|p75_ms|p95_ms|p99_ms)</str>
            <str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\.totalTime</str>
            <str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\..*rrors:(count!1minRate)</str>
            <str name="expr">solr\.core\..*:SEARCHER\.searcher\..*Doc.*</str>
            <str name="expr">solr\.core\..*:UPDATE\.updateHandler\..*</str>
            <str name="expr">solr\core\..*:CACHE\..*</str>

            === END === -->

          </lst>
        </lst>
        <arr name="jsonQueries">
          <!--
            jetty metrics
          -->
          <str>
            .metrics["solr.jetty"] | to_entries | .[] | select(.key | startswith("org.eclipse.jetty.server.handler.DefaultHandler")) | select(.key | endswith("xx-responses")) as $object |
            $object.key | split(".") | last | split("-") | first as $status |
            $object.value.count as $value |
            {
            name         : "solr_metrics_jetty_response_total",
            type         : "COUNTER",
            help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
            label_names  : ["status"],
            label_values : [$status],
            value        : $value
            }
          </str>
          <str>
            .metrics["solr.jetty"] | to_entries | .[] | select(.key | startswith("org.eclipse.jetty.server.handler.DefaultHandler.")) | select(.key | endswith("-requests")) | select (.value | type == "object") as $object |
            $object.key | split(".") | last | split("-") | first as $method |
            $object.value.count as $value |
            {
              name         : "solr_metrics_jetty_requests_total",
              type         : "COUNTER",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["method"],
              label_values : [$method],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jetty"] | to_entries | .[] | select(.key == "org.eclipse.jetty.server.handler.DefaultHandler.dispatches") as $object |
            $object.value.count as $value |
            {
              name         : "solr_metrics_jetty_dispatches_total",
              type         : "COUNTER",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : [],
              label_values : [],
              value        : $value
            }
          </str>
          <!--
            jvm metrics
          -->
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("buffers.")) | select(.key | endswith(".Count")) as $object |
            $object.key | split(".")[1] as $pool |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_buffers",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["pool"],
              label_values : [$pool],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("buffers.")) | select(.key | (endswith(".MemoryUsed") or endswith(".TotalCapacity"))) as $object |
            $object.key | split(".")[1] as $pool |
            $object.key | split(".") | last as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_buffers_bytes",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["pool", "item"],
              label_values : [$pool, $item],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("gc.")) | select(.key | endswith(".count")) as $object |
            $object.key | split(".")[1] as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_gc_total",
              type         : "COUNTER",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["item"],
              label_values : [$item],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("gc.")) | select(.key | endswith(".time")) as $object |
            $object.key | split(".")[1] as $item |
            ($object.value / 1000) as $value |
            {
              name         : "solr_metrics_jvm_gc_seconds_total",
              type         : "COUNTER",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["item"],
              label_values : [$item],
              value        : $value
            }
          </str>
          <str>
            $jq:jvm-item(memory_heap_bytes,
                         select(.key | startswith("memory.heap.")) | select(.key | endswith(".usage") | not),
                         object.value)
          </str>
          <str>
            $jq:jvm-item(memory_non_heap_bytes,
                         select(.key | startswith("memory.non-heap.")) | select(.key | endswith(".usage") | not),
                         object.value)
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("memory.pools.")) | select(.key | endswith(".usage") | not) as $object |
            $object.key | split(".")[2] as $space |
            $object.key | split(".") | last as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_memory_pools_bytes",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["space", "item"],
              label_values : [$space, $item],
              value        : $value
            }
          </str>
          <str>
            $jq:jvm-item(memory_bytes, select(.key | startswith("memory.total.")), object.value)
          </str>
          <str>
            $jq:jvm-item(os_memory_bytes,
                         select(.key == "os.committedVirtualMemorySize" or .key == "os.freePhysicalMemorySize" or .key == "os.freeSwapSpaceSize" or .key == "os.totalPhysicalMemorySize" or .key == "os.totalSwapSpaceSize"),
                         object.value)
          </str>
          <str>
            $jq:jvm-item(os_file_descriptors, select(.key == "os.maxFileDescriptorCount" or .key == "os.openFileDescriptorCount"), object.value)
          </str>
          <str>
            $jq:jvm-item(os_cpu_load, select(.key == "os.processCpuLoad" or .key == "os.systemCpuLoad"), object.value)
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key == "os.processCpuTime") as $object |
            ($object.value / 1000.0) as $value |
            {
              name         : "solr_metrics_jvm_os_cpu_time_seconds",
              type         : "COUNTER",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["item"],
              label_values : ["processCpuTime"],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key == "os.systemLoadAverage") as $object |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_os_load_average",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["item"],
              label_values : ["systemLoadAverage"],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.jvm"] | to_entries | .[] | select(.key | startswith("threads.")) | select(.key | endswith(".count")) as $object |
            $object.key | split(".")[1] as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_jvm_threads",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["item"],
              label_values : [$item],
              value        : $value
            }
          </str>
          <!--
            overseer metrics
          -->
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.overseer")) as $object |
            $object.value as $value | $value | to_entries | .[] |
            select(.key | startswith("queue.") and endswith("collectionWorkQueueSize")) as $object |
            $object.value as $value |
            {
              name         : "solr_metrics_overseer_collectionWorkQueueSize",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : [],
              label_values : [],
              value        : $value
            }
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.overseer")) as $object |
            $object.value as $value | $value | to_entries | .[] |
            select(.key | startswith("queue.") and endswith("stateUpdateQueueSize")) as $object |
            $object.value as $value |
            {
              name         : "solr_metrics_overseer_stateUpdateQueueSize",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : [],
              label_values : [],
              value        : $value
            }
          </str>
          <!--
            node metrics
          -->
          <str>
            $jq:node(client_errors_total, select(.key | endswith(".clientErrors")), count)
          </str>
          <str>
            $jq:node(errors_total, select(.key | endswith(".errors")), count)
          </str>
          <str>
            $jq:node(requests_total, select(.key | endswith(".local.requestTimes")), count)
          </str>
          <str>
            $jq:node(server_errors_total, select(.key | endswith(".serverErrors")), count)
          </str>
          <str>
            $jq:node(timeouts_total, select(.key | endswith(".timeouts")), count)
          </str>
          <str>
            $jq:node(time_seconds_total, select(.key | endswith(".local.totalTime")), ($object.value / 1000))
          </str>
          <str>
            .metrics["solr.node"] | to_entries | .[] | select(.key | startswith("CONTAINER.cores.")) as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[2] as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_node_cores",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["category", "item"],
              label_values : [$category, $item],
              value        : $value
            }
          </str>
          <str>
            .metrics["solr.node"] | to_entries | .[] | select(.key | startswith("CONTAINER.fs.coreRoot.")) | select(.key | endswith(".totalSpace") or endswith(".usableSpace")) as $object |
            $object.key | split(".") as $key_items |
            $key_items | length as $label_len |
            $key_items[0] as $category |
            $key_items[3] as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_node_core_root_fs_bytes",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["category", "item"],
              label_values : [$category, $item],
              value        : $value
            }
          </str>
          <str>
            $jq:node-thread-pool(completed_total, select(.key | endswith(".completed")), count)
          </str>
          <str>
            $jq:node-thread-pool(running, select(.key | endswith(".running")), object.value, GAUGE)
          </str>
          <str>
            $jq:node-thread-pool(submitted_total, select(.key | endswith(".submitted")), count)
          </str>
          <str>
            .metrics["solr.node"] | to_entries | .[] | select(.key | endswith("Connections")) as $object |
            $object.key | split(".") as $key_items |
            $key_items | length as $label_len |
            $key_items[0] as $category |
            $key_items[1] as $handler |
            $key_items[2] as $item |
            $object.value as $value |
            {
              name         : "solr_metrics_node_connections",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names  : ["category", "handler", "item"],
              label_values : [$category, $handler, $item],
              value        : $value
            }
          </str>

          <!--
          Query-related core metrics; see jq-templates for details on the core-query template used below
          -->
          <str>
            $jq:core-query(errors_1minRate, select(.key | endswith(".errors")), 1minRate)
          </str>
          <str>
            $jq:core-query(client_errors_1minRate, select(.key | endswith(".clientErrors")), 1minRate)
          </str>
          <str>
            $jq:core-query(1minRate, select(.key | endswith(".distrib.requestTimes")), 1minRate)
          </str>
          <str>
            $jq:core-query(5minRate, select(.key | endswith(".distrib.requestTimes")), 5minRate)
          </str>
          <str>
            $jq:core-query(median_ms, select(.key | endswith(".distrib.requestTimes")), median_ms)
          </str>
          <str>
            $jq:core-query(p75_ms, select(.key | endswith(".distrib.requestTimes")), p75_ms)
          </str>
          <str>
            $jq:core-query(p95_ms, select(.key | endswith(".distrib.requestTimes")), p95_ms)
          </str>
          <str>
            $jq:core-query(p99_ms, select(.key | endswith(".distrib.requestTimes")), p99_ms)
          </str>
          <str>
            $jq:core-query(mean_rate, select(.key | endswith(".distrib.requestTimes")), meanRate)
          </str>

          <!-- Local (non-distrib) query metrics -->
          <str>
            $jq:core-query(local_1minRate, select(.key | endswith(".local.requestTimes")), 1minRate)
          </str>
          <str>
            $jq:core-query(local_5minRate, select(.key | endswith(".local.requestTimes")), 5minRate)
          </str>
          <str>
            $jq:core-query(local_median_ms, select(.key | endswith(".local.requestTimes")), median_ms)
          </str>
          <str>
            $jq:core-query(local_p75_ms, select(.key | endswith(".local.requestTimes")), p75_ms)
          </str>
          <str>
            $jq:core-query(local_p95_ms, select(.key | endswith(".local.requestTimes")), p95_ms)
          </str>
          <str>
            $jq:core-query(local_p99_ms, select(.key | endswith(".local.requestTimes")), p99_ms)
          </str>
          <str>
            $jq:core-query(local_mean_rate, select(.key | endswith(".local.requestTimes")), meanRate)
          </str>
          <str>
            $jq:core-query(local_count, select(.key | endswith(".local.requestTimes")), count, COUNTER)
          </str>

          <!-- core metrics other than query -->
          <str>
            $jq:core(client_errors_total, select(.key | endswith(".clientErrors")), count)
          </str>
          <str>
            $jq:core(errors_total, select(.key | endswith(".errors")) | select (.value | type == "object"), count)
          </str>
          <str>
            $jq:core(requests_total, select(.key | endswith(".requestTimes")) | select (.value | type == "object"), count)
          </str>
          <str>
            $jq:core(server_errors_total, select(.key | endswith(".serverErrors")) | select (.value | type == "object"), count)
          </str>
          <str>
            $jq:core(timeouts_total, select(.key | endswith(".timeouts")) | select (.value | type == "object"), count)
          </str>
          <str>
            $jq:core(time_seconds_total, select(.key | endswith(".totalTime")), ($object.value / 1000))
          </str>
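          <!--
            Worked example (a sketch added for illustration, not part of the template):
            how the inline core queries below derive their labels from a registry name.
            The name "solr.core.kermit.shard1.replica_n1" is an assumption chosen to match
            the labels in the log lines of this report; "solr.core.mycore" is a
            hypothetical standalone core.

              "solr.core.kermit.shard1.replica_n1" | split(".") | length   => 5
                collection = "kermit", shard = "shard1", replica = "replica_n1"
                core       = "kermit_shard1_replica_n1"

              "solr.core.mycore" | split(".") | length                     => 3
                core = "mycore"; no collection, shard, or replica labels are emitted
          -->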
          <str>
            .metrics | to_entries | .[] | select (.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key == "CACHE.core.fieldCache") as $object |
            $object.key | split(".")[0] as $category |
            $object.value.entries_count as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_field_cache_total",
              type: "COUNTER",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core"],
              label_values: [$category, $core],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_field_cache_total",
              type: "COUNTER",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "collection", "shard", "replica"],
              label_values: [$category, $core, $collection, $shard, $replica],
              value: $value
            }
            end
          </str>
          <str>
            $jq:update-handler(update_handler_adds, select(.key == "UPDATE.updateHandler.adds"), object.value, GAUGE)
          </str>
          <str>
            $jq:update-handler(update_handler_auto_commits_total, select(.key == "UPDATE.updateHandler.autoCommits"), object.value)
          </str>
          <str>
            $jq:update-handler(update_handler_commits_total, select(.key == "UPDATE.updateHandler.commits"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_adds_total, select(.key == "UPDATE.updateHandler.cumulativeAdds"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_deletes_by_id_total, select(.key == "UPDATE.updateHandler.cumulativeDeletesById"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_deletes_by_query_total, select(.key == "UPDATE.updateHandler.cumulativeDeletesByQuery"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_errors_total, select(.key == "UPDATE.updateHandler.cumulativeErrors"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_deletes_by_id, select(.key == "UPDATE.updateHandler.deletesById"), object.value, GAUGE)
          </str>
          <str>
            $jq:update-handler(update_handler_deletes_by_query, select(.key == "UPDATE.updateHandler.deletesByQuery"), object.value, GAUGE)
          </str>
          <str>
            $jq:update-handler(update_handler_pending_docs, select(.key == "UPDATE.updateHandler.docsPending"), object.value, GAUGE)
          </str>
          <str>
            $jq:update-handler(update_handler_errors, select(.key == "UPDATE.updateHandler.errors"), object.value, GAUGE)
          </str>
          <str>
            $jq:update-handler(update_handler_expunge_deletes_total, select(.key == "UPDATE.updateHandler.expungeDeletes"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_merges_total, select(.key == "UPDATE.updateHandler.merges"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_optimizes_total, select(.key == "UPDATE.updateHandler.optimizes"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_rollbacks_total, select(.key == "UPDATE.updateHandler.rollbacks"), count)
          </str>
          <str>
            $jq:update-handler(update_handler_soft_auto_commits_total, select(.key == "UPDATE.updateHandler.softAutoCommits"), object.value)
          </str>
          <str>
            $jq:update-handler(update_handler_splits_total, select(.key == "UPDATE.updateHandler.splits"), count)
          </str>

          <str>
            $jq:cache-searcher(cache, select(.key == "lookups" or .key == "hits" or .key == "size" or .key == "evictions" or .key == "inserts"), $target.value)
          </str>
          <str>
            $jq:cache-searcher(cache_ratio, select(.key == "hitratio"), $target.value)
          </str>
          <str>
            $jq:cache-searcher(warmup_time_seconds, select(.key == "warmupTime"), ($target.value / 1000))
          </str>
          <str>
            $jq:cache-searcher(cumulative_cache_total,
                               select(.key == "cumulative_lookups" or .key == "cumulative_hits" or .key == "cumulative_evictions" or .key == "cumulative_inserts"),
                               $target.value,
                               COUNTER)
          </str>
          <str>
            $jq:cache-searcher(cumulative_cache_ratio, select(.key == "cumulative_hitratio"), $target.value)
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key | startswith("CORE.fs.")) | select (.key | endswith(".totalSpace") or endswith(".usableSpace")) as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[2] as $item |
            $object.value as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_fs_bytes",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "item"],
              label_values: [$category, $core, $item],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_fs_bytes",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "collection", "shard", "replica", "item"],
              label_values: [$category, $core, $collection, $shard, $replica, $item],
              value: $value
            }
            end
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key | startswith("HIGHLIGHTER.")) | select (.key | endswith(".requests")) as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[1] as $name |
            $object.key | split(".")[2] as $item |
            $object.value as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_highlighter_request_total",
              type: "COUNTER",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "name", "item"],
              label_values: [$category, $core, $name, $item],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_highlighter_request_total",
              type: "COUNTER",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "collection", "shard", "replica", "name", "item"],
              label_values: [$category, $core, $collection, $shard, $replica, $name, $item],
              value: $value
            }
            end
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key == "INDEX.sizeInBytes") as $object |
            $object.key | split(".")[0] as $category |
            $object.value as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_index_size_bytes",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core"],
              label_values: [$category, $core],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_index_size_bytes",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "collection", "shard", "replica"],
              label_values: [$category, $core, $collection, $shard, $replica],
              value: $value
            }
            end
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key == "REPLICATION./replication.isMaster") as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[1] as $handler |
            (if $object.value == true then 1.0 else 0.0 end) as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_replication_master",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "handler", "core"],
              label_values: [$category, $handler, $core],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_replication_master",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "handler", "core", "collection", "shard", "replica"],
              label_values: [$category, $handler, $core, $collection, $shard, $replica],
              value: $value
            }
            end
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key == "REPLICATION./replication.isSlave") as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[1] as $handler |
            (if $object.value == true then 1.0 else 0.0 end) as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_replication_slave",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "handler", "core"],
              label_values: [$category, $handler, $core],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_replication_slave",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "handler", "core", "collection", "shard", "replica"],
              label_values: [$category, $handler, $core, $collection, $shard, $replica],
              value: $value
            }
            end
          </str>
          <str>
            .metrics | to_entries | .[] | select(.key | startswith("solr.core.")) as $parent |
            $parent.key | split(".") as $parent_key_items |
            $parent_key_items | length as $parent_key_item_len |
            (if $parent_key_item_len == 3 then $parent_key_items[2] else "" end) as $core |
            (if $parent_key_item_len == 5 then $parent_key_items[2] else "" end) as $collection |
            (if $parent_key_item_len == 5 then $parent_key_items[3] else "" end) as $shard |
            (if $parent_key_item_len == 5 then $parent_key_items[4] else "" end) as $replica |
            (if $parent_key_item_len == 5 then ($collection + "_" + $shard + "_" + $replica) else $core end) as $core |
            $parent.value | to_entries | .[] | select(.key == "SEARCHER.searcher.deletedDocs" or .key == "SEARCHER.searcher.maxDoc" or .key == "SEARCHER.searcher.numDocs") as $object |
            $object.key | split(".")[0] as $category |
            $object.key | split(".")[2] as $item |
            $object.value as $value |
            if $parent_key_item_len == 3 then
            {
              name: "solr_metrics_core_searcher_documents",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "item"],
              label_values: [$category, $core, $item],
              value: $value
            }
            else
            {
              name: "solr_metrics_core_searcher_documents",
              type: "GAUGE",
              help: "See following URL: https://lucene.apache.org/solr/guide/metrics-reporting.html",
              label_names: ["category", "core", "collection", "shard", "replica", "item"],
              label_values: [$category, $core, $collection, $shard, $replica, $item],
              value: $value
            }
            end
          </str>
        </arr>
      </lst>
    </metrics>
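
    <!--
      Illustrative shape of the CLUSTERSTATUS response that the queries in the
      collections section below read. This is a sketch inferred from the jq paths
      in those queries, not captured output: every value is a placeholder, and
      fields the queries do not touch are omitted. The replica counts are written
      as strings on the assumption that this is why the queries apply tonumber.

      {
        "cluster": {
          "live_nodes": ["<node_name>"],
          "collections": {
            "<collection>": {
              "pullReplicas": "0",
              "nrtReplicas": "1",
              "tlogReplicas": "0",
              "shards": {
                "<shard>": {
                  "state": "active",
                  "replicas": {
                    "<replica_name>": {
                      "core": "<collection>_<shard>_<replica>",
                      "base_url": "<base_url>",
                      "node_name": "<node_name>",
                      "type": "NRT",
                      "state": "active"
                    }
                  }
                }
              }
            }
          }
        }
      }
    -->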

    <collections>
      <lst name="request">
        <lst name="query">
          <str name="path">/admin/collections</str>
          <lst name="params">
            <str name="action">CLUSTERSTATUS</str>
          </lst>
        </lst>
        <arr name="jsonQueries">
          <str>
            .cluster.live_nodes | length as $value|
            {
              name         : "solr_collections_live_nodes",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : [],
              label_values : [],
              value        : $value
            }
          </str>
          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.pullReplicas | tonumber as $value |
            {
              name         : "solr_collections_pull_replicas",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection"],
              label_values : [$collection],
              value        : $value
            }
          </str>
          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.nrtReplicas | tonumber as $value |
            {
              name         : "solr_collections_nrt_replicas",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection"],
              label_values : [$collection],
              value        : $value
            }
          </str>
          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.tlogReplicas | tonumber as $value |
            {
              name         : "solr_collections_tlog_replicas",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection"],
              label_values : [$collection],
              value        : $value
            }
          </str>
          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.shards | to_entries | .[] | . as $shard_obj |
            $shard_obj.key as $shard |
            (if $shard_obj.value.state == "active" then 1.0 else 0.0 end) as $value |
            {
              name         : "solr_collections_shard_state",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection","shard"],
              label_values : [$collection,$shard],
              value        : $value
            }
          </str>
          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.shards | to_entries | .[] | . as $shard_obj |
            $shard_obj.key as $shard |
            $shard_obj.value.replicas | to_entries | .[] | . as $replica_obj |
            $replica_obj.key as $replica_name |
            $replica_obj.value.core as $core |
            $core[$collection + "_" + $shard + "_" | length:] as $replica |
            $replica_obj.value.base_url as $base_url |
            $replica_obj.value.node_name as $node_name |
            $replica_obj.value.type as $type |
            $replica_obj.value.state as $state |
            (if $state == "active" then 1.0 else 0.0 end) as $value |
            {
              name         : "solr_collections_replica_state",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection", "shard", "replica", "replica_name", "core", "base_url", "node_name", "type", "state"],
              label_values : [$collection, $shard, $replica, $replica_name, $core, $base_url, $node_name, $type, $state],
              value        : $value
            }
          </str>

          <str>
            .cluster.collections | to_entries | .[] | . as $object |
            $object.key as $collection |
            $object.value.shards | to_entries | .[] | . as $shard_obj |
            $shard_obj.key as $shard |
            $shard_obj.value.replicas | to_entries | .[] | . as $replica_obj |
            $replica_obj.key as $replica_name |
            $replica_obj.value.core as $core |
            $core[$collection + "_" + $shard + "_" | length:] as $replica |
            $replica_obj.value.base_url as $base_url |
            $replica_obj.value.node_name as $node_name |
            $replica_obj.value.type as $type |
            (if $replica_obj.value.leader == "true" then 1.0 else 0.0 end) as $value |
            {
              name         : "solr_collections_shard_leader",
              type         : "GAUGE",
              help         : "See following URL: https://lucene.apache.org/solr/guide/collections-api.html#clusterstatus",
              label_names  : ["collection", "shard", "replica", "replica_name", "core", "base_url", "node_name", "type"],
              label_values : [$collection, $shard, $replica, $replica_name, $core, $base_url, $node_name, $type],
              value        : $value
            }
          </str>
        </arr>
      </lst>
      <lst name="request">
        <lst name="query">
          <str name="path">/admin/zookeeper/status</str>
        </lst>
        <arr name="jsonQueries">
          <str>
            .zkStatus.ensembleSize as $value |
            .zkStatus.mode as $mode |
            {
            name         : "solr_zookeeper_ensemble_size",
            type         : "GAUGE",
            help         : "See following URL: https://solr.apache.org/guide/cloud-screens.html#zk-status-view",
            label_names  : [],
            label_values : [],
            value        : $value
            }
          </str>
          <str>
            .zkStatus.details[] as $object |
            $object.host as $host |
            $object.ok as $ok |
            (if $object.clientPort != null and $ok then 1.0 else 0.0 end) as $value |
            {
            name         : "solr_zookeeper_nodestatus",
            type         : "GAUGE",
            help         : "See following URL: https://solr.apache.org/guide/cloud-screens.html#zk-status-view",
            label_names  : ["host"],
            label_values : [$host],
            value        : $value
            }
          </str>
          <str>
            .zkStatus.status as $statusText |
            (if $statusText == "green" then 1.0 else 0.0 end) as $value |
            {
            name         : "solr_zookeeper_status",
            type         : "GAUGE",
            help         : "See following URL: https://solr.apache.org/guide/cloud-screens.html#zk-status-view",
            label_names  : ["status"],
            label_values : [$statusText],
            value        : $value
            }
          </str>
        </arr>
      </lst>
    </collections>

    <!--
    <search>
      <lst name="request">
        <lst name="query">
          <str name="collection">collection1</str>
          <str name="path">/select</str>
          <lst name="params">
            <str name="q">*:*</str>
            <str name="start">0</str>
            <str name="rows">0</str>
            <str name="json.facet">
              {
                category: {
                  type: terms,
                  field: cat
                }
              }
            </str>
          </lst>
        </lst>
        <arr name="jsonQueries">
          <str>
            .facets.category.buckets[] as $object |
            $object.val as $term |
            $object.count as $value |
            {
              name         : "solr_facets_category",
              type         : "GAUGE",
              help         : "Category facets",
              label_names  : ["term"],
              label_values : [$term],
              value        : $value
            }
          </str>
        </arr>
      </lst>
    </search>
    -->

  </rules>

</config>

This should be the default config that ships with 8.11.3. I saw the same behavior with 9.x as well.

mlbiscoc commented 3 weeks ago

I did a bit of digging, and I think the cause is that Solr's metrics API differs between 8 and 9.

Solr 8: "ADMIN./admin/ping.totalTime":4869095628, "ADMIN./admin/ping.distrib.totalTime":2035581611, "ADMIN./admin/ping.local.totalTime":0,

Solr 9: "ADMIN./admin/ping.totalTime":3739370399, "ADMIN./admin/ping.totalTime":3994744966,

I think the Prometheus exporter scrapes Solr 8's API but doesn't append a label for distrib. Solr 9 had a change that removed distrib, which more or less fixed the problem there. I haven't actually tested this, I only did some digging, but the config for Solr 8's Prometheus exporter probably needs to be changed to remove the duplicate metric. I'd give that a shot if I were you, or maybe delete the <str name="expr">solr\.core\..*:ADMIN\..*</str> line so that the exporter stops emitting that metric.
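
The hypothesis above can be illustrated with a small Python sketch. It assumes, without confirmation from the exporter's source, that the series identity is built from only the category, the handler, and the last segment of each key, so a middle segment such as distrib or local is lost. The helper name `series_key` is hypothetical.

```python
from collections import Counter

# Keys as returned by the Solr 8 metrics API (copied from the comment above).
solr8_keys = [
    "ADMIN./admin/ping.totalTime",
    "ADMIN./admin/ping.distrib.totalTime",
    "ADMIN./admin/ping.local.totalTime",
]

def series_key(metric_key):
    """Hypothetical label derivation: keep category, handler and last segment only."""
    parts = metric_key.split(".")
    category, handler, metric = parts[0], parts[1], parts[-1]
    return (category, handler, metric)

# Count how many raw keys collapse onto each derived series.
counts = Counter(series_key(k) for k in solr8_keys)
duplicates = {key: n for key, n in counts.items() if n > 1}
print(duplicates)
```

Under that assumption all three Solr 8 keys land on one series, which would be consistent with the "Duplicate sample for timestamp" messages Prometheus logs.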

perosb commented 2 weeks ago

Thanks for digging into this. I removed the ADMIN parts from the exporter config and those duplicates are gone. However, the duplicates for the more important QUERY metrics still occur. Those metrics are more interesting and should probably stay.

mlbiscoc commented 2 weeks ago

Got it. What if you try updating the expression to omit distrib and local? Try this expression instead, which grabs the admin/ping metrics except those containing distrib or local:

solr\.core\..*:ADMIN\.\/admin\/ping\.(?!distrib)(?!local).*
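
As a quick check of the negative lookaheads, here is a sketch using only the metric-name part of that expression (the text after the first unescaped colon). It uses Python's `re` for illustration; the exporter runs on Java, and whether Solr applies the pattern as a full match is an assumption here.

```python
import re

# Metric-name part of the suggested expression.
# (?!distrib)(?!local) rejects a match when the next segment starts with either word.
pattern = re.compile(r"ADMIN\.\/admin\/ping\.(?!distrib)(?!local).*")

# Solr 8 keys quoted earlier in the thread.
keys = [
    "ADMIN./admin/ping.totalTime",
    "ADMIN./admin/ping.distrib.totalTime",
    "ADMIN./admin/ping.local.totalTime",
]

# Assumption: the pattern must match the whole metric name.
kept = [k for k in keys if pattern.fullmatch(k)]
print(kept)
```

Only the plain totalTime key survives, so the distrib and local variants would no longer be exported.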

perosb commented 2 weeks ago

Thanks for the help @mlbiscoc.

I changed the exporter config and activated the Alternative expressions, which are much stricter but still provide enough data to populate the default dashboard.

Then I added (?!distrib)(?!local) as you said:

<str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\.(?!distrib)(?!local).*requestTimes:(count|1minRate|5minRate|median_ms|meanRate|p75_ms|p95_ms|p99_ms)</str>
<str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\.totalTime</str>
<str name="expr">solr\.core\..*:QUERY\./(select|get|export|stream|query|graph|sql)\.(?!distrib)(?!local).*rrors:(count!1minRate)</str>

Edit: Just noticed I'm not getting any Query metrics tho :scream: Need to revisit this later.... :cry:

AnthonyTissot commented 1 week ago

Same issue here. Did you find a solution in the end?

perosb commented 1 week ago

No, not yet, but I assume the idea of excluding distrib|local should work.

matthiasbosc commented 1 week ago

I might have figured out how to fix the issue @perosb.

The alternative expressions exclude the metrics we are interested in, so I had to modify the default pattern, from this:

<str name="expr">solr\.core\..*:QUERY\..*</str>

To this:

<str name="expr">solr\.core\..*:QUERY\.[^.]*\.(?!distrib|local).*</str>

And it's working as @mlbiscoc was expecting.
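
A short Python sketch of why the `[^.]*` segment matters in that pattern. The QUERY key names below are assumptions modelled on the ADMIN keys quoted earlier, and the `loose` pattern is a hypothetical variant included only for comparison. Python's `re` stands in for Java's regex engine.

```python
import re

# Metric-name part of the working expression: the handler segment may not contain a dot.
fixed = re.compile(r"QUERY\.[^.]*\.(?!distrib|local).*")
# Hypothetical variant with '.*' for the handler segment, for comparison only.
loose = re.compile(r"QUERY\..*\.(?!distrib|local).*")

# Assumed Solr 8 key shapes, by analogy with the ADMIN keys in this thread.
keys = [
    "QUERY./select.requestTimes",
    "QUERY./select.distrib.requestTimes",
    "QUERY./select.local.requestTimes",
]

fixed_kept = [k for k in keys if fixed.fullmatch(k)]
loose_kept = [k for k in keys if loose.fullmatch(k)]
print(fixed_kept)
print(loose_kept)
```

With `.*`, the regex engine can backtrack so that "/select.distrib" is swallowed as the handler and the lookahead only sees "requestTimes", letting the distrib and local keys through. Restricting the handler to `[^.]*` pins the lookahead to the segment right after the handler.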

FYI:

Thanks for your help Matthias