elastic / kibana


OpenTelemetry jvm memory metrics #168548

Open kes2464 opened 1 year ago

kes2464 commented 1 year ago

Kibana version: v 8.8.2

Elasticsearch version: v 8.8.2

Server OS version: CentOS Linux 7 (Core)

Browser version: Chrome

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.):

Describe the bug: When Kibana loads OpenTelemetry (OTel) JVM memory metrics, it runs a query similar to the one below. The real query has more filters, but they were not needed for my app.

GET *metrics-apm*/_search?terminate_after=10000000
{
  "track_total_hits": 1,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "service.name": ["my-service"] } },
        { "terms": { "service.environment": ["test"] } },
        { "terms": { "labels.type": ["heap"] } },
        { "terms": { "service.node.name": ["abcde"] } },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-09-28T01:15:00Z",
              "lte": "2023-09-28T01:15:59Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "timeseriesData": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "60s"
      },
      "aggs": {
        "heapMemoryUsed": {
          "avg": {
            "field": "process.runtime.jvm.memory.usage"
          }
        },
        "heapMemoryCommitted": {
          "avg": {
            "field": "process.runtime.jvm.memory.committed"
          }
        },
        "heapMemoryMax": {
          "avg": {
            "field": "process.runtime.jvm.memory.limit"
          }
        }
      }
    }
  }
}

Elastic memory metrics include values for each pool (jvm.memory.heap.pool.used, jvm.memory.heap.pool.committed and jvm.memory.heap.pool.max) as well as totals (jvm.memory.heap.used, jvm.memory.heap.committed and jvm.memory.heap.max).

OpenTelemetry, by contrast, does not provide the total. It only sends one value per pool for each metric (process.runtime.jvm.memory.usage, process.runtime.jvm.memory.committed and process.runtime.jvm.memory.limit), with the pools distinguished by labels.pool.

So the Kibana query above returns the average across the pools rather than their sum.
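To make the difference concrete, here is a minimal Python sketch of the arithmetic. The pool names and byte values are invented for illustration and do not come from a real cluster; the helper names are hypothetical too.

```python
from statistics import mean

FIELD = "process.runtime.jvm.memory.usage"

# Hypothetical documents for one instance in one 60s bucket, one per heap pool.
docs = [
    {"labels.pool": "G1 Eden Space", FIELD: 100_000_000},
    {"labels.pool": "G1 Old Gen", FIELD: 300_000_000},
    {"labels.pool": "G1 Survivor Space", FIELD: 20_000_000},
]

def avg_over_docs(docs, field):
    """What a plain `avg` aggregation computes: the mean over every document in the bucket."""
    return mean(d[field] for d in docs)

def sum_over_pools(docs, field):
    """Average each pool within the bucket first, then add the pools together."""
    by_pool = {}
    for d in docs:
        by_pool.setdefault(d["labels.pool"], []).append(d[field])
    return sum(mean(values) for values in by_pool.values())

reported = avg_over_docs(docs, FIELD)   # average of the three pools
expected = sum_over_pools(docs, FIELD)  # total heap usage
print(reported, expected)
```

With these made-up numbers the chart would show 140 MB of heap used, while the actual total is 420 MB.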

Steps to reproduce:

  1. Send OTLP metrics data to APM Server.
  2. Go to Kibana APM, select the service and go to Metrics.
  3. Select an instance and check the Heap Memory chart.
  4. Cross-check against the Elastic agent's data, or against other metrics sources such as Prometheus.

Expected behavior: Kibana should calculate the sum across all memory pools.
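One possible shape for such a query, sketched in Python as a dict builder: bucket by labels.pool with a terms aggregation, average each metric inside each pool, then add the pools together with a sum_bucket pipeline aggregation. This is not Kibana's actual fix. The helper name summed_pool_aggs and the "pools" aggregation name are made up, and the sketch assumes labels.pool is an aggregatable keyword field and that an instance has no more pools than the terms aggregation's default bucket count.

```python
import json

def summed_pool_aggs(metrics, pool_field="labels.pool", interval="60s"):
    """Build a date_histogram whose buckets sum the per-pool averages.

    `metrics` maps an output name to the metric field to aggregate.
    """
    sub_aggs = {
        # One bucket per memory pool; average each metric within that pool.
        "pools": {
            "terms": {"field": pool_field},
            "aggs": {
                name: {"avg": {"field": field}} for name, field in metrics.items()
            },
        }
    }
    # Sibling pipeline aggregations add the per-pool averages together.
    for name in metrics:
        sub_aggs[name] = {"sum_bucket": {"buckets_path": f"pools>{name}"}}
    return {
        "timeseriesData": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
            "aggs": sub_aggs,
        }
    }

aggs = summed_pool_aggs({
    "heapMemoryUsed": "process.runtime.jvm.memory.usage",
    "heapMemoryCommitted": "process.runtime.jvm.memory.committed",
    "heapMemoryMax": "process.runtime.jvm.memory.limit",
})
# Print a request body that can be combined with the filters from the query above.
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

The printed body keeps the same output names (heapMemoryUsed, heapMemoryCommitted, heapMemoryMax) at the date_histogram level, so the response could be read the same way as before. Pools that do not report a limit would simply contribute nothing to heapMemoryMax.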

Screenshots (if relevant):

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):

Any additional context:

kes2464 commented 1 year ago

I also found a similar issue with the thread count (process.runtime.jvm.threads.count). OTel sends two metrics, one for daemon threads and one for non-daemon threads (distinguished by labels.daemon), and the Kibana query returns the average of the two.

elasticmachine commented 1 year ago

Pinging @elastic/apm-ui (Team:APM)

elasticmachine commented 1 month ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)