elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Ingest stats: Don't increment a processor's failed metric when failure is ignored or handled #87081

Open. adriansr opened this issue 2 years ago.

adriansr commented 2 years ago

Description

A processor's failed metric in the ingest stats is incremented even when the processor has ignore_failure set.

This can lead to misleading reports and alerts.


Actual processor stats:

{
  "metric_type": "processor",
  "pipeline": "filebeat-7.17.3-sophos-xg-firewall",
  "count": 13597705,
  "time": "16.1m",
  "time_in_millis": 967520,
  "current": 0,
  "failed": 6887511,
  "calculated": {
    "execution_time_avg_ns": 71153.18357031573,
    "failed_pct": 0.5065201076211022
  },
  "processor_index": 76,
  "processor_type": "lowercase",
  "processor_name": "77_lowercase",
  "stat_name": "lowercase",
  "conditional": false,
  "definition": "{\"field\":\"network.protocol\",\"ignore_failure\":true}"  # <- ignore_failure is true here
}
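The two values under calculated are not raw counters. They appear to be derived by the diagnostics tool, and the numbers above are consistent with the formulas below. This is an assumption inferred from the values, not taken from the tool's source. It shows that roughly half of all documents hit the ignored failure, and every one of them is counted.

```python
# Raw counters copied from the processor stats above.
count = 13597705          # documents processed by the lowercase processor
failed = 6887511          # failures, including ones that ignore_failure swallowed
time_in_millis = 967520   # total time spent in the processor

# Assumed derivations of the "calculated" fields (inferred from the numbers,
# not from the diagnostics tool's source code).
execution_time_avg_ns = time_in_millis * 1_000_000 / count
failed_pct = failed / count

print(f"{execution_time_avg_ns:.2f} ns/doc, {failed_pct:.2%} failed")
# prints "71153.18 ns/doc, 50.65% failed"
```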

Alert from diagnostics tool:

pipeline_stats:Ingest pipeline is reporting over 1000 failures.
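Until the metric itself changes, a consumer of these stats could filter such processors out, because the serialized definition records ignore_failure. The following is a minimal sketch under that assumption. actionable_failures is a hypothetical helper, not part of any Elastic tool. The 1000 threshold is taken from the alert text. The on_failure check assumes that key appears in the serialized definition when configured, which is not verified here.

```python
import json

def actionable_failures(processor_stats):
    """Hypothetical helper: sum "failed" only for processors whose
    definition neither ignores nor handles failures."""
    total = 0
    for stat in processor_stats:
        definition = json.loads(stat.get("definition", "{}"))
        # Assumption: on_failure shows up in the definition string when set.
        if definition.get("ignore_failure") or "on_failure" in definition:
            continue  # failure was ignored or handled by the processor itself
        total += stat["failed"]
    return total

stats = [{
    "processor_type": "lowercase",
    "failed": 6887511,
    "definition": "{\"field\":\"network.protocol\",\"ignore_failure\":true}",
}]
THRESHOLD = 1000  # limit quoted in the diagnostics alert
alert = actionable_failures(stats) > THRESHOLD
print(alert)  # False
```

With this filter, the lowercase processor above no longer contributes to the alert.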

I'd like to suggest not incrementing this metric when a processor's failure is either ignored (ignore_failure) or handled by the processor's own on_failure.
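The proposed counting rule can be sketched as follows. This is a hypothetical model for illustration only. ProcessorMetrics and record_execution are invented names and do not mirror Elasticsearch's Java classes. A failure is counted only when neither ignore_failure nor a processor-level on_failure absorbed it.

```python
from dataclasses import dataclass

@dataclass
class ProcessorMetrics:
    """Hypothetical per-processor counters (illustrative names only)."""
    count: int = 0
    failed: int = 0

def record_execution(metrics, failed, ignore_failure, has_on_failure):
    """Proposed rule: count a failure only if nothing absorbed it."""
    metrics.count += 1
    if failed and not (ignore_failure or has_on_failure):
        metrics.failed += 1

m = ProcessorMetrics()
record_execution(m, failed=True, ignore_failure=True, has_on_failure=False)   # ignored
record_execution(m, failed=True, ignore_failure=False, has_on_failure=True)   # handled
record_execution(m, failed=True, ignore_failure=False, has_on_failure=False)  # real failure
print(m)  # ProcessorMetrics(count=3, failed=1)
```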

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)