elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

(output.elasticsearch) emit x-elastic-product-origin #16448

Open stefnestor opened 5 days ago

stefnestor commented 5 days ago

👋🏽 howdy, team!

Other products have been updated so that their bulk/ingest requests include the x-elastic-product-origin header, which lets us identify traffic by its upstream source. Would you kindly consider extending Logstash's Elasticsearch output to do the same?

This is how the header shows up in the output of Elasticsearch's List Tasks API:

# 01641222
$ cat tasks.json | jq '.nodes[].tasks[].headers."X-elastic-product-origin"' | sort | uniq -c | sort -r
4841 null
 420 "kibana"
 192 "beats"
 170 "observability"
  38 "fleet"
  19 "cloud"
$ cat tasks.json | jq '.nodes[].tasks[]|select(.action=="indices:data/write/bulk")|.headers."X-elastic-product-origin"'| sort | uniq -c | sort -r
 156 null
  26 "observability"
  22 "beats"
  19 "kibana"
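
For anyone without jq at hand, the same tally can be sketched in Python. This is only a rough equivalent of the two jq pipelines above, assuming tasks.json has the usual List Tasks shape in which "nodes" and each node's "tasks" are objects keyed by ID; the function name count_origins and the sample data are made up for illustration.

```python
import json
from collections import Counter

HEADER = "X-elastic-product-origin"


def count_origins(tasks_doc, action=None):
    """Tally tasks by product-origin header, optionally for a single action."""
    counts = Counter()
    # Assumption: "nodes" and "tasks" are objects keyed by node ID / task ID.
    for node in tasks_doc.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            if action is not None and task.get("action") != action:
                continue
            # Tasks without the header are counted under None (jq prints null).
            counts[task.get("headers", {}).get(HEADER)] += 1
    return counts


# Tiny made-up sample in the shape of a List Tasks response (not real data).
sample = {
    "nodes": {
        "n1": {
            "tasks": {
                "n1:1": {
                    "action": "indices:data/write/bulk",
                    "headers": {HEADER: "beats"},
                },
                "n1:2": {"action": "indices:data/write/bulk", "headers": {}},
                "n1:3": {
                    "action": "indices:data/read/search",
                    "headers": {HEADER: "kibana"},
                },
            }
        }
    }
}

for origin, n in count_origins(sample, "indices:data/write/bulk").most_common():
    print(f"{n:5d} {json.dumps(origin)}")
```

To run it against a real capture, replace the sample with json.load(open("tasks.json")).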

Single-task examples (redacted for privacy):

$ cat tasks.json | jq '.nodes[].tasks[]|select(.action=="indices:data/write/bulk")|select(.headers."X-elastic-product-origin"=="beats")'
{
  "node": "XXXXX",
  "id": 284439927,
  "type": "transport",
  "action": "indices:data/write/bulk",
  "description": "requests[35], indices[]",
  "start_time": "2024-08-20T21:02:40.823Z",
  "start_time_in_millis": 1724187760823,
  "running_time": "312.5ms",
  "running_time_in_nanos": 312556386,
  "cancellable": false,
  "headers": {
    "X-elastic-product-origin": "beats"
  }
}

$ cat tasks.json | jq '.nodes[].tasks[]|select(.action=="indices:data/write/bulk")|select(.headers."X-elastic-product-origin"=="kibana")'
{
  "node": "XXXXX",
  "id": 194746221,
  "type": "transport",
  "action": "indices:data/write/bulk",
  "description": "requests[5], indices[.internal.alerts-observability.metrics.alerts-default-000001]",
  "start_time": "2024-08-20T21:02:38.357Z",
  "start_time_in_millis": 1724187758357,
  "running_time": "2.7s",
  "running_time_in_nanos": 2779492550,
  "cancellable": false,
  "headers": {
    "X-elastic-product-origin": "kibana",
    "trace.id": "258cc02b3387a34fade912fbdXXXXXXX",
    "X-Opaque-Id": "unknownId;kibana:task%20manager:run%20alerting%3Ametrics.alert.threshold:53b10185-XXXX-4494-b859-964d1b7XXXXX"
  }
}

Ideally, as in the Kibana and Agent/Beats examples above, Logstash would also self-report its origin, so that high ingest volume or queues on the Elasticsearch side can be traced back to their source more easily.
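
To make the ask concrete, here is a minimal Python sketch of what "self-reporting" means at the HTTP level: a client attaches the header to its _bulk request. This is not how the Logstash plugin does or will implement it; the helper name build_bulk_request, the base URL, the index name, and the header value "logstash" are all assumptions for illustration. The request is only built, never sent.

```python
import urllib.request


def build_bulk_request(base_url, ndjson_body, origin):
    """Build (but do not send) a _bulk request tagged with a product origin."""
    return urllib.request.Request(
        url=f"{base_url}/_bulk",
        data=ndjson_body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/x-ndjson",
            # The header this issue asks Logstash to emit; the value passed in
            # below is an assumed one, not confirmed by the thread.
            "x-elastic-product-origin": origin,
        },
    )


# Hypothetical bulk payload: one index action followed by its document.
body = '{"index":{"_index":"logs-demo"}}\n{"message":"hello"}\n'
req = build_bulk_request("http://localhost:9200", body, "logstash")

print(req.get_method(), req.full_url)
for name, value in req.header_items():
    print(f"{name}: {value}")
```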

TIA! 🙏

(Note to self: building on es#111773 as we work to publish this common troubleshooting approach more officially.)

lucabelluccini commented 4 days ago

Proposing https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/1189. I will raise this at my next sync with the Logstash team.