elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Adapt pipeline specific APIs to support multi pipeline #6531

Open jsvd opened 7 years ago

jsvd commented 7 years ago

Currently, some of the APIs assume that only one pipeline exists. With the upcoming multiple pipelines feature, this needs to be addressed while keeping backwards compatibility.

Single Pipeline Assumptions:

1. Pipeline Info API

GET /_node/pipeline

{
  "host": "Joaos-MBP-5.lan",
  "version": "6.0.0-alpha1",
  "http_address": "127.0.0.1:9600",
  "id": "3d3147fe-eb86-45e9-9d13-2fb83a7c1550",
  "name": "Joaos-MBP-5.lan",
  "pipeline": {
    "workers": 4,
    "batch_size": 125,
    "batch_delay": 5,
    "config_reload_automatic": false,
    "config_reload_interval": 3
  }
}

2. Pipeline Stats API

GET /_node/stats/pipeline

{
  "pipeline": {
    "events": {
      "duration_in_millis": 7863504,
      "in": 100,
      "filtered": 100,
      "out": 100
    },
    "plugins": {
      "inputs": [],
      "filters": [
        {
          "id": "grok_20e5cb7f7c9e712ef9750edf94aefb465e3e361b-2",
          "events": {
            "duration_in_millis": 48,
            "in": 100,
            "out": 100
          },
          "matches": 100,
          "patterns_per_field": {
            "message": 1
          },
          "name": "grok"
        },
        {
          "id": "geoip_20e5cb7f7c9e712ef9750edf94aefb465e3e361b-3",
          "events": {
            "duration_in_millis": 141,
            "in": 100,
            "out": 100
          },
          "name": "geoip"
        }
      ],
      "outputs": [
        {
          "id": "20e5cb7f7c9e712ef9750edf94aefb465e3e361b-4",
          "events": {
            "in": 100,
            "out": 100
          },
          "name": "elasticsearch"
        }
      ]
    },
    "reloads": {
      "last_error": null,
      "successes": 0,
      "last_success_timestamp": null,
      "last_failure_timestamp": null,
      "failures": 0
    }
  }
}

3. Node Stats API

The node stats API includes data from the pipeline stats API:

GET /_node/stats

{
  "host": "Joaos-MBP-5.lan",
  "version": "6.0.0-alpha1",
  "http_address": "127.0.0.1:9600",
  "id": "3d3147fe-eb86-45e9-9d13-2fb83a7c1550",
  "name": "Joaos-MBP-5.lan",
  "jvm": {
    # ...
    "uptime_in_millis": 108897
  },
  "process": {
    #...
  },
  "pipeline": {
    "events": {
      "duration_in_millis": 0,
      "in": 0,
      "filtered": 0,
      "out": 0
    },
    "plugins": {
      "inputs": [],
      "filters": [],
      "outputs": [
        {
          "id": "92ddb0615336293c1757cac81c3bebfa19985e68-2",
          "name": "stdout"
        }
      ]
    },
    "reloads": {
      "last_error": null,
      "successes": 0,
      "last_success_timestamp": null,
      "last_failure_timestamp": null,
      "failures": 0
    },
    "queue": {
      "type": "memory"
    }
  },
  "reloads": {
    "successes": 0,
    "failures": 0
  },
  "os": {}
}

jsvd commented 7 years ago

About /_node/pipeline and /_node/stats/pipeline, the APIs should return the first registered "user facing" pipeline. By "user facing" I mean that in the future we may have internal pipelines that ship with Logstash and that shouldn't be presented as "the default pipeline". We also need to include an "id" field in the pipeline info/stats documents.

With regard to /_node/stats, the ideal scenario would be to replace the pipeline key with a pipelines array/object, but we must keep BWC. Thoughts?

suyograo commented 7 years ago

About /_node/pipeline and /_node/stats/pipeline, the APIs should return the first registered "user facing" pipeline.

I am +1 on this. It would preserve BWC. We should also extend this API to take a pipeline id. For example,


GET /_node/pipeline returns the first pipeline (main, for example)

Add the pipeline ID to the response.

"pipeline": {
  "id": "main",
  "workers": 4,
  "batch_size": 125,
  "batch_delay": 5,
  "config_reload_automatic": false,
  "config_reload_interval": 3
}

GET /_node/pipeline/:pipeline_id

Similar to above, but filtered by pipeline ID.
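A minimal sketch of the lookup proposed above, assuming the API layer can see the registered pipelines as an insertion-ordered Hash of id => settings. The `internal` flag (standing in for pipelines that are not "user facing") and the method name `pipeline_info` are hypothetical, not existing Logstash code. Without an id it picks the first user-facing entry in registration order, which relies on Ruby's Hash preserving insertion order; how "first" should be defined is discussed further down the thread.

```ruby
# Hypothetical registry: pipeline id => info settings, in registration order.
# The :internal flag marks pipelines that would ship with Logstash and must
# never be reported as "the default pipeline".
PIPELINES = {
  "monitoring" => { internal: true,  workers: 1, batch_size: 125 },
  "main"       => { internal: false, workers: 4, batch_size: 125 },
  "apache"     => { internal: false, workers: 2, batch_size: 250 }
}.freeze

# GET /_node/pipeline              -> pipeline_info(PIPELINES)
# GET /_node/pipeline/:pipeline_id -> pipeline_info(PIPELINES, pipeline_id)
# Returns nil when there is no matching user-facing pipeline (e.g. answer 404).
def pipeline_info(pipelines, pipeline_id = nil)
  user_facing = pipelines.reject { |_id, settings| settings[:internal] }

  id, settings =
    if pipeline_id
      [pipeline_id, user_facing[pipeline_id]]
    else
      user_facing.first # first user-facing pipeline in insertion order
    end
  return nil if settings.nil?

  # Drop the internal flag, stringify keys and put the id in the document.
  body = settings.reject { |key, _| key == :internal }.transform_keys(&:to_s)
  { "pipeline" => { "id" => id }.merge(body) }
end
```

With these sample entries, GET /_node/pipeline would report main, because the internal monitoring pipeline is skipped.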


The same applies to GET /_node/stats/pipeline.


With regard to /_node/stats, the ideal scenario would be to replace the pipeline key with a pipelines array/object, but we must keep BWC. Thoughts?

@jsvd this metrics API was marked experimental for exactly this reason. At the time of 5.0 we knew about multiple pipelines, but we didn't know concretely how they would affect the existing pipeline APIs.

There is provision to break BWC here, but there are also tools such as https://github.com/consulthys/logstashbeat that rely on this structure.

Another option is to use GET /_node/stats/pipelines (note the plural) and deprecate the singular endpoint (to be dropped in 6.0). This could return a pipelines object holding an array.
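To illustrate the plural variant, here is a sketch of how GET /_node/stats/pipelines could wrap per-pipeline stats in a pipelines array while the deprecated singular endpoint keeps its current top-level pipeline key. The input structure and the method names are assumptions for illustration only.

```ruby
# Hypothetical per-pipeline stats, keyed by pipeline id in registration order.
STATS_BY_ID = {
  "main"   => { "events" => { "in" => 100, "filtered" => 100, "out" => 100 } },
  "apache" => { "events" => { "in" => 40, "filtered" => 40, "out" => 38 } }
}.freeze

# Proposed GET /_node/stats/pipelines: every pipeline, each tagged with its id.
def stats_pipelines(stats_by_id)
  { "pipelines" => stats_by_id.map { |id, stats| { "id" => id }.merge(stats) } }
end

# Deprecated GET /_node/stats/pipeline: same top-level "pipeline" key as today,
# holding only the first pipeline, so existing consumers keep working.
def stats_pipeline(stats_by_id)
  id, stats = stats_by_id.first
  return nil if stats.nil?

  { "pipeline" => { "id" => id }.merge(stats) }
end
```

Keeping the singular shape untouched is what lets tools that read the pipeline key continue to work during the deprecation period.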

jsvd commented 7 years ago

Ok, so:

GET /_node/pipeline - info on the first registered pipeline (usually main)
GET /_node/pipeline/:pipeline_id - info on the pipeline with this id
GET /_node/stats/pipeline - stats on the first registered pipeline (usually main)
GET /_node/stats/pipeline/:pipeline_id - stats on the pipeline with this id

Mark /_node/pipeline and /_node/stats/pipeline as deprecated (remove in 6.x)

Also add:

GET /_node/pipelines - info on all registered pipelines
GET /_node/pipelines/:pipeline_id - info on the pipeline with this id
GET /_node/stats/pipelines - stats on all registered pipelines
GET /_node/stats/pipelines/:pipeline_id - stats on the pipeline with this id

What is still missing are the changes to the /_node/stats document. For the top-level node stats document, should the pipelines/pipeline key (example here) list all pipelines and their stats, or a summary across all of them? If it's a summary, we'll have to drop the plugins key and maybe add a last_reloaded_pipeline_id.

ph commented 7 years ago

I am +1 on your proposed changes and on dropping the singular endpoints in 6.0.

I am in favor of having a summary, with an option to get the full details.

As a tool author, I would prefer to make only one call to the API to retrieve as much information as I can, but with plugins and multiple pipelines this output could get quite noisy.
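One way to combine the summary question with an opt-in for full details is sketched below: the top-level document always carries totals (no plugins key) plus a last_reloaded_pipeline_id, and the per-pipeline documents are added only when requested. The `details:` flag (which could map to a query parameter), the sample data, the epoch-seconds timestamps and all method names are assumptions, not an agreed API.

```ruby
# Hypothetical per-pipeline stats in the shape of today's "pipeline" key.
# Timestamps are assumed to be epoch seconds (nil when never reloaded).
PIPELINE_STATS = {
  "main" => {
    "events"  => { "in" => 100, "filtered" => 100, "out" => 100 },
    "plugins" => { "inputs" => [], "filters" => [], "outputs" => [] },
    "reloads" => { "successes" => 2, "failures" => 0, "last_success_timestamp" => 1_490_000_000 }
  },
  "apache" => {
    "events"  => { "in" => 40, "filtered" => 40, "out" => 38 },
    "plugins" => { "inputs" => [], "filters" => [], "outputs" => [] },
    "reloads" => { "successes" => 1, "failures" => 1, "last_success_timestamp" => 1_490_000_500 }
  }
}.freeze

# Sum one counter (e.g. section "events", key "in") across all pipelines.
def total(pipeline_stats, section, key)
  pipeline_stats.values.sum { |stats| stats[section][key] }
end

# Summary for the top-level node stats document: totals only, no "plugins".
def summarize(pipeline_stats)
  reloaded = pipeline_stats.reject { |_id, s| s["reloads"]["last_success_timestamp"].nil? }
  last_id, _stats = reloaded.max_by { |_id, s| s["reloads"]["last_success_timestamp"] }

  {
    "events"  => %w[in filtered out].map { |k| [k, total(pipeline_stats, "events", k)] }.to_h,
    "reloads" => %w[successes failures].map { |k| [k, total(pipeline_stats, "reloads", k)] }.to_h,
    "last_reloaded_pipeline_id" => last_id
  }
end

# The summary is always returned; full per-pipeline documents only on request.
def node_stats_pipelines(pipeline_stats, details: false)
  doc = summarize(pipeline_stats)
  doc["pipelines"] = pipeline_stats.map { |id, s| { "id" => id }.merge(s) } if details
  doc
end
```

A tool author would still need only one call: the default response stays small, and the detailed one carries everything.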

jsvd commented 7 years ago

Implementation-wise, for what we consider the "first registered pipeline", there are three options:

a) rely on the Hash's Enumerable ordering to get {}.first
b) create a new setting, metric or global value that gets set on the first call to Agent#register_pipeline
c) explicitly sort the existing pipelines by some criteria and select the first

Option a) is certainly the easiest, but it's flaky: I don't believe we should rely on the default ordering of keys in a Hash.

Option b) suggests a metric, since the new variable holding the name of the first registered pipeline must be accessible both in the agent (to be set) and in the API code (to be read).

As for option c), on the Agent side we could introduce a created_at timestamp and sort by that, but we would need to expose this value on the metric side so the API commands can reach it. Another alternative is to order by pipeline name.
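The practical difference between options b) and c) can be sketched with a toy registry. `PipelineRegistry` and its methods are hypothetical stand-ins, not Logstash's Agent: b) records the first id ever passed to `register_pipeline`, while c) keeps a `created_at` per pipeline and sorts on it (name as a tie-breaker), or sorts by name alone.

```ruby
# Toy stand-in for the agent's pipeline registry (hypothetical, not Agent).
class PipelineRegistry
  Entry = Struct.new(:id, :created_at)

  # Option b): set once, on the first register_pipeline call, never changed.
  attr_reader :first_registered_id

  # The clock is injectable so the ordering can be tested deterministically.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @pipelines = {}
    @first_registered_id = nil
  end

  def register_pipeline(id)
    @first_registered_id ||= id
    @pipelines[id] = Entry.new(id, @clock.call)
  end

  def remove_pipeline(id)
    @pipelines.delete(id)
  end

  # Option c): explicit sort by created_at, pipeline name as a tie-breaker.
  def first_by_created_at
    entry = @pipelines.values.min_by { |e| [e.created_at, e.id] }
    entry && entry.id
  end

  # Option c), alternative: order by pipeline name only.
  def first_by_name
    @pipelines.keys.min
  end
end
```

The two approaches diverge once a pipeline is reloaded: after removing and re-registering main, option b) still reports main, while the created_at sort reports the other pipeline. Which behavior is wanted after a reload is part of the decision.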

Since this touches the Agent/Pipeline/Metric/Api barriers that @ph is re-evaluating, any thoughts?

jsvd commented 7 years ago

New version:

  1. Mark /_node/pipeline and /_node/stats/pipeline as deprecated (remove in 6.x)

  2. With multiple pipelines off:

GET /_node/pipeline - info on the pipeline with id pipeline.id (default: main)
GET /_node/stats/pipeline - stats on the pipeline with id pipeline.id (default: main)

  3. With multiple pipelines on:

GET /_node/pipeline - redirects to /_node/pipelines
GET /_node/stats/pipeline - redirects to /_node/stats/pipelines

Per pipeline API:

GET /_node/pipeline/:pipeline_id - redirects to /_node/pipelines/:pipeline_id
GET /_node/stats/pipeline/:pipeline_id - redirects to /_node/stats/pipelines/:pipeline_id

New APIs:

GET /_node/pipelines - ???
GET /_node/pipelines/:pipeline_id - info on the pipeline with this id
GET /_node/stats/pipelines - overall stats across all pipelines (total number of events, reloads, etc.)
GET /_node/stats/pipelines/:pipeline_id - stats on the pipeline with this id
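A sketch of how requests to the deprecated singular paths could be dispatched according to the plan above: per-pipeline paths always redirect to the plural form, bare paths redirect when multiple pipelines are on, and otherwise the pipeline named by pipeline.id is served. The method name, the `pipeline_id:` keyword standing in for the pipeline.id setting, and the tuple return values are hypothetical; this is not the actual Logstash routing code.

```ruby
# Matches /_node/pipeline, /_node/stats/pipeline and their /:pipeline_id forms.
SINGULAR_PATH = %r{\A/_node(/stats)?/pipeline(?:/([^/]+))?\z}

# Returns [:serve, pipeline_id], [:redirect, path] or [:not_found, nil].
# `pipeline_id` stands for the pipeline.id setting (default: main).
def route_singular(path, multiple_pipelines:, pipeline_id: "main")
  match = SINGULAR_PATH.match(path)
  return [:not_found, nil] unless match

  stats, id = match[1], match[2]
  if id
    # Per pipeline API: always redirect to the plural form.
    [:redirect, "/_node#{stats}/pipelines/#{id}"]
  elsif multiple_pipelines
    [:redirect, "/_node#{stats}/pipelines"]
  else
    # Multiple pipelines off: serve the single pipeline named by pipeline.id.
    [:serve, pipeline_id]
  end
end
```

Plural paths are deliberately not matched here; they would be handled by the new endpoints directly.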

jsvd commented 7 years ago

Multiple pipeline API support has landed in master, but we still need to add a deprecation path for these changes in the 5.x branch, so I'm leaving this issue open to track the 5.x BWC layer.