elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Honor default_pipeline on scripted _index rewrites #42019

Open peterpramb opened 5 years ago

peterpramb commented 5 years ago

Describe the feature:

Ingest pipelines allow for rewriting _index using the script processor (and possibly others), dynamically dispatching documents to different indices. Unfortunately, while index templates are considered for the new target index, a defined default_pipeline is not executed.

Honoring default_pipeline on the rewritten index would allow more flexible pipeline chaining: new index templates could simply be added when needed, instead of updating the central ingest pipeline with new target pipelines every time. Another use case would be to specify an additional pipeline only for some indices and none for others.

It will be the responsibility of the user to prevent any circular loops, though.
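
One way a user could guard against such loops is to walk the configured chain before deploying it. The following Python sketch is not Elasticsearch code; it assumes, for illustration only, that each index name maps to at most one default pipeline and that each pipeline rewrites _index to a single fixed target. The function name and both mappings are hypothetical.

```python
from typing import Dict, List, Optional


def find_pipeline_cycle(
    default_pipeline: Dict[str, str],
    rewrite_target: Dict[str, str],
    start_index: str,
) -> Optional[List[str]]:
    """Follow index -> default pipeline -> rewritten index, starting at start_index.

    Returns the visited indices, ending with the repeated one, if the chain
    loops back on itself; returns None if the chain terminates.
    Assumption: exact index names, one fixed rewrite target per pipeline.
    """
    visited: List[str] = []
    index = start_index
    while True:
        if index in visited:
            return visited + [index]
        visited.append(index)
        pipeline = default_pipeline.get(index)
        if pipeline is None:
            return None  # no default pipeline on this index: chain ends
        target = rewrite_target.get(pipeline)
        if target is None or target == index:
            return None  # pipeline leaves _index unchanged: chain ends
        index = target


# Hypothetical configuration: index-a -> index-b -> index-a
templates = {"index-a": "pipeline-a", "index-b": "pipeline-b"}
targets = {"pipeline-a": "index-b", "pipeline-b": "index-a"}
print(find_pipeline_cycle(templates, targets, "index-a"))
```

A check like this only covers static rewrites; a script that computes _index from document fields would need the possible targets enumerated by hand.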

peterpramb commented 5 years ago

Consider the following example:

Events are ingested into the index event-ingest, which has a default pipeline set in its index template. The pipeline examines event.type (which contains the originating software) and reroutes the event to the index event-<event.type>. That target index might in turn have another default pipeline set in its index template when further processing is needed (potentially including another index rewrite), or none if no further processing is needed.

There is no need to update the central ingest pipeline with a long list of conditional pipeline processors; the flow is controlled solely by index templates.

But as already mentioned, it will be the responsibility of the user to prevent any circular loops in such a setup.
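
The flow described above can be modelled in a few lines of Python. This is a sketch of the requested behaviour, not of how Elasticsearch implements ingest. The service names (nginx, redis), the pipeline functions, and the MAX_HOPS limit are assumptions made up for the example, and service.type is used as the routing field, following the correction in the next comment.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Pipeline = Callable[[Doc], None]

MAX_HOPS = 10  # hypothetical safety limit against circular rewrites


def route_by_service_type(doc: Doc) -> None:
    """Central pipeline: reroute the event to event-<service.type>."""
    doc["_index"] = "event-" + doc["service"]["type"]


def enrich_nginx(doc: Doc) -> None:
    """Hypothetical second-stage pipeline for one target index."""
    doc["enriched"] = True


# Stands in for index templates: index name -> default pipeline.
DEFAULT_PIPELINES: Dict[str, Pipeline] = {
    "event-ingest": route_by_service_type,
    "event-nginx": enrich_nginx,
    # no entry for event-redis: no further processing needed
}


def ingest(doc: Doc, default_pipelines: Dict[str, Pipeline],
           max_hops: int = MAX_HOPS) -> List[str]:
    """Run default pipelines for as long as a pipeline changes _index."""
    applied: List[str] = []
    for _ in range(max_hops):
        index = doc["_index"]
        pipeline = default_pipelines.get(index)
        if pipeline is None:
            return applied  # target index has no default pipeline
        pipeline(doc)
        applied.append(pipeline.__name__)
        if doc["_index"] == index:
            return applied  # no rewrite happened, so nothing left to chain
    raise RuntimeError("too many index rewrites; possible circular chain")


doc = {"_index": "event-ingest", "service": {"type": "nginx"}}
print(ingest(doc, DEFAULT_PIPELINES), doc["_index"])
```

Adding a new service then only means adding one more entry to the mapping that stands in for index templates; the central routing function stays untouched.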

peterpramb commented 5 years ago

That should really be service.type, sorry...

elasticmachine commented 5 years ago

Pinging @elastic/es-core-features

jakelandis commented 5 years ago

@peterpramb - I believe that https://github.com/elastic/elasticsearch/pull/39607 (as of 6.7) addresses your request. Can you try your test case on 6.7+ and, if it still doesn't work, provide a reproduction scenario?

peterpramb commented 5 years ago

Unfortunately, I'm already on 6.7.1.

Here is a simple test case:

  1. Ingest pipeline
    • Pipeline
      PUT /_ingest/pipeline/testing-ingest-pipeline
      {
        "description": "Chained pipelines via index templates (#42019)",
        "processors": [
          {
            "append": {
              "field": "pipeline_set",
              "value": "ingest"
            }
          },
          {
            "script": {
              "lang": "painless",
              "source": "ctx._index = 'testing-index-chained';"
            }
          }
        ],
        "version": 20190512
      }
    • Template
      PUT /_template/testing-ingest-template
      {
        "index_patterns": [
          "testing-index-ingest"
        ],
        "version": 20190512,
        "order": 0,
        "settings": {
          "index": {
            "default_pipeline": "testing-ingest-pipeline"
          }
        }
      }
  2. Chained pipeline
    • Pipeline
      PUT /_ingest/pipeline/testing-chained-pipeline
      {
        "description": "Chained pipelines via index templates (#42019)",
        "processors": [
          {
            "append": {
              "field": "pipeline_set",
              "value": "chained"
            }
          },
          {
            "script": {
              "lang": "painless",
              "source": "ctx._index = 'testing-index-final';"
            }
          }
        ],
        "version": 20190512
      }
    • Template
      PUT /_template/testing-chained-template
      {
        "index_patterns": [
          "testing-index-chained"
        ],
        "version": 20190512,
        "order": 0,
        "settings": {
          "index": {
            "default_pipeline": "testing-chained-pipeline"
          }
        }
      }
  3. Testing
    • Ingest document
      POST /testing-index-ingest/_doc/
      {
        "field": "value"
      }
    • Result
      {
        "_id": "GWvEq2oBkBB05GCNyfzM",
        "_index": "testing-index-chained",
        "_primary_term": 1,
        "_seq_no": 0,
        "_shards": {
          "failed": 0,
          "successful": 2,
          "total": 2
        },
        "_type": "_doc",
        "_version": 1,
        "result": "created"
      }
    • Retrieve document
      GET /testing-index-chained/_doc/GWvEq2oBkBB05GCNyfzM
      {
        "_id": "GWvEq2oBkBB05GCNyfzM",
        "_index": "testing-index-chained",
        "_primary_term": 1,
        "_seq_no": 0,
        "_source": {
          "field": "value",
          "pipeline_set": [
            "ingest"
          ]
        },
        "_type": "_doc",
        "_version": 1,
        "found": true
      }

peterpramb commented 5 years ago

And this is the resulting index:

health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testing-index-chained Lou5CsZ3R7iImP9TF0qdYg   2   1          1            0      8.7kb          4.3kb

The document should have ended up in testing-index-final instead, with pipeline_set containing both "ingest" and "chained".

peterpramb commented 5 years ago

The Elasticsearch version:

Version: 6.7.1, Build: default/tar/2f32220/2019-04-02T15:59:27.961366Z, JVM: 1.8.0_202

peterpramb commented 5 years ago

Still not working in 7.2.0; chained pipelines are simply ignored.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)