Closed martijnvg closed 8 years ago
Very interesting feature would use this
When will this plugin ready to use? Sounds very interesting!
Will the plugin be usable with Elasticsearch 1.x?
hi @marcelhallmann we don't have a date nor a targeted version for now, but you can monitor the progress in this meta issue. We will have a first release whenever all of the needed features for the first phase are in. We are developing against master (3.x) and considering backporting to 2.x. We will not backport to 1.x though.
As many, I am looking forward to this. On this Ingest Node feature, how does one contribute with custom filters? We plan on developing one and it would be great if it could be linked to Ingest Node, without us needing to develop it for Logstash.
Update on this?
@McStork @timini If you're interested in writing your own processor you could take a look at the geoip
processor which has been developed as a plugin:
https://github.com/elastic/elasticsearch/tree/master/plugins/ingest-geoip
Beware that this is unreleased code and that things may change. Also it is unknown when ingest gets released.
@martijnvg Thanks!
I'm looking forward to this feature to ingest JSON and replace fluentd. :+1:
This looks fantastic! Which branch did this end up in? I'm getting a 404 on https://github.com/elastic/elasticsearch/tree/feature/ingest (linked to above, just prior to the checklist)
@ryanmaclean the branch was merged to master and deleted. Ingest node and all of its processors will be released with the next major release (namely 5.0).
So this is available in master now?
Any documentation out there?
The first version of the docs is published as part of our reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html .
hi @luto65 I feel like this discussion would be better suited for our discuss forums. Would you mind posting your questions there? Then if the result of the discussion is a feature request, or a bug, a new issue can be opened on github. Thanks!
Closing this issue, all required tasks for phase 1 are completed.
[currently open issues]
related issues from other projects:
There are many use-cases where it is important to enrich incoming data. This enrichment may be something simple like using a regular expression to extract metadata from an existing field, or something more advanced like a geoip lookup or language identification. The filter stage of the Logstash processing pipeline provides great examples of the ways in which data is often enriched. Node ingest implements a new type of ES node, which performs this enrichment prior to indexing.
Node ingest is a pure Java implementation of the filters in Logstash, integrated with Elasticsearch. It works by wrapping the bulk/index APIs and executing a pipeline composed of multiple processors to enrich the documents. A processor is simply a component that can modify incoming documents (the source, before ES turns it into a document). A pipeline is a list of processors grouped under a unique id. If node ingest is enabled, then the index and bulk APIs can route requests with documents through a pipeline.
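To make the model above concrete, here is a minimal sketch in Python (the real implementation is Java inside Elasticsearch; the class and function names here are hypothetical, chosen only to mirror the concepts of "processor" and "pipeline" described above):

```python
from typing import Callable, List

# A processor is just a component that mutates a document's source in place.
Processor = Callable[[dict], None]

class Pipeline:
    """A named list of processors, applied in order before indexing."""

    def __init__(self, pipeline_id: str, processors: List[Processor]):
        self.pipeline_id = pipeline_id
        self.processors = processors

    def execute(self, source: dict) -> dict:
        # Run each processor over the raw source before ES would index it.
        for processor in self.processors:
            processor(source)
        return source

def uppercase_level(source: dict) -> None:
    # Example processor: normalize the "level" field before indexing.
    if "level" in source:
        source["level"] = source["level"].upper()

pipeline = Pipeline("logs", [uppercase_level])
doc = pipeline.execute({"message": "disk full", "level": "warn"})
# doc["level"] is now "WARN"
```

In the real feature the pipeline is selected per request (via the `pipeline_id`/`pipeline` parameter on the index and bulk APIs) rather than called directly like this.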
The ingest plugin runs on dedicated client nodes; after bulk and index requests have been enriched, these requests continue on into the cluster.
Node ingest will be a plugin in the Elasticsearch project, implementing two main aspects:
The first is a pure Java implementation of Pipeline and Processor, as well as initial processor implementations for grok, geoip, kv/mutate, and date. This Java implementation can then be reused in other places, such as Logstash itself, the reindex API, and so on. In the first version of the ingest plugin the processor implementations can reside in the ingest plugin, but the framework and processor implementations shouldn't rely on any ES-specific code, so that later on they can be moved to an isolated library.
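As a rough illustration of what two of the planned processor types do, here is a hedged Python sketch of kv-style and date-style processing. These are simplified stand-ins, not the actual grok/kv/date processor implementations, and the parameter names are assumptions for this example:

```python
from datetime import datetime

def kv_processor(source: dict, field: str,
                 field_split: str = " ", value_split: str = "=") -> None:
    # Split "a=1 b=2" style text in `field` into top-level key/value pairs.
    for pair in source[field].split(field_split):
        key, _, value = pair.partition(value_split)
        source[key] = value

def date_processor(source: dict, field: str,
                   fmt: str = "%d/%b/%Y:%H:%M:%S",
                   target: str = "@timestamp") -> None:
    # Parse a textual date into ISO 8601, the shape Elasticsearch expects.
    source[target] = datetime.strptime(source[field], fmt).isoformat()

doc = {"message": "status=200 bytes=1024", "time": "25/Oct/2015:14:02:11"}
kv_processor(doc, "message")
date_processor(doc, "time")
# doc now contains "status", "bytes", and an ISO-formatted "@timestamp"
```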
The second part is the integration with Elasticsearch. This includes interception of the bulk/index APIs, management APIs (stats and so on in future phase), storage and live reload of the configuration, supporting multiple "live" pipelines, and simulation of pipeline execution.
The goal of the ingest plugin is to make data enrichment easier; it will not replace Logstash. The ingest plugin should cover most enrichment cases where events are only stored in Elasticsearch. For example, when only Filebeat is used to ship logs, a Logstash instance will no longer be required. In cases where events are sent to multiple outputs, a Logstash installation is still required. Also, at some point Logstash will reuse the pipeline/processor framework, so the end goal is that both Elasticsearch and Logstash benefit from the ingest initiative.
Development happens in a feature branch: https://github.com/elastic/elasticsearch/tree/feature/ingest
Current node ingest tasks:

- A `pipeline_id` parameter is available to select what pipeline should be used to preprocess the documents before the index/bulk APIs get executed. #13941
- When a node with `node.ingest` set to false receives an ingest request, it should explicitly fail. (mvg) #15610
- … `.ingest` index has been started. (mvg) #15203
- Rename the `pipeline_id` param name to `pipeline`. #15618
- Add `tag`s to `on_failure` metadata. #16202

Possible v2 tasks:

- Allow `on_failure` processors to receive a document with a pre-failed-processor state (ref: https://github.com/elastic/elasticsearch/issues/14548#issuecomment-161799133)
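The `on_failure` semantics discussed above can be sketched as follows. This is a hypothetical Python illustration, not the actual implementation: if a processor fails, the pipeline runs its `on_failure` processors instead of failing the whole document, and exposes failure metadata (the error message and the failed processor's tag, in the spirit of #16202):

```python
# Each processor is paired with a tag so failures can be attributed.
def run_pipeline(source, processors, on_failure=None):
    for tag, processor in processors:
        try:
            processor(source)
        except Exception as exc:
            if on_failure is None:
                raise  # no handlers configured: fail the document
            # Metadata handed to on_failure processors, analogous to an
            # on-failure message plus the failed processor's tag.
            meta = {
                "on_failure_message": str(exc),
                "on_failure_processor_tag": tag,
            }
            for _, handler in on_failure:
                handler(source, meta)
            return source
    return source

def broken(source):
    raise ValueError("field [foo] not present")

def record_error(source, meta):
    # An on_failure processor that stores the error on the document itself.
    source["error"] = meta["on_failure_message"]

doc = run_pipeline({"message": "hi"}, [("p1", broken)],
                   on_failure=[("f1", record_error)])
# doc["error"] == "field [foo] not present", doc["message"] is untouched
```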