Open · m-adams opened this issue 4 years ago
Pinging @elastic/es-core-features (:Core/Features/Ingest)
If it helps, this was my use case for needing this feature https://discuss.elastic.co/t/split-json-array-into-multiple-events-using-ingest-pipelines/238519
This would be really helpful. I'm sending in batches of metrics from a source where it is difficult to change the format so that they are sent one at a time or in bulk format. If I could split the array of JSON measurements up with an ingest pipeline, I could avoid having to put in an additional parsing layer.
Example excerpt of an ingested doc. I want to split the JSON measurements in data out into individual documents:
{
  "_index" : "metrics_test",
  "_type" : "_doc",
  "_id" : "IsWOhHcBCOjOntNtCq76",
  "_score" : 1.0,
  "_source" : {
    "data" : [
      {
        "type" : "measure1",
        "date" : "2021-02-08T00:44:32-06:00",
        "value" : "164",
        "unit" : "count"
      },
      {
        "type" : "measure1",
        "date" : "2021-02-08T00:55:16-06:00",
        "value" : "22",
        "unit" : "count"
      },
      ...
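What I would like (just an illustration of the desired result, not existing functionality) is one document per element of data, along the lines of:

{
  "type" : "measure1",
  "date" : "2021-02-08T00:44:32-06:00",
  "value" : "164",
  "unit" : "count"
}
{
  "type" : "measure1",
  "date" : "2021-02-08T00:55:16-06:00",
  "value" : "22",
  "unit" : "count"
}

(Whether top-level fields from the parent document get copied into each child could be an option.)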
I could also use this. I'm trying to cut out Logstash and can't without this!
this is very much needed! +1
This is one of the final remaining items preventing us from decommissioning our Logstash instances and fully migrating to Beats + ingest pipelines. We have multiple data sources that include arrays in the JSON data that need to be split into their own documents while potentially inheriting some properties from the parent document.
For example, input:
{
  "user_id": "abc123",
  "time": "1994-11-05T13:15:30Z",
  "events": [
    {
      "event_name": "view_page",
      "event_metadata": "blah"
    },
    {
      "event_name": "click_submit",
      "event_metadata": "blah"
    }
  ]
}
output:
{
  "user_id": "abc123",
  "time": "1994-11-05T13:15:30Z",
  "event": {
    "event_name": "view_page",
    "event_metadata": "blah"
  }
}
{
  "user_id": "abc123",
  "time": "1994-11-05T13:15:30Z",
  "event": {
    "event_name": "click_submit",
    "event_metadata": "blah"
  }
}
Is there any confirmation that this will be implemented? I can see it has been added to the enhancements list and labelled 'needs-triage'. Is there any information on where this stands?
+1
We have the same requirement from our customer. We are enriching our data with the enrich processor, which may match multiple documents. If the enrichment matches multiple documents, we need to index the same source document plus the enriched fields as multiple documents, one per match.
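For context, a rough sketch of the enrich configuration involved (the policy, field, and target names here are placeholders); with max_matches above 1 the target field ends up holding an array of matched documents, and that array is what we would then want to split into separate documents:

{
  "processors": [
    {
      "enrich": {
        "policy_name": "users-policy",
        "field": "user_id",
        "target_field": "matched_users",
        "max_matches": 10
      }
    }
  ]
}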
We have the same issue - the response.split option of the httpjson input in the Filebeat threatintel module does not work properly for our use case (getting Attributes from MISP events as separate documents).
response.split:
  target: body.response
  split:
    target: body.Event.Object
    split:
      target: body.Event.Object.Attribute
still leaves us with
{
  "Attribute": [
    {
      "category": "Network activity",
      "deleted": false,
      "to_ids": true,
      "value": "https://redacted.net/ls/click?upn=5c-2BN7OI7J"
    },
    {
      "category": "Network activity",
      "to_ids": true,
      "type": "domain",
      "uuid": "76bfee8d-4d2f-4aee-aba6-ab714b1e65ab",
      "value": "redacted.net"
    }
  ],
  "ObjectReference": [
    {
      "Object": {
        "distribution": "5"
      },
      "comment": "",
      "deleted": false,
      "uuid": "895b6048-1bb1-4f6a-bdb4-cf7fb45f4fcc"
    }
  ],
  "comment": "Redirector URL contained in mail",
  "event_id": "3835"
}
This could be resolved through splitting docs with an ingest pipeline.
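As a sketch of what we are after (the exact set of parent fields to keep is up to the use case), each entry in Attribute would become its own document with the top-level fields carried over, roughly:

{
  "comment": "Redirector URL contained in mail",
  "event_id": "3835",
  "Attribute": {
    "category": "Network activity",
    "deleted": false,
    "to_ids": true,
    "value": "https://redacted.net/ls/click?upn=5c-2BN7OI7J"
  }
}
{
  "comment": "Redirector URL contained in mail",
  "event_id": "3835",
  "Attribute": {
    "category": "Network activity",
    "to_ids": true,
    "type": "domain",
    "uuid": "76bfee8d-4d2f-4aee-aba6-ab714b1e65ab",
    "value": "redacted.net"
  }
}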
Given the Processor interface's execute() definition (a single IngestDocument goes in and a single IngestDocument comes out), it looks like it would be impossible to implement a split without substantial changes.
That is a great shame!
I am also eager to have it.
+1
I too would love to see this feature
+1
Pretty important feature!
+1
+1 This would be really helpful for availability use cases, where a room with multiple availability dates and prices comes in as a single document. This would be a real time saver if it gets made.
+1
+1
+1
+1
+1
Any update? It has been 2 years @elastic/es-core-features (:Core/Features/Ingest)
+1
+1 this is needed. We often query APIs from multiple products and they always return results as an array. We need to split the results into multiple documents instead. Creating separate API requests for each specific object is not practical in my use case.
+1 This is needed!
+1 lack of this feature forces an unnecessary trip to Logstash
Depending on your use case, Filebeat's httpjson input has a response.split option
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-httpjson.html#response-split
which may be relevant, as it allows you to:
"...convert a map, array, or string into multiple events..."
and optionally
"...fields from the parent document (at the same level as target) will be kept..."
+1 it will help us avoid using elastic-serverless-forwarder with expand_event_list_from_field
+1 Pretty important feature!
+1 would love this!
+1
+1
+1
+1
+1
👍 We need this for blue-green sharding
This would be very useful for splitting a text body into chunks to overcome the 512-token limit of embedding models
+1
Please add this feature; I could solve several use cases with it available in ingest pipelines. Thanks a lot
This would be a veeeeery helpful feature. I need to split an array in one document into several documents.
It's been 3 years, @elastic. How many more +1s are needed?
+1
+1
It is common for tools to output data in a combined format where one document may contain several entities, for example a tool that scans several hosts for compliance or vulnerabilities, or an API that provides an update for every train, bus, etc. We really want to split all the entities out into separate docs while copying some high-level information. This is possible using Logstash and the Split filter, but not possible with ingest pipelines.
The feature would allow this kind of document to be processed and split without having to include Logstash in the ingest chain.
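Purely as an illustration of what is being asked for (no such processor exists today; the processor name and options below are hypothetical), applied to the events input/output example earlier in this thread it would look something like:

{
  "description": "hypothetical pipeline - illustration only, this processor does not exist",
  "processors": [
    {
      "split_document": {
        "field": "events",
        "target_field": "event",
        "keep_parent_fields": true
      }
    }
  ]
}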