Mekk opened this issue 1 year ago
Preferable solution:
a) the json filter does not remove `@metadata` when there are no `@metadata.something` fields in the unpacked record;
b) even if the unpacked record happens to contain `@metadata.something`, the json filter tries to merge those blocks (`@metadata` is special, it is more or less a set of local variables), or, if this is too difficult, the override is avoided in some other way (for example by renaming the unpacked `@metadata` into something else).
Minimal solution:
c) if the behaviour is to stay, there is a clear warning in the json filter docs that using this filter in top-level mode (without a target) removes `@metadata`, which must be preserved separately if needed.
Is it possible for the json filter to call the Ruby event API `to_json_with_metadata`?
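As an illustration of the behaviour being asked for, a minimal ruby-filter sketch that merges the parsed payload into the event root without touching `@metadata` might look like the following. The `message` source field and the decision to skip any payload-level `@metadata` key are my assumptions, not existing json-filter behaviour:

```
filter {
  ruby {
    code => '
      require "json"
      begin
        parsed = JSON.parse(event.get("message").to_s)
        if parsed.is_a?(Hash)
          # merge at the root, but leave the event-local @metadata alone
          parsed.each { |k, v| event.set(k, v) unless k == "@metadata" }
        else
          event.tag("_jsonparsefailure")
        end
      rescue JSON::ParserError
        event.tag("_jsonparsefailure")
      end
    '
  }
}
```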
The `@metadata` issue causes a whole host of problems when using beats/agent to kafka, or to logstash for that matter. I found a workaround, but it will probably require changing your pipeline around, as it did mine. I also love how ecs_compatibility duplicates message sizes because of `message` and `event.original`. The verbiage of everything is unfortunate too: `message` was one thing for years, and data sources also used `message`, but now there is `event.original`, which by definition would seem to be the original/raw, yet in fact is not. It is rather a variation, usually the "payload" of the data, i.e. the thing without the added fields (agent, host, file, etc.). It then becomes really difficult to work out what is original/raw when something was already JSON and has had fields added by agents and so forth.
Anyways, there are 6 options (assuming you're using logstash 8.x). I think there is only one workable solution for myself, which is # 2, but you may be able to use # 3 or # 4. Hope the table below helps anybody else making the choice. All of them have cons, but given my need to reduce ram/heap at hundreds of thousands of EPS, keep compatibility with ecs ingest pipelines and necessary custom parsers, and control the data, its integrity and its flow appropriately, I am left with option # 2 (a sketch of that flow follows the table).
number | simple code not specifying ecs_compatibility assuming logstash v8 | full code specifying ecs_compatibility assuming logstash v8 | codec | ecs_compatibility | target | original/raw kept? if applicable, what fields | JSON of original/raw? if applicable, what/where is it placed | keeps original/raw @metadata if applicable, how is it kept? | use ? | pro | con |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | codec => plain | codec => plain { ecs_compatibility => "v8" } | plain | v8 | n/a | message, event.original | n/a | raw in message, raw in event.original | - ability to control @metadata merging - ability to control raw - ability to control json - compatibility with certain ingest pipelines, as some think event.original is the raw json and some think is the message/log payload | - dup of raw (message & event.original), heap overhead of the log - requires manually controlling json - json_encode plugin is not installed automatically for logstash - or use ruby - manual merge @metadata - pia to manage message - pia to manage event.original | |
2 | n/a | codec => plain { ecs_compatibility => "disabled" } | plain | disabled | n/a | message | n/a | raw in message | - no dup of raw, lower heap overhead - ability to control @metadata merging - ability to control raw - ability to control json - compatibility with certain ingest pipelines, as some think event.original is the raw json and some think is the message/log payload | - requires manually controlling json - via filters (however, json_encode plugin is not installed automatically for logstash) - json w/ target to temp_target - json_encode over temp_target - json w/ no target, remove temp_target (will finally merge to root) - via ruby, using to_hash_with_metadata - manual merge @metadata - pia to manage log with message contained within it | |
3 | codec => json | codec => json { ecs_compatibility => "v8" } | json | v8 | n/a | event.original | root | merged to root, raw in event.original | - automatically merge @metadata - automatically controls json | - difficult/impossible to unwind/control what was added to root if needed after the fact - changing event.original can be impossible to figure out what was from root and what was not - unable to control @metadata merging - difficulty with compatibility with certain ingest pipelines, as some think event.original is the raw json and some think is the message/log payload | |
4 | n/a | codec => json { ecs_compatibility => "v8" target => "nested_target" } | json | v8 | nested_target | event.original | nested_target | raw in nested_target, raw in event.original | - ability to control @metadata merging - ability to control raw - ability to control json - compatibility with certain ingest pipelines, as some think event.original is the raw json and some think is the message/log payload | - requires manually controlling json - via filters (json_encode plugin is not installed automatically for logstash) - or use ruby - manual merge @metadata - if not needing nested target, as for some ingest pipelines, heap overhead of the log | |
5 | n/a | codec => json { ecs_compatibility => "disabled" target => "nested_target" } | json | disabled | nested_target | n/a | nested_target | raw in nested_target | can't use | can't use | |
6 | n/a | codec => json { ecs_compatibility => "disabled" } | json | disabled | n/a | n/a | root | merged to root | can't use | can't use |
To add to this, ingest pipelines sometimes expect `event.original` and sometimes expect `message`; some copy `message` to `event.original` and remove `message`; some copy `event.original` to `message` but still use `message`.
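For reference, a minimal sketch of the option # 2 flow described above might look like the following. Field names such as `temp_target` and `payload_json` are placeholders of my own, the topic name is made up, and it assumes the `logstash-filter-json_encode` plugin has been installed separately, since it is not bundled with Logstash:

```
input {
  kafka {
    topics          => ["example-topic"]    # placeholder
    decorate_events => "basic"
    codec           => plain { ecs_compatibility => "disabled" }   # raw payload stays in [message]
  }
}

filter {
  # manual @metadata merge: copy whatever is needed out of @metadata BEFORE
  # the final root-level json merge, which is where @metadata gets lost
  mutate {
    add_field => { "[kafka][topic]" => "%{[@metadata][kafka][topic]}" }
  }

  # parse into a temporary field first, so the event root is still untouched
  json { source => "message" target => "temp_target" }

  # optionally keep a re-encoded copy of just the payload
  # (json_encode is a separate plugin, not installed with Logstash by default)
  json_encode { source => "temp_target" target => "payload_json" }

  # finally merge to the root and drop the temporary field
  json { source => "message" }
  mutate { remove_field => ["temp_target"] }
}
```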
Problem I faced in a rather simple setup:
- filebeat scrapes some apache and nginx logs (using the standard modules), outputting them to kafka
- logstash reads from kafka, unpacks the json, makes some (irrelevant here) changes and saves to elasticsearch
It turned out that, in spite of using `decorate_events => "basic"`, `[@metadata][kafka]` is not available. And it took me quite a lot of time to find out why. It looks like the json filter, while unpacking, removed the `@metadata` block. Relevant parts of the config (my actual config is more complicated, but the other elements are irrelevant here):
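A minimal sketch of the kind of configuration being described (hosts, topic names and the added field are placeholders, not the reporter's actual snippet):

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"    # placeholder
    topics            => ["filebeat"]    # placeholder
    decorate_events   => "basic"
  }
}

filter {
  # unpack the filebeat JSON document into the event root
  json { source => "message" }

  # expected to work, but [@metadata][kafka] is already gone at this point
  mutate {
    add_field => { "[kafka][topic]" => "%{[@metadata][kafka][topic]}" }
  }
}

output {
  elasticsearch { hosts => ["http://elasticsearch:9200"] }   # placeholder
}
```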
I expected to see `kafka.topic` in the results, but this field was simply missing. It turned out that replacing the first filter helped¹, so it looks like the json filter removed `@metadata` for some reason. This reason is even more unclear as it seems to me that no fields appeared under `@metadata` (so it is not even the case of "there was `@metadata.something` in the filebeat output, and this is why json replaced this block") - at least it seems so to me after some staring at the rubydebug output. Once the problem is known, it is rather easy to work around (for example as sketched below), but for the unaware it is very confusing.
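One possible workaround of the kind described, assuming all that is needed from `@metadata` is the kafka topic, is to stash it in a regular field before the root-level json filter runs (the field name is my own choice, not the reporter's actual fix):

```
filter {
  # copy the kafka metadata into a regular field while it still exists
  mutate {
    add_field => { "[kafka][topic]" => "%{[@metadata][kafka][topic]}" }
  }

  # unpack to the root as before; @metadata is lost here, but the copy survives
  json { source => "message" }
}
```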
¹ Of course, swapping the filter order would likely help too, but in my case the actual processing of the kafka metadata was different and needed both the kafka metadata and the unpacked fields.