elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

New Plugin: filter-normalize #9768

Open webmat opened 6 years ago

webmat commented 6 years ago

Purpose

source     target           conversion
type       event.type
duration   event.duration   to_float

jakelandis commented 6 years ago

@webmat thanks for writing this up !

+1 to purpose and features. However, I think we should support inline configuration in addition to external configuration. (mostly for better centralized config management)

Mapping file would likely be JSON

I think we should target CSV: the ECS schema is in CSV, JSON is hard to write by hand, and regardless of ECS the mapping will be row/column-oriented data. Perhaps (using Kibana dot syntax):

source   target       type
type     event.type   string

For ECS, we could offer a downloadable CSV file with all the targets pre-populated with the ECS names / types. This would allow a user to look at data in Kibana and then use Excel to create the mapping file.
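To make the CSV idea concrete, here is a minimal sketch of reading such a mapping. It is written in Python purely for illustration (a real plugin would be Ruby), and the file contents, the `load_mapping` name, and the treatment of an empty type cell as "no coercion" are all assumptions, not part of the proposal.

```python
import csv
import io

# Hypothetical mapping file contents, using the source/target/type
# column layout suggested above.
MAPPING_CSV = """source,target,type
type,event.type,string
duration,event.duration,float
"""


def load_mapping(text):
    """Parse CSV text into a list of (source, target, type) tuples.

    Assumption: an empty or missing type cell becomes None.
    """
    reader = csv.DictReader(io.StringIO(text))
    rules = []
    for row in reader:
        field_type = (row.get("type") or "").strip() or None
        rules.append((row["source"].strip(), row["target"].strip(), field_type))
    return rules


rules = load_mapping(MAPPING_CSV)
for source, target, field_type in rules:
    print(f"{source} -> {target} ({field_type})")
```

Because the format is plain rows and columns, the same file can be edited in a spreadsheet and read back without any custom parser.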

Does the name "normalize" sound appropriate?

I like it

What should be the behaviour if the destination field already exists on the event?

Tag the event with _normalize_failure and leave the original event untouched. This only affects the "Perform normalization on the event by adding new fields, leaving original fields untouched" feature ... leaving the original untouched has some nice guarantees.
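A rough sketch of that behaviour, in Python for illustration only: fields are copied to their targets, and if any target already exists the event is returned unchanged apart from the tag. The helper names, the nested-dict reading of dotted targets, and the use of a `tags` list are assumptions.

```python
import copy

FAILURE_TAG = "_normalize_failure"


def has_conflict(event, dotted):
    """True if the dotted path exists, or a parent on the way is not a dict."""
    node = event
    for key in dotted.split("."):
        if not isinstance(node, dict):
            return True  # cannot nest a field under a scalar
        if key not in node:
            return False
        node = node[key]
    return True  # the full path is already present


def set_path(event, dotted, value):
    """Set a nested value, creating intermediate dicts as needed."""
    *parents, leaf = dotted.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def normalize(event, rules):
    """Copy each source field to its target; the input event is never mutated."""
    result = copy.deepcopy(event)
    for source, target in rules:
        if source not in event:
            continue
        if has_conflict(result, target):
            # Discard partial work: return the original fields plus the tag.
            tagged = copy.deepcopy(event)
            tags = tagged.setdefault("tags", [])
            if FAILURE_TAG not in tags:
                tags.append(FAILURE_TAG)
            return tagged
        set_path(result, target, copy.deepcopy(event[source]))
    return result


rules = [("type", "event.type"), ("duration", "event.duration")]
ok = normalize({"type": "log", "duration": "12"}, rules)
clash = normalize({"type": "log", "event": {"type": "old"}}, rules)
```

Working on a copy is what provides the guarantee mentioned above: a conflict never leaves a half-normalized event behind.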

guyboertje commented 6 years ago

@webmat As this is a proposal and Beats and/or IngestNode might do the same, you should create a Google Doc along the lines of Beats Central Management Proposal in GDrive > Engineering [Internal] > Ingest > Proposals. The logic IIRC is to allow suggestions to be sited appropriately, so that further discussion on a point or on someone else's suggestion is co-located. The doc is also more alive with version history. People can also add alternative ideas to the end of the doc. Once consensus is reached, transfer the concluded proposal here in case the community wishes to comment.

GH issues are a poor alternative.

In retrospect, my discuss issue on the math filter should have been such a doc.

webmat commented 6 years ago

@guyboertje I see what you mean. However, I get the feeling that if I go the GDrive way, instead of a quick one-week discussion, it could linger for much longer. The proposal here is pretty straightforward. I'll think about it, but I'm not 100% convinced yet.

webmat commented 6 years ago

@jakelandis Thanks for the feedback!

However, I think we should support inline configuration in addition to external configuration. (mostly for better centralized config management)

The inline configuration aspect is interesting. But if we end up supporting inline configuration, we're more or less re-implementing mutate { rename => {} }, no?

webmat commented 6 years ago

@jakelandis I can go with CSV, perhaps. I agree it's much simpler to write.

I don't necessarily think offering a downloadable CSV for ECS makes sense. Perhaps it does. But when I say "limited amount of coercion", what I mean is not in line with what one would find in an ECS CSV file. In ECS you'll have fields that are "text" and others that are "keyword". This wouldn't fit in this mapping file. We don't want users to think they can enforce their mapping from here :-)

What I have in mind is rather a place where you can drop in all of your simple coercions:

* And this 0 messing up the day's mapping if that's the first event of the day

I'd probably keep it at that for a start. I don't want to get into date/IP/geo here. We have full fledged plugins for those.
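To illustrate the kind of simple coercion meant here, a small Python sketch follows (illustration only; the `COERCIONS` registry and `coerce` helper are hypothetical names). The `to_float` conversion name comes from the table in the proposal. The motivation, as I read the bullet above, is that Elasticsearch dynamic mapping treats a whole-number JSON value as a long, so a `0` arriving first can fix the wrong type for a field that should be a float.

```python
import json


def to_float(value):
    """Coerce ints and numeric strings to float; raises ValueError otherwise."""
    return float(value)


# Hypothetical registry: conversion name from the mapping file -> function.
COERCIONS = {"to_float": to_float}


def coerce(value, conversion):
    """Apply the named conversion; an empty conversion leaves the value as-is."""
    if not conversion:
        return value
    return COERCIONS[conversion](value)


raw = {"duration": 0}
coerced = {"duration": coerce(raw["duration"], "to_float")}

raw_json = json.dumps(raw)          # serialized as a whole number
coerced_json = json.dumps(coerced)  # serialized with a fractional part
print(raw_json, coerced_json)
```

Date, IP and geo handling are deliberately absent from the registry, in line with leaving those to the dedicated plugins.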

webmat commented 6 years ago

Also, this column would be optional (can be left empty). In most cases the field is already in the format desired.

webmat commented 6 years ago

Yeah the "_normalize_failure" tag makes sense on failures. I would likely try to do as many of the conversions as possible, instead of stopping at the first failure, though.
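A best-effort loop along those lines could look like the sketch below (Python for illustration; the function name, the `CONVERTERS` registry and the flat dotted keys are assumptions made for brevity). A failed conversion skips only that field, the remaining rules still run, and the tag is added once at the end.

```python
FAILURE_TAG = "_normalize_failure"

# Hypothetical registry of conversions; only to_float appears in the proposal.
CONVERTERS = {"to_float": float}


def normalize_best_effort(event, rules):
    """Apply every rule; on a failed conversion skip that field and keep going.

    rules is a list of (source, target, conversion_or_None). Targets are
    stored as flat keys such as "event.duration" to keep the sketch short.
    """
    result = dict(event)
    failed = False
    for source, target, conversion in rules:
        if source not in event:
            continue
        value = event[source]
        try:
            if conversion:
                value = CONVERTERS[conversion](value)
        except (KeyError, TypeError, ValueError):
            failed = True  # remember the failure, continue with other rules
            continue
        result[target] = value
    if failed:
        tags = list(result.get("tags", []))
        if FAILURE_TAG not in tags:
            tags.append(FAILURE_TAG)
        result["tags"] = tags
    return result


rules = [
    ("duration", "event.duration", "to_float"),
    ("type", "event.type", None),
    ("bytes", "event.bytes", "to_float"),
]
out = normalize_best_effort({"type": "log", "duration": "fast", "bytes": "10"}, rules)
```

In this run `duration` cannot be converted, so `event.duration` is left out and the event is tagged, while `event.type` and `event.bytes` are still populated.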