webmat opened 4 years ago
Note @tonymeehan that here we could simply add support for one more column, perhaps named "static_value".
Then valid lines would no longer be only the ones with both "source_field" and "destination_field":
| source_field | static_value | destination_field | outcome |
|---|---|---|---|
| present | | present | valid |
| | present | present | valid |
| present | present | * | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |
I like the suggestion. I'm thinking about two things.
First, should `static_value` be to the right of `destination_field`? In most cases, users will likely be mapping fields instead of setting static values, so I think it reads a bit easier if it's on the right.
I also think there's another error case where all three columns are present, since it's ambiguous what to do.
| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | | valid |
| | present | present | valid |
| present | | present | error |
| present | present | present | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |
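The outcome rules above can be sketched as a small classifier. This is an illustrative Python sketch of the proposed rules, not the tool's actual implementation:

```python
def classify_row(source_field, destination_field, static_value):
    """Classify one CSV row according to the proposed outcome table.

    Rules sketched here:
    - source_field and static_value together is ambiguous -> error
    - destination_field plus exactly one of the two inputs -> valid
    - anything else (a single populated column, or none) -> skipped
    """
    if source_field and static_value:
        # Ambiguous: should we map the source field or set the static value?
        return "error"
    if destination_field and (source_field or static_value):
        return "valid"
    return "skipped"
```

Empty strings stand in for absent cells here, which is what a CSV reader would hand back for blank columns.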
The second thing I'm thinking of is how to handle the static value. I'm thinking this could work:
| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | "static value" | valid |
| present | present | [ "static value", "static value 2" ] | valid |
| present | present | "static value | error |
| present | present | [ "static value, "static value 2" ] | error |
| present | present | [ , "static value 2" ] | error |
| present | present | [ "static value", "static value 2" | error |
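Since the quoting and bracket syntax above happens to be valid JSON, one way to sketch the validation is to lean on a JSON parser. This is an assumption about the eventual syntax, not a settled design:

```python
import json

def parse_static_value(cell):
    """Parse a static_value cell as a single value or an array of values.

    Hypothetical sketch assuming JSON-style quoting, as in the table above.
    Returns (parsed_value, "valid") or (None, "error").
    """
    try:
        value = json.loads(cell)
    except json.JSONDecodeError:
        # Unbalanced quotes or brackets, stray commas, etc.
        return None, "error"
    if isinstance(value, str):
        return value, "valid"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value, "valid"
    return None, "error"
```

All four error rows in the table fail JSON parsing, so they come back as errors without any custom quote-balancing logic.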
Well, the order of the columns doesn't matter to the tool. Users are even free to add all the columns they want, for additional notes of any kind. Only the `KNOWN_CSV_HEADERS` are read.
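In other words, unrecognized columns are simply dropped on read. A minimal Python sketch of that behavior (the constant name comes from the tool; the exact header list and the rest of the code are illustrative assumptions):

```python
import csv
import io

# Illustrative header list; the tool defines the real KNOWN_CSV_HEADERS.
KNOWN_CSV_HEADERS = {"source_field", "format_action", "static_value",
                     "destination_field", "copy_action"}

def read_rows(csv_text):
    """Read CSV rows, keeping only the recognized columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if k in KNOWN_CSV_HEADERS}
            for row in reader]

rows = read_rows("source_field,my notes,destination_field\n"
                 "src_ip,double check this,source.ip\n")
# The extra "my notes" column is ignored.
```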
The order we put the columns in the sample spreadsheet can still be adjusted for clarity. It's true that most lines will be meant to handle a source_field => destination_field conversion, and only very few are expected to hardcode.
But I think of the flow of data from left to right:
source_field => format_action => destination_field
And now
static_value => destination_field
So I thought these columns would make sense:
source_field, format_action, static_value, destination_field, copy_action
We can reinforce proper usage by improving the example section in the `example/` directory, too, giving a concrete example that takes all of this thinking into account.
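For instance, such an example could mix a field mapping with a hardcoded value. The field names here are made up for illustration:

```python
import csv
import io

# Hypothetical example CSV: one source => destination mapping,
# and one row hardcoding a static value into a destination field.
EXAMPLE_CSV = """\
source_field,format_action,static_value,destination_field,copy_action
src_ip,,,source.ip,rename
,,firewall,event.module,
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
# rows[0] maps src_ip to source.ip; rows[1] hardcodes event.module.
```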
Looping back on this: I hadn't thought about capturing single values vs arrays of values when users enter static values. Is this what you're describing with the square brackets and double quotes?
Here we'll need to find something that's really intuitive from the spreadsheet's POV. Then we'll need to look at how the major spreadsheets* manage the encoding to CSV. I could see them getting the details wrong when we start adding quotes & stuff.
I'm tempted to say let's start with single values and not worry about arrays. Arrays are important for categorization with `event.category` and `event.type`. However I don't think ecs-mapper should support conditionals. And I think in most cases a given event stream will contain more than one event category, and different event types. In other words, I don't think users will be able to populate categorization fields properly from this spreadsheet / CSV. This more fine-grained identification of events will have to happen in their actual pipeline, not in this starter tool.
* Those I would consider: Excel, Google Docs, Apple Numbers
Some fields need to be hardcoded per source.
Note that since ecs-mapper doesn't support complex logic (no conditionals), I don't expect this to be used to populate all categorization fields. But it's still very common that a single source log will only ever map to one `event.type`, or that we'll be able to hardcode `event.dataset` or `event.module` with it.