elastic / ecs-mapper

Translate an ECS mapping CSV to starter pipelines for Beats, Elasticsearch or Logstash
Apache License 2.0

Add the ability to hardcode values #5

Open webmat opened 4 years ago

webmat commented 4 years ago

Some fields need to be hardcoded per source.

Note that since ecs-mapper doesn't support complex logic (no conditionals), I don't expect this to be used to populate all of the categorization fields. But it's still very common that a given source's logs only ever map to one event.type, or that we'll be able to hardcode event.dataset or event.module with it.

webmat commented 4 years ago

Note @tonymeehan that here we could simply add support for one more column, perhaps named "static_value".

Then valid lines would no longer be only the ones with both "source_field" and "destination_field":

| source_field | static_value | destination_field | outcome |
|--------------|--------------|-------------------|---------|
| present      |              | present           | valid   |
|              | present      | present           | valid   |
| present      | present      | *                 | error   |
| present      |              |                   | skipped |
|              | present      |                   | skipped |
|              |              | present           | skipped |

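The rules in the table above can be sketched as a small row classifier. This is illustrative only (the function name and dict-based row shape are assumptions, not the actual ecs-mapper code):

```python
# Sketch of the proposed row validation (illustrative; not actual ecs-mapper code).
def classify_row(row):
    """Return 'valid', 'error', or 'skipped' for a CSV row given as a dict."""
    source = bool(row.get("source_field"))
    static = bool(row.get("static_value"))
    dest = bool(row.get("destination_field"))

    if source and static:
        return "error"    # ambiguous: copy a field or set a constant?
    if dest and (source or static):
        return "valid"    # either a field mapping or a hardcoded value
    return "skipped"      # not enough information to act on
```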
tonymeehan commented 4 years ago

I like the suggestion. I'm thinking about two things.

First, should static_value be to the right of destination_field? In most cases users will likely be mapping fields rather than setting static values, so I think it reads a bit easier if it's on the right.

I also think there's another error case: when all three columns are present, it's ambiguous what to do.

| source_field | destination_field | static_value | outcome |
|--------------|-------------------|--------------|---------|
| present      | present           |              | valid   |
|              | present           | present      | valid   |
| present      |                   | present      | error   |
| present      | present           | present      | error   |
| present      |                   |              | skipped |
|              | present           |              | skipped |
|              |                   | present      | skipped |

The second thing I'm thinking of is how to handle the static value. I'm thinking this could work:

| source_field | destination_field | static_value                              | outcome |
|--------------|-------------------|-------------------------------------------|---------|
| present      | present           | `"static value"`                          | valid   |
| present      | present           | `[ "static value", "static value 2" ]`    | valid   |
| present      | present           | `"static value`                           | error   |
| present      | present           | `[ "static value, "static value 2" ]`     | error   |
| present      | present           | `[ , "static value 2" ]`                  | error   |
| present      | present           | `[ "static value", "static value 2"`      | error   |
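The quoting and bracket syntax shown above is JSON-like, so one way to validate it is to lean on a JSON parser: well-formed single values and arrays parse, and every error row above fails to parse. A hedged sketch, assuming the tool reads the cell as a string (the function name is made up; the real tool could use a different approach entirely):

```python
import json

# Sketch: treat the static_value cell as a JSON fragment, so unbalanced
# quotes or brackets from the spreadsheet surface as parse errors.
def parse_static_value(cell):
    """Return a list of string values, or None if the cell is malformed."""
    try:
        value = json.loads(cell)
    except json.JSONDecodeError:
        return None  # unterminated quote, dangling bracket, etc. -> error row
    if isinstance(value, str):
        return [value]               # single value, normalized to a list
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value                 # array of values
    return None                      # numbers, objects, etc. not supported
```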
webmat commented 4 years ago

Well, the order of the columns doesn't matter to the tool. Users are even free to add whatever extra columns they want, for additional notes of any kind; only the KNOWN_CSV_HEADERS are read.

The order we put the columns in, in the sample spreadsheet, can still be adjusted for clarity. It's true that most lines will be meant to handle a source_field => destination_field conversion, and only a few are expected to hardcode values.

But I think of the flow of data from left to right:

source_field => format_action => destination_field

And now

static_value => destination_field

So I thought these columns would make sense:

source_field, format_action, static_value, destination_field, copy_action

We can reinforce proper usage by improving the example section in the example/ directory, too: give a concrete example that takes all of this thinking into account.
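To make that concrete, here is a rough sketch of what a generator might emit for each row type when targeting an Elasticsearch ingest pipeline: a hardcode row becomes a `set` processor, while a field-mapping row becomes a `rename` processor. The function name and row shape are assumptions for illustration, not ecs-mapper's actual output:

```python
# Rough sketch, not ecs-mapper's actual code or output format:
# translate one CSV row into an Elasticsearch ingest pipeline processor.
def row_to_es_processor(row):
    if row.get("static_value"):
        # Hardcoded value -> 'set' processor on the destination field
        return {"set": {"field": row["destination_field"],
                        "value": row["static_value"]}}
    # Field mapping -> 'rename' processor (the tool also supports copying)
    return {"rename": {"field": row["source_field"],
                       "target_field": row["destination_field"]}}

# e.g. hardcoding event.dataset for a given source:
processor = row_to_es_processor(
    {"destination_field": "event.dataset", "static_value": "nginx.access"})
```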

webmat commented 4 years ago

Looping back on this, I hadn't thought about capturing single values vs arrays of values, when users enter static values. Is this what you're describing with the square brackets and double quotes?

Here we'll need to find something that's really intuitive from the spreadsheet's point of view. Then we'll need to look at how the major spreadsheets * handle the encoding to CSV. I could see them getting the details wrong once we start adding quotes and brackets.

I'm tempted to say let's start with single values and not worry about arrays. Arrays are important for categorization, with event.category and event.type. However, I don't think ecs-mapper should support conditionals, and in most cases a given event stream will contain more than one event category and different event types. In other words, I don't think users will be able to populate the categorization fields properly from this spreadsheet / CSV. This more fine-grained identification of events will have to happen in their actual pipeline, not in this starter tool.

* Those I would consider: Excel, Google Docs, Apple Numbers