webmat opened 4 years ago
Note @tonymeehan that here we could simply add support for one more column, perhaps named "static_value".
Then valid lines would no longer be only the ones with both "source_field" and "destination_field":
| source_field | static_value | destination_field | outcome |
|---|---|---|---|
| present | | present | valid |
| | present | present | valid |
| present | present | * | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |
I like the suggestion. I'm thinking about two things.
First, should `static_value` be to the right of `destination_field`? In most cases, users will likely be mapping fields instead of setting static values, so I think it reads a bit easier if it's on the right.
I also think there's another error case where all three columns are present, since it's ambiguous what to do.
| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | | valid |
| | present | present | valid |
| present | | present | error |
| present | present | present | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |
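The outcome rules above can be sketched as a small classifier. This is an illustrative Python sketch of the proposed rules, not the tool's actual implementation:

```python
def classify_row(source_field, destination_field, static_value):
    """Classify one CSV row according to the proposed outcome table.

    Rules sketched here:
    - source_field and static_value together is ambiguous -> error
    - destination_field plus exactly one of the two inputs -> valid
    - anything else (a single populated column, or none) -> skipped
    """
    if source_field and static_value:
        # Ambiguous: should we map the source field or set the static value?
        return "error"
    if destination_field and (source_field or static_value):
        return "valid"
    return "skipped"
```

Empty strings stand in for absent cells here, which is what a CSV reader would hand back for blank columns.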
The second thing I'm thinking of is how to handle the static value. I'm thinking this could work:
| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | "static value" | valid |
| present | present | [ "static value", "static value 2" ] | valid |
| present | present | "static value | error |
| present | present | [ "static value, "static value 2" ] | error |
| present | present | [ , "static value 2" ] | error |
| present | present | [ "static value", "static value 2" | error |
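Since the quoting and bracket syntax above happens to be valid JSON, one way to sketch the validation is to lean on a JSON parser. This is an assumption about the eventual syntax, not a settled design:

```python
import json

def parse_static_value(cell):
    """Parse a static_value cell as a single value or an array of values.

    Hypothetical sketch assuming JSON-style quoting, as in the table above.
    Returns (parsed_value, "valid") or (None, "error").
    """
    try:
        value = json.loads(cell)
    except json.JSONDecodeError:
        # Unbalanced quotes or brackets, stray commas, etc.
        return None, "error"
    if isinstance(value, str):
        return value, "valid"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value, "valid"
    return None, "error"
```

All four error rows in the table fail JSON parsing, so they come back as errors without any custom quote-balancing logic.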
Well, the order of the columns doesn't matter to the tool. Users are even free to add all the columns they want, for additional notes of any kind. Only the `KNOWN_CSV_HEADERS` are read.
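In other words, unrecognized columns are simply dropped on read. A minimal Python sketch of that behavior (the constant name comes from the tool; the exact header list and the rest of the code are illustrative assumptions):

```python
import csv
import io

# Illustrative header list; the tool defines the real KNOWN_CSV_HEADERS.
KNOWN_CSV_HEADERS = {"source_field", "format_action", "static_value",
                     "destination_field", "copy_action"}

def read_rows(csv_text):
    """Read CSV rows, keeping only the recognized columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if k in KNOWN_CSV_HEADERS}
            for row in reader]

rows = read_rows("source_field,my notes,destination_field\n"
                 "src_ip,double check this,source.ip\n")
# The extra "my notes" column is ignored.
```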
The order we put the columns in the sample spreadsheet can still be adjusted for clarity. It's true that most lines will be meant to handle a source_field => destination_field conversion, and only very few are expected to hardcode.
But I think of the flow of data from left to right:
source_field => format_action => destination_field
And now
static_value => destination_field
So I thought these columns would make sense:
source_field, format_action, static_value, destination_field, copy_action
We can reinforce proper usage by improving the example section in the `example/` directory, too, giving a concrete example that takes all of this thinking into account.
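For instance, such an example could mix a field mapping with a hardcoded value. The field names here are made up for illustration:

```python
import csv
import io

# Hypothetical example CSV: one source => destination mapping,
# and one row hardcoding a static value into a destination field.
EXAMPLE_CSV = """\
source_field,format_action,static_value,destination_field,copy_action
src_ip,,,source.ip,rename
,,firewall,event.module,
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
# rows[0] maps src_ip to source.ip; rows[1] hardcodes event.module.
```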
Looping back on this: I hadn't thought about capturing single values vs arrays of values when users enter static values. Is this what you're describing with the square brackets and double quotes?
Here we'll need to find something that's really intuitive from the spreadsheet's POV. Then we'll need to look at how the major spreadsheets* manage the encoding to CSV. I could see them getting the details wrong when we start adding quotes & stuff.
I'm tempted to say let's start with single values and not worry about arrays. Arrays are important for categorization with `event.category` and `event.type`. However I don't think ecs-mapper should support conditionals. And I think in most cases a given event stream will contain more than one event category, and different event types. In other words, I don't think users will be able to populate categorization fields properly from this spreadsheet / CSV. This more fine-grained identification of events will have to happen in their actual pipeline, not in this starter tool.
* Those I would consider: Excel, Google Docs, Apple Numbers
Some fields need to be hardcoded per source.
Note that since ecs-mapper doesn't support complex logic (no conditionals), I don't expect this to be used to populate all categorization fields. But it's still very common that a single source log will only ever map to one `event.type`, or that we'll be able to hardcode `event.dataset` or `event.module` with it.