axoflow / axosyslog

AxoSyslog - the scalable security data processor
https://axoflow.com

filterx: remove fields with empty values #226

Open jszigetvari opened 1 month ago

jszigetvari commented 1 month ago

Ability to remove certain fields with empty values:
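The original example did not survive here, so below is a minimal sketch of the requested behaviour. It is modelled in Python on a plain dict, not written in filterx. It assumes "empty" means `None`, an empty string, or an empty list/dict. The names `is_empty` and `remove_empty_fields` are hypothetical.

```python
from typing import Any


def is_empty(value: Any) -> bool:
    # Assumed definition of "empty" for this sketch: None, "", [] or {}.
    # Falsy scalars such as 0 or False are deliberately NOT treated as empty.
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def remove_empty_fields(record: dict) -> dict:
    """Return a copy of `record` without top-level fields whose value is empty."""
    return {key: value for key, value in record.items() if not is_empty(value)}


event = {"host": "web-1", "user": "", "tags": [], "pid": 0, "msg": None}
print(remove_empty_fields(event))  # {'host': 'web-1', 'pid': 0}
```

Keeping `pid: 0` is one design choice. Whether numeric zero counts as "empty" is exactly the kind of question the thread leaves open.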

jszigetvari commented 1 month ago

We should also add different modes of operation, or split the functionality into two functions:

MrAnno commented 1 month ago

A more general solution would be to implement group_unset() for filterx (probably under a better name), where lists and regexp patterns can be specified. I don't think it's a much bigger task, so we should do this instead.
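One way to read the proposal is sketched below in Python, not filterx. It is a generalized unset that takes both a literal list and regexp patterns. The sketch assumes the list and patterns are matched against field *values*, since the goal is removing empty/null values. They could equally be used to select field names. The name `group_unset` and its parameters are hypothetical here.

```python
import re
from typing import Any, Iterable


def group_unset(record: dict, values: Iterable[Any] = (), patterns: Iterable[str] = ()) -> dict:
    """Hypothetical generalized unset: drop every field whose value appears in
    `values`, or is a string that fully matches one of the regexps in `patterns`."""
    value_list = list(values)                     # literal values, compared with ==
    compiled = [re.compile(p) for p in patterns]  # compile each pattern once

    def should_unset(value: Any) -> bool:
        if value in value_list:
            return True
        return isinstance(value, str) and any(rx.fullmatch(value) for rx in compiled)

    return {key: value for key, value in record.items() if not should_unset(value)}


event = {"host": "web-1", "user": "-", "dept": "N/A", "note": "   ", "pid": 42}
print(group_unset(event, values=["-", None], patterns=[r"N/A", r"\s*"]))
# {'host': 'web-1', 'pid': 42}
```

A literal list keeps the common case (a handful of known null tokens) cheap. The patterns cover whitespace-only or case-variant values without enumerating them.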

For AxoRouter, we can always add an axo-remove-null-values() SCL that parameterizes a group_set call with the known null-values.

alltilla commented 1 month ago

I would go even further in the generalization, with something like this:

I think this architecture would scale really well with our parsers and transformation logic.

jszigetvari commented 1 month ago

@MrAnno

For AxoRouter, we can always add an axo-remove-null-values() SCL that parameterizes a group_set call with the known null-values.

Well, in that regard we would need a way to specify (maybe through different functions, or through a parameter) whether the attributes should actually be unset or set to an empty string. (This matters for CSV-like data, where the order of values is important.) On top of that, perhaps there should be a default set of known null/empty values, which the user could extend or override (through an argument) if necessary.
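The two modes and the extendable default list can be modelled roughly as follows. This is a Python sketch under stated assumptions, not a filterx API. `normalize_nulls`, its `mode`/`extra`/`override` parameters, and the contents of `DEFAULT_NULL_VALUES` are all hypothetical. The thread does not specify the real default list.

```python
from typing import Iterable, Optional

# Hypothetical default set of "null-like" tokens (the thread gives no actual list).
DEFAULT_NULL_VALUES = frozenset({"", "-", "null", "NULL", "N/A"})


def normalize_nulls(record: dict, mode: str = "unset",
                    extra: Iterable[str] = (),
                    override: Optional[Iterable[str]] = None) -> dict:
    """Either drop null-like fields (mode="unset") or blank them (mode="empty").

    `extra` extends the default null set; `override` replaces it entirely.
    """
    if mode not in ("unset", "empty"):
        raise ValueError(f"unknown mode: {mode!r}")
    base = DEFAULT_NULL_VALUES if override is None else frozenset(override)
    nulls = base | frozenset(extra)

    result = {}
    for key, value in record.items():
        is_null = value is None or (isinstance(value, str) and value in nulls)
        if not is_null:
            result[key] = value
        elif mode == "empty":
            result[key] = ""  # keep the key in place, blank its value
        # mode == "unset": the key is simply not copied
    return result


row = {"ts": "2024-05-01", "user": "-", "dept": "none", "bytes": "512"}
print(normalize_nulls(row))                      # {'ts': '2024-05-01', 'dept': 'none', 'bytes': '512'}
print(",".join(normalize_nulls(row, mode="empty", extra=["none"]).values()))  # 2024-05-01,,,512
```

Because Python dicts preserve insertion order, the "empty" mode keeps every column in its original position. This is the CSV concern raised above. The "unset" mode would silently shift later columns.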

MrAnno commented 1 month ago

I don't fully see how this could scale well given our original decision not to allow defining custom functions/callbacks in filterx. It means we would have to implement a set of transformation functions and also allow them to be passed to almost all of our functions that "create" new sets of data.

Coupling these transformations with the functions that create new values seems unnecessary to me in a language where we have functional-style building blocks and where we are closed to in-language extensions (defining our own functions).

It seems cleaner to me to provide general enough transformation functions that can work on their own.
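A rough Python illustration of the "standalone transformation" style argued for here follows. The data-creating function knows nothing about post-processing. The user chains general-purpose transformations explicitly, so no callbacks have to be passed into the parser. `parse_kv`, `strip_values` and `drop_empty` are hypothetical stand-ins, not filterx built-ins.

```python
def parse_kv(line: str) -> dict:
    """Stand-in for a data-creating function (e.g. a key=value parser).
    It takes no transformation parameters."""
    pairs = (item.split("=", 1) for item in line.split() if "=" in item)
    return {key: value for key, value in pairs}


def strip_values(record: dict) -> dict:
    """General-purpose transformation: strip whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_empty(record: dict) -> dict:
    """General-purpose transformation: remove fields whose value is ""."""
    return {k: v for k, v in record.items() if v != ""}


# Each step is called on its own; the user composes them explicitly.
record = drop_empty(strip_values(parse_kv("user=alice dept= host=web-1")))
print(record)  # {'user': 'alice', 'host': 'web-1'}
```

In the coupled alternative, the call would look more like `parse_kv(line, transforms=[...])`, and every data-creating function would need to accept and apply such a parameter.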

that can only be passed to some other functions, but not called on their own

This was my first trigger to think that coupling transformations with the "source of creation" may not be the best idea (both from the user's and the C implementation's perspective).

alltilla commented 1 month ago

Sure, we can do everything by having separate functions for iterating through the dict and modifying its values, but it quickly gets resource-intensive if we do it multiple times for the necessary transformations. I thought we could optimize it, but that might introduce some complications in our implementation.
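The trade-off can be illustrated with a toy Python model that counts full passes over the record. This is not how AxoSyslog iterates dicts internally. Chaining N standalone transformations walks the dict N times, whereas a fused loop applies all of them in a single walk. The `DROP` sentinel and both helper names are hypothetical.

```python
from typing import Any, Callable, Iterable, Tuple

ValueFn = Callable[[Any], Any]
DROP = object()  # sentinel: a value transform returns this to remove the field


def apply_separately(record: dict, fns: Iterable[ValueFn]) -> Tuple[dict, int]:
    """One full pass over the dict per transformation (standalone functions)."""
    passes = 0
    for fn in fns:
        passes += 1
        record = {k: out for k, v in record.items() if (out := fn(v)) is not DROP}
    return record, passes


def apply_fused(record: dict, fns: Iterable[ValueFn]) -> Tuple[dict, int]:
    """A single pass that runs every transformation on each value in turn."""
    fns = list(fns)
    result = {}
    for k, v in record.items():
        for fn in fns:
            v = fn(v)
            if v is DROP:
                break  # stop transforming; the field is dropped
        else:
            result[k] = v
    return result, 1


def strip(v: Any) -> Any:
    return v.strip() if isinstance(v, str) else v


def drop_empty(v: Any) -> Any:
    return DROP if v == "" else v


event = {"user": " alice ", "dept": "  ", "pid": 42}
print(apply_separately(event, [strip, drop_empty]))  # ({'user': 'alice', 'pid': 42}, 2)
print(apply_fused(event, [strip, drop_empty]))       # ({'user': 'alice', 'pid': 42}, 1)
```

Both produce the same result for per-value transformations. The fused form saves passes but requires the transformations to be handed to a shared loop, which is the coupling debated in this thread.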

MrAnno commented 1 month ago

Let's measure filterx performance in real-world use cases, identify the bottlenecks, and make performance optimizations only in places where we are sure they are worth the complexity, based on the actual numbers.

I'm not against adding some complexity when the end result is cleaner for the user, but in this case the coupling seems convenient, yet it's not that clean when we think it through.