Closed bn-darindouglass-zz closed 3 years ago
Hi @DarinDouglass,
Keep in mind that it is possible to increase the total_fields limit in Elasticsearch.
All built-in publishers support a custom transform. I would say this is the preferred approach, since you can control, on a per-publisher basis, how the data is sent to each specific publisher.
For example, you could even split the mbean samples into their own index:
(require '[where.core :refer [where]])
(μ/start-publisher!
{:type :multi
:publishers
[{:type :console}
;; exclude mbean samples
{:type :elasticsearch :url "http://localhost:9200/"
:transform (partial filter (where :mulog/event-name not= :mulog/mbean-sampled))}
;; secondary index with only mbean samples
{:type :elasticsearch :url "http://localhost:9200/"
:index-pattern "'mulog-mbeans-'yyyy.MM.dd"
;; select only mbean sample and split by attribute
:transform #(->> %
(filter (where :mulog/event-name :is? :mulog/mbean-sampled))
                        (split-by-attribute))}]})
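The `split-by-attribute` helper above is not defined by μ/log; a minimal sketch of what it could look like, assuming each mbean-sampled event carries an :attributes map as shown later in this thread:

```clojure
;; Hypothetical helper (not part of mulog): expands each event that has
;; an :attributes map into one event per attribute, so each log row
;; carries a single :attribute/:value pair instead of the whole map.
(defn split-by-attribute [events]
  (mapcat (fn [event]
            (if-let [attrs (:attributes event)]
              (for [[k v] attrs]
                (-> event
                    (dissoc :attributes)
                    (assoc :attribute k :value v)))
              ;; events without :attributes pass through unchanged
              [event]))
          events))
```

Because it is a plain seq-to-seq function, it composes directly with the :transform shown above.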
Yeah, I know you can increase the field limit in ES; however, that's something we want to avoid, though it's definitely an option. We're not using the ES publisher (the ELK stack is managed by another team and they don't expose ES directly), otherwise that would work.
We make liberal use of transform in our code (we have a wrapper around mulog that adds tagging/filtering/etc. automagically). I'd love to use transform on the mbean publisher, but when looking at solutions I noticed that the mbean transform is applied differently than other publishers': mbean acts on a single map vs a seq of maps like other publishers.
If the behavior were the same I'd just be able to use {:type :mbean :transform #(mapcat split! %)}, but that won't work as is. Is there any chance we could have the transform applied here instead?
Generally, it is better to apply the transformation to the destination publisher rather than the source. For example, I would apply the transform on the ELS publisher rather than the sampler.
However, the issue you pointed out about the different signature for :transform is my mistake.
Fixing it will cause a breaking change ;-( however I think it needs to be conformed to all the other transform functions.
On a deeper look, I see why it is this way. Let me explain.
The publishers transform functions have the following signature:
transform -> event-seq -> event-seq
The transform function goes from a sequence of events to a modified sequence of events.
A μ/log event is a map with the following structure:
{:mulog/event-name :your-ns/event-name,
:mulog/timestamp 1587501375129,
:mulog/trace-id #mulog/flake "4VTCYUcCs5KRbiRibgulnns3l6ZW_yxk",
:mulog/namespace "your-ns",
:your-custom-key "and-values"}
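To make the signature concrete, here is a minimal sketch of such a function (plain Clojure, not μ/log API; the event keys are taken from the example above):

```clojure
;; A publisher :transform is an ordinary function: event-seq -> event-seq.
;; This sketch drops mbean samples and tags every remaining event.
(defn my-transform [events]
  (->> events
       (remove #(= :mulog/mbean-sampled (:mulog/event-name %)))
       (map #(assoc % :env "prod"))))
```

You would pass it as the :transform option of any built-in publisher.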
Samplers like mbean-sampler
use the publisher infrastructure to take samples at a regular interval.
However, when the sample is logged in μ/log (here), the event doesn't exist yet: what you have at that stage is a sample value which is not in the same shape as the event above.
The shape of the value transformed here (the sample) depends on the MBean being sampled; for example, for java.nio:* it might look like:
{:canonical-name "java.nio:name=direct,type=BufferPool",
:domain "java.nio",
:keys {"name" "direct", "type" "BufferPool"},
:attributes
{:Name "direct",
:Count 10,
:TotalCapacity 228226,
:MemoryUsed 228226,
:ObjectName "java.nio:name=direct,type=BufferPool"}}
Therefore it cannot accept the same event transform functions. The current transform on the mbean sampler looks like:
;; currently (0.7.1)
transform -> sample -> sample
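For instance, under the current behaviour the sampler's transform receives and returns a single sample map (a sketch, assuming the sample shape shown above):

```clojure
;; 0.7.1-style mbean :transform: operates on one sample map at a time.
;; This sketch drops the redundant :ObjectName attribute from a sample.
(defn drop-object-name [sample]
  (update sample :attributes dissoc :ObjectName))
```
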
But I can see how this can be confusing.
It is probably better to change the name of the function to avoid confusion.
My current thinking is to change it to: transform-samples
with the following signature:
;; future (0.8.0)
transform-samples -> sample-seq -> sample-seq
This type of transform will be applied to the sample values (not the events) prior to logging, thus offering the possibility to filter/transform/split at will.
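Under the proposed signature, pruning or splitting samples becomes an ordinary seq operation. A sketch against the proposed 0.8.0 option (not released API; keep-attributes is a hypothetical helper):

```clojure
;; Sketch of a sample-seq -> sample-seq transform for the proposed
;; :transform-samples option: keep only the attributes we care about,
;; reducing the number of fields that reach the index.
(defn keep-attributes [wanted samples]
  (map #(update % :attributes select-keys wanted) samples))

;; hypothetical sampler config using it:
;; {:type :mbean
;;  :transform-samples (partial keep-attributes [:Count :MemoryUsed])}
```
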
What do you think, does this make sense to you?
Generally, it is better to apply the transformation to the destination publisher rather than the source. For example, I would apply the transform on the ELS publisher rather than the sampler.
That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.
My current thinking is to change it to: transform-samples with the following signature:
I think this is fine. Will transform-samples be the only way to transform samples (which I'm OK with, since it's a superset of the currently implemented transform code)?
Generally, it is better to apply the transformation to the destination publisher rather than the source.
That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.
The reason why I would say it is better to apply the transformation at the destination publisher is that the transformation and target representation might be valid (or required) only for one specific system.
You shouldn't need to change a log event representation at the source only because one target system doesn't support something. For example, if the same event is sent to Kinesis and then stored in S3, you might want to keep the full sample in that system.
However, the point of μ/log is to give you the possibility to manage your metering data the way you want, and you know your particular use case better than anyone else.
Will transform-samples be the only way to transform samples (which I'm OK with since it's a superset of the currently implemented transform code)?
Yes, the intention is to have transform-samples apply a generic transformation to all the samples before they are logged, and transform apply a generic transformation to all the events before they are sent to their destination.
To summarise:
- transform: will be available in all built-in publishers (but not samplers)
- transform-samples: will be available in all built-in samplers (but not publishers)

Looks good to me. Thanks again!
Hi @DarinDouglass
I've made the following change:
https://github.com/BrunoBonacci/mulog/commit/662e3d0fee841af8fcc44353e723ccf1ad97aee3
and also added a deprecation warning for the old configuration, the change is described here:
https://github.com/BrunoBonacci/mulog/issues/75
If everything looks good I will release the version this weekend.
Released in: v0.8.0
(please see deprecation warning: https://github.com/BrunoBonacci/mulog/issues/75)
We are currently sending our logs to Kibana/Elasticsearch. Elasticsearch has a limit of 1000 fields per index. We found a good chunk of these fields are being taken up by mbean :attributes.

Currently we're altering mbeans-sample to expand each mbean's :attributes map into multiple logs, each with an :attribute field, which should allow all of our mbean logs to take up a finite number of fields. This feels like a good config option for the mbeans sampler.

Thoughts?
Edit:
One possible solution is to make transform act on the result of mbeans-sample instead of each of the samples returned individually. Off the top of my head, this publisher is the only one that behaves this way.
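If transform did act on the whole seq of samples, the expansion described above could be written as a plain mapcat (a sketch with a hypothetical expand-attributes helper, not μ/log API):

```clojure
;; Hypothetical: expand one sample into one map per attribute, then
;; apply it across the whole seq of samples with mapcat.
(defn expand-attributes [sample]
  (for [[k v] (:attributes sample)]
    (-> sample
        (dissoc :attributes)
        (assoc :attribute (name k) :value v))))

;; as a seq-level transform: #(mapcat expand-attributes %)
```
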