BrunoBonacci / mulog

μ/log is a micro-logging library that logs events and data, not words!
https://cljdoc.org/d/com.brunobonacci/mulog/
Apache License 2.0

add config for expanding mbean attributes into individual logs #72

Closed bn-darindouglass-zz closed 3 years ago

bn-darindouglass-zz commented 3 years ago

We are currently sending our logs to Kibana/Elasticsearch. Elasticsearch has a limit of 1000 fields per index. We found a good chunk of these fields are being taken up by mbean :attributes.

Currently we're altering mbeans-sample to expand each mbean's :attributes map into multiple logs, each with a single :attribute field, which should cap the number of fields our mbean logs take up.

This feels like a good config option for the mbeans sampler.

Thoughts?

Edit:

One possible solution is to make transform act on the result of mbeans-sample instead of on each sample individually. Off the top of my head, this publisher is the only one that behaves this way.

BrunoBonacci commented 3 years ago

Hi @DarinDouglass,

Keep in mind that it is possible to increase the total_fields limit in Elasticsearch.

All built-in publishers support a custom transform. I would say that this would be the preferred approach since you can control on a publisher basis how the data will be sent to the specific publisher.

For example, you could even send the mbean samples to their own index:

(require '[where.core :refer [where]])

(μ/start-publisher!
 {:type :multi
  :publishers
  [{:type :console}

    ;; exclude mbean samples
    {:type :elasticsearch :url  "http://localhost:9200/" 
     :transform (partial filter (where :mulog/event-name not= :mulog/mbean-sampled))}

    ;; secondary index with only mbean samples
    {:type :elasticsearch :url  "http://localhost:9200/" 
     :index-pattern "'mulog-mbeans-'yyyy.MM.dd"
     ;; select only mbean sample and split by attribute
     :transform #(->> %
                   (filter (where :mulog/event-name :is? :mulog/mbean-sampled))
                   (split-by-attribute))}]})
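
The split-by-attribute helper in the snippet above is not defined anywhere in the thread. A minimal sketch, assuming each mbean-sampled event carries the sample's :attributes map at the top level (the exact key may differ in practice), could look like:

```clojure
;; Hypothetical helper (not part of mulog): expand each mbean-sampled
;; event into one event per attribute, so every log contributes a fixed
;; set of fields regardless of how many attributes the MBean exposes.
(defn split-by-attribute
  [events]
  (mapcat
   (fn [event]
     (for [[attr-name attr-value] (:attributes event)]
       (-> event
           (dissoc :attributes)
           (assoc :attribute {:name attr-name :value attr-value}))))
   events))
```

With a transform like this, an event with ten attributes becomes ten events, each contributing the same two index fields (:attribute :name and :value) instead of ten distinct ones.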
bn-darindouglass-zz commented 3 years ago

Yeah, I know you can increase the field limit in ES; it's definitely an option, but one we want to avoid. We're also not using the ES publisher (the ELK stack is managed by another team and they don't expose ES directly), otherwise that would work.

We make liberal use of transform in our code (we have a wrapper around mulog that adds tagging/filtering/etc automagically). I'd love to use transform on the mbean publisher, but when looking at solutions I noticed that the mbean transform is applied differently from the other publishers': the mbean transform acts on a single map, whereas the other publishers' transforms act on a seq of maps.

If the behavior was the same I'd just be able to {:type :mbean :transform #(mapcat split! %)} but that won't work as is. Is there any chance we could have the transform applied here instead?

BrunoBonacci commented 3 years ago

Generally, it is better to apply the transformation to the destination publisher rather than the source. For example, I would apply the transform on the ELS publisher rather than the sampler.

However, the issue you pointed out about the different signature for :transform is my mistake. Fixing it will cause a breaking change ;-( but I think it needs to conform to all the other transform functions.

BrunoBonacci commented 3 years ago

On a deeper look, I see why it is this way. Let me explain.

The publishers transform functions have the following signature:

transform -> event-seq -> event-seq

The transform function goes from a sequence of events to a modified sequence of events. A μ/log event is a map with the following structure:

{:mulog/event-name :your-ns/event-name,
 :mulog/timestamp 1587501375129,
 :mulog/trace-id #mulog/flake "4VTCYUcCs5KRbiRibgulnns3l6ZW_yxk",
 :mulog/namespace "your-ns",
 :your-custom-key "and-values"}

Samplers like mbean-sampler use the publisher infrastructure to take samples at a regular interval. However, at the point where the sample is logged in μ/log (here), the event doesn't exist yet. What you have at this stage is a sample value that is not in the same shape as the event above.

The shape of the value transformed here (the sample) depends on the MBean being sampled; for example, a sample of java.nio:* might look like:

{:canonical-name "java.nio:name=direct,type=BufferPool",
 :domain "java.nio",
 :keys {"name" "direct", "type" "BufferPool"},
 :attributes
 {:Name "direct",
  :Count 10,
  :TotalCapacity 228226,
  :MemoryUsed 228226,
  :ObjectName "java.nio:name=direct,type=BufferPool"}}

Therefore it cannot accept the same event-transform functions. The current transform on the mbean sampler has this shape:

;; currently (0.7.1)
transform -> sample -> sample

But I can see how this can be confusing. It is probably better to rename the function to avoid confusion. My current thinking is to change it to transform-samples with the following signature:

;; future (0.8.0)
transform-samples -> sample-seq -> sample-seq

This type of transform will be applied to the sample value (not the event) prior to logging, thus offering the possibility to filter/transform/split at will.
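
Under the proposed signature, splitting each sample's :attributes map into one sample per attribute could be sketched as follows. This is a sketch only: transform-samples was still a proposal at this point in the discussion, and the :mbean type name and :mbeans-patterns option are assumptions based on the existing sampler configuration.

```clojure
;; Hypothetical sketch: turn one sample into one sample per attribute,
;; so every logged sample carries a single :attribute field.
(defn split-sample-by-attribute
  [sample]
  (for [[attr-name attr-value] (:attributes sample)]
    (-> sample
        (dissoc :attributes)
        (assoc :attribute {:name attr-name :value attr-value}))))

(μ/start-publisher!
 {:type :mbean
  :mbeans-patterns ["java.nio:*"]
  ;; proposed in this issue: sample-seq -> sample-seq
  :transform-samples (partial mapcat split-sample-by-attribute)})
```

Because the transform now receives the whole sample-seq, mapcat-style splitting (one input sample, many output samples) works naturally, which the current single-sample signature cannot express.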

What do you think, does this make sense to you?

bn-darindouglass-zz commented 3 years ago

Generally, it is better to apply the transformation to the destination publisher rather than the source. For example, I would apply the transform on the ELS publisher rather than the sampler.

That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.

My current thinking is to change it to: transform-samples with the following signature:

I think this is fine. Will transform-samples be the only way to transform samples (which I'm OK with since it's a superset of the currently implemented transform code)?

BrunoBonacci commented 3 years ago

Generally, it is better to apply the transformation to the destination publisher rather than the source.

That's an interesting take: for filtering transforms I think that makes sense (and it's what we do). For the type of operation I'm hoping to do (i.e. split mulog events into multiple events) requiring that transform on every other publisher feels a bit icky.

The reason why I would say it is better to apply the transformation at the destination publisher is that the transformation and target representation might be valid (or required) only for one specific system.

You shouldn't need to change a log event's representation at the source only because one target system doesn't support something. For example, if the same event is sent to Kinesis and then stored in S3, you might want to keep the full sample in that system.

However, the point of μ/log is to give you the possibility to manage your metering data the way you want, and you know your particular use case better than anyone else.

Will transform-samples be the only way to transform samples (which I'm OK with since it's a superset of the currently implemented transform code)?

Yes, the intention is to have transform-samples apply a generic transformation to all the samples before they are logged, and transform apply a generic transformation to all events before they are sent to their destination.

To summarise:

- samplers (e.g. the mbean sampler): :transform-samples, sample-seq -> sample-seq, applied before the samples are logged;
- publishers (e.g. console, Elasticsearch): :transform, event-seq -> event-seq, applied before the events are sent to their destination.

bn-darindouglass-zz commented 3 years ago

Looks good to me. Thanks again!

BrunoBonacci commented 3 years ago

Hi @DarinDouglass

I've made the following change:

https://github.com/BrunoBonacci/mulog/commit/662e3d0fee841af8fcc44353e723ccf1ad97aee3

and also added a deprecation warning for the old configuration, the change is described here:

https://github.com/BrunoBonacci/mulog/issues/75

If everything looks good I will release the version this weekend.

BrunoBonacci commented 3 years ago

Released in: v0.8.0 (please see deprecation warning: https://github.com/BrunoBonacci/mulog/issues/75)