fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes them for Prometheus.
Apache License 2.0

Add "retention" feature allowing idle metrics to expire #204

Open phihos opened 2 years ago

phihos commented 2 years ago

As proposed in #20, I implemented a way to automatically remove idle metrics at runtime, without a restart. Example:

<metric>
  name message_foo_counter
  type counter
  desc The total number of foo in message.
  key foo
  retention 3600 # 1h
  retention_check_interval 1800 # 30m
  <labels>
    bar ${bar}
  </labels>
</metric>

If ${bar} was baz once but no records with that value were processed afterwards, the metric message_foo_counter{bar="baz"} expires after one hour and becomes eligible for removal. When it is actually removed depends on retention_check_interval (default 60): a background thread checks for expired metrics at that interval, every 30 minutes in this example. So in the worst case a metric is removed 30 minutes after it expires.
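A minimal sketch of the expiry rule described above, not the code from this PR. The class name `IdleTracker` and its methods are hypothetical; only the idea of a retention period comes from the description. Each label set remembers when it was last updated, and a periodic check drops every label set that has been idle for longer than the retention.

```ruby
# Hypothetical helper: remembers when each label set was last updated.
class IdleTracker
  def initialize(retention)
    @retention = retention # seconds a label set may stay idle
    @last_seen = {}        # labels Hash => Time of last update
  end

  # Record that a label set was just updated.
  def touch(labels, now = Time.now)
    @last_seen[labels] = now
  end

  # Remove and return every label set idle for longer than the retention.
  def expire!(now = Time.now)
    expired = @last_seen.select { |_, seen| now - seen > @retention }.keys
    expired.each { |labels| @last_seen.delete(labels) }
    expired
  end
end
```

With `retention 3600` and `retention_check_interval 1800`, a label set last updated at t = 0 expires at t = 3600 and is removed by the first check after that, so at most just under 3600 + 1800 = 5400 seconds after its last update.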

The names of the config keys were shamelessly stolen from, or rather inspired by, grok_exporter, to make this feature more familiar to people who already use grok_exporter.

In addition to the implementation, I had to refactor the Metrics class so that it implements instrument(record, expander) directly and leaves the subclass-specific logic to value(record), set_value?(value) and set_value(value, labels). That reduces code duplication and was necessary to avoid introducing further duplicates.
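The shape of that refactoring can be sketched as a template method. This is an illustration under assumptions, not the PR's code: only the four method names come from the description, while the `Metric` and `Counter` classes, the `counts` hash and the `PLACEHOLDER_EXPANDER` lambda are hypothetical stand-ins.

```ruby
# Hypothetical base class: instrument holds the shared flow,
# subclasses supply the type-specific hooks.
class Metric
  def initialize(key, labels)
    @key = key       # record key to read the value from
    @labels = labels # label templates such as { "bar" => "${bar}" }
  end

  # Shared flow: extract a value, decide whether to record it, record it.
  def instrument(record, expander)
    val = value(record)
    return unless set_value?(val)

    set_value(val, expander.call(@labels, record))
  end

  def value(record)
    record[@key]
  end

  def set_value?(_value)
    raise NotImplementedError
  end

  def set_value(_value, _labels)
    raise NotImplementedError
  end
end

# Hypothetical subclass: only counter-specific behaviour lives here.
class Counter < Metric
  attr_reader :counts

  def initialize(key, labels)
    super
    @counts = Hash.new(0)
  end

  # Skip records where the key is missing or not numeric.
  def set_value?(value)
    value.is_a?(Numeric)
  end

  def set_value(value, labels)
    @counts[labels] += value
  end
end

# Stand-in for the plugin's placeholder expansion (an assumption):
# replaces ${name} in each label template with record["name"].
PLACEHOLDER_EXPANDER = lambda do |labels, record|
  labels.transform_values do |template|
    template.gsub(/\$\{(\w+)\}/) { record[$1].to_s }
  end
end
```

Because `instrument` lives in one place, a cross-cutting concern such as recording the last update time for retention would only need to be added once rather than in every metric type.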

I also had to introduce a new data store based on the default data store of prometheus/client_ruby to allow for removal of elements.
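A rough sketch of what such a store could look like, assuming a plain Hash guarded by a Monitor. The class `RemovableStore` and its `remove` method are hypothetical. The other method names loosely follow the per-metric store interface of prometheus/client_ruby as I understand it, but this block does not use that library and is not the store added in this PR.

```ruby
require "monitor"

# Hypothetical store: values keyed by label set, plus a remove operation.
class RemovableStore
  def initialize
    @values = Hash.new(0.0) # unknown label sets read as 0.0
    @lock = Monitor.new     # reentrant, so nested synchronize calls are safe
  end

  def synchronize(&block)
    @lock.synchronize(&block)
  end

  def set(labels:, val:)
    synchronize { @values[labels] = val.to_f }
  end

  def increment(labels:, by: 1)
    synchronize { @values[labels] += by }
  end

  def get(labels:)
    synchronize { @values[labels] }
  end

  # The addition needed for retention: forget a label set entirely.
  # Returns the removed value, or nil if the label set was not stored.
  def remove(labels:)
    synchronize { @values.delete(labels) }
  end

  # Snapshot of all stored values, safe to iterate outside the lock.
  def all_values
    synchronize { @values.dup }
  end
end
```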

The last thing I want to mention is that I had to use the thread helper to clean up expired metrics in the background. I first tried the timer helper, but it caused a test to go into an infinite loop.
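For illustration, the background cleanup loop can be sketched with a plain Ruby thread. This deliberately does not use Fluentd's thread helper, and the class `PeriodicCleaner` is hypothetical; it only shows the general pattern of running a task at a fixed interval with a prompt, clean shutdown.

```ruby
# Hypothetical cleaner: runs a task every `interval` seconds until stopped.
class PeriodicCleaner
  def initialize(interval, &task)
    @interval = interval
    @task = task
    @mutex = Mutex.new
    @stop_signal = ConditionVariable.new
    @stopped = false
  end

  def start
    @thread = Thread.new do
      @mutex.synchronize do
        until @stopped
          # Sleep for up to one interval, waking early if stop is signalled.
          @stop_signal.wait(@mutex, @interval)
          @task.call unless @stopped
        end
      end
    end
    self
  end

  # Ask the loop to finish and wait for the thread to exit.
  def stop
    @mutex.synchronize do
      @stopped = true
      @stop_signal.signal
    end
    @thread&.join
  end
end
```

Waiting on a condition variable instead of calling `sleep` lets `stop` return promptly rather than after a full interval, which matters when the interval is 30 minutes.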

If you need more tests or need other alterations to the code please let me know. I am looking forward to your feedback 🙂

phihos commented 1 year ago

Sorry I had two very busy weeks. I hope I get to it tomorrow.

gromnsk commented 1 year ago

@phihos any update?

AlbusLumos commented 1 year ago

@phihos is this still on-going?

dkulchinsky commented 1 year ago

Hey @phihos 👋🏼 wondering if you still have cycles to work on this? I can try and take over though not as experienced with Ruby.

Looks like there was mainly a concern whether a dedicated thread is needed, instead of running it in the plugin thread.

Lusitaniae commented 6 months ago

Wonder if you had the bandwidth to pick this up again @phihos

🙏