fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Support detailed memory usage metrics for input plugins (utilization of mem_buf_limit) #3022

Open PettitWesley opened 3 years ago

PettitWesley commented 3 years ago

Is your feature request related to a problem? Please describe.

It's hard to figure out the ideal mem_buf_limit for your scenario. It would be nice if the tail plugin, or all input plugins, could emit metrics on their current resource usage. That way you can see which input needs more memory and right-size the limit.

It's not just about configuring mem_buf_limit; this would also be useful in general for profiling Fluent Bit: some sort of per-plugin memory usage stats.

Describe the solution you'd like

Metrics in the Prometheus endpoint/format for the memory usage and the limit of each input plugin.
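As a rough illustration of what such output could look like, here is a minimal sketch that renders usage and limit as two gauges in the Prometheus text format. The struct, the function names, and the metric names (`fluentbit_input_mem_buf_usage_bytes`, `fluentbit_input_mem_buf_limit_bytes`) are all hypothetical and are not existing Fluent Bit APIs or metrics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-input snapshot; this is not a Fluent Bit struct. */
struct input_mem_stats {
    const char *name;      /* input instance name, e.g. "tail.0" */
    size_t usage_bytes;    /* bytes currently buffered in memory */
    size_t limit_bytes;    /* configured mem_buf_limit, 0 = no limit */
};

/* Fraction of the limit in use (0.0 when no limit is configured). */
double input_mem_utilization(const struct input_mem_stats *st)
{
    assert(st != NULL);
    if (st->limit_bytes == 0) {
        return 0.0;
    }
    return (double) st->usage_bytes / (double) st->limit_bytes;
}

/*
 * Write two gauge samples in Prometheus text exposition format.
 * Returns what snprintf returns: the length the full output needs.
 */
int render_input_mem_metrics(char *buf, size_t size,
                             const struct input_mem_stats *st)
{
    assert(buf != NULL && st != NULL);
    return snprintf(buf, size,
        "# TYPE fluentbit_input_mem_buf_usage_bytes gauge\n"
        "fluentbit_input_mem_buf_usage_bytes{name=\"%s\"} %zu\n"
        "# TYPE fluentbit_input_mem_buf_limit_bytes gauge\n"
        "fluentbit_input_mem_buf_limit_bytes{name=\"%s\"} %zu\n",
        st->name, st->usage_bytes, st->name, st->limit_bytes);
}
```

With both values exported, an alert on usage divided by limit would fire before the over-limit warning shows up in the logs.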

Describe alternatives you've considered

Trial and error, and checking logs for over-limit warnings. This is not ideal, since you only see a warning once you are over the limit. The best case is to know when you are getting close to it.

PettitWesley commented 3 years ago

I wonder: if we added the plugin instance to all calls to flb_malloc and flb_free, could we easily track the allocations each plugin makes?

Or maybe we should just emit metrics on certain buffers: the ones that are largest and most likely to contribute to OOMKills.
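To make the first idea concrete, here is a minimal sketch of per-instance allocation accounting. It deliberately does not use the real flb_malloc/flb_free signatures; `tracked_malloc`, `tracked_free`, and `struct plugin_instance` are hypothetical names, and the counters are not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plugin instance with memory counters. */
struct plugin_instance {
    const char *name;
    size_t mem_bytes;   /* bytes currently allocated by this instance */
    size_t mem_peak;    /* highest value mem_bytes has reached */
};

/* Header stored in front of each block; the union keeps the payload aligned. */
union alloc_header {
    size_t size;
    max_align_t align;
};

/* Allocate size bytes and charge them to the given instance. */
void *tracked_malloc(struct plugin_instance *ins, size_t size)
{
    union alloc_header *hdr;

    if (size > SIZE_MAX - sizeof(*hdr)) {
        return NULL;    /* header + payload would overflow size_t */
    }
    hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL) {
        return NULL;
    }
    hdr->size = size;
    ins->mem_bytes += size;
    if (ins->mem_bytes > ins->mem_peak) {
        ins->mem_peak = ins->mem_bytes;
    }
    return hdr + 1;     /* payload starts right after the header */
}

/* Free a block from tracked_malloc and credit its size back. */
void tracked_free(struct plugin_instance *ins, void *ptr)
{
    union alloc_header *hdr;

    if (ptr == NULL) {
        return;
    }
    hdr = (union alloc_header *) ptr - 1;
    assert(ins->mem_bytes >= hdr->size);
    ins->mem_bytes -= hdr->size;
    free(hdr);
}
```

Storing the size in a header means the free path does not need the caller to pass it, at the cost of a few extra bytes per allocation. Tracking only the largest buffers, as in the second idea, would avoid touching every call site.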

yongtang commented 3 years ago

I noticed that in fluent-bit all metrics are suffixed with _total for prometheus API: https://github.com/fluent/fluent-bit/blob/be8e07316cc8a9e9a351673dd50363cc7bcee426/src/http_server/api/v1/metrics.c#L344

Wondering if it might be cosmetically better to let non-accumulating metrics omit the _total suffix (see also issue #3036).
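A minimal sketch of that naming rule, assuming each metric carries a type: counters get `_total`, gauges keep their base name. The enum, the function, and the metric names used with it are hypothetical and are not taken from metrics.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical metric kinds: counters accumulate, gauges go up and down. */
enum metric_type {
    METRIC_COUNTER,
    METRIC_GAUGE
};

/* Return 1 if s ends with suffix, 0 otherwise. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s);
    size_t lx = strlen(suffix);

    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Build the exposed metric name: append "_total" only for counters
 * that do not already carry it. Returns snprintf's result.
 */
int metric_exposed_name(char *out, size_t size,
                        const char *base, enum metric_type type)
{
    const char *suffix = "";

    assert(out != NULL && base != NULL);
    if (type == METRIC_COUNTER && !ends_with(base, "_total")) {
        suffix = "_total";
    }
    return snprintf(out, size, "%s%s", base, suffix);
}
```

Under this rule a current-memory-usage metric would be exposed as a plain gauge name, while record and byte counters keep the `_total` suffix.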

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

edsiper commented 3 years ago

enhancement.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.