fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Handle large number of limited files #8470

Open Jesperbk opened 7 months ago

Jesperbk commented 7 months ago

Is your feature request related to a problem? Please describe.

In my project we have a component that takes messages off a queue. Before being processed, each message is backed up to a separate JSON file (i.e. one message per file). I would like to read each of these files with Fluent Bit and pass them on to my log framework. However, the number of files has proved problematic for Fluent Bit. The initial number of files that need to be read when Fluent Bit starts may be in the thousands, and Fluent Bit almost immediately crashes with a "Too many open files" error. I have of course set the "Exit_On_Eof" setting on my tail input, but it seems Fluent Bit is opening new handles faster than it closes them.

Describe the solution you'd like

Filebeat supports an option, "harvester_limit", that limits how many consumers (i.e. file handlers) can be used for a particular input. When combined with "Exit_On_Eof", consuming large numbers of small files worked perfectly. I would like Fluent Bit to have something similar, or at least the ability to throttle itself before crashing.
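To make the requested behaviour concrete, here is a minimal Python sketch of a handle limit: files are read in batches so that no more than `limit` handles are open at any moment. It only illustrates the idea. It is not how Filebeat implements "harvester_limit", and it is not an existing Fluent Bit feature; the names `harvest` and `_drain` are hypothetical.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Iterable, Iterator, List


def _drain(batch: List[Path]) -> Iterator[str]:
    # ExitStack closes every handle in the batch when the block exits,
    # so the next batch starts with no handles left open from this one.
    with ExitStack() as stack:
        handles = [stack.enter_context(p.open("r", encoding="utf-8")) for p in batch]
        for handle in handles:
            yield handle.read()


def harvest(paths: Iterable[Path], limit: int) -> Iterator[str]:
    """Yield each file's content while keeping at most `limit` handles open."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    batch: List[Path] = []
    for path in paths:
        batch.append(path)
        if len(batch) == limit:
            yield from _drain(batch)
            batch = []
    if batch:
        # Read whatever is left over when the file count is not a multiple of limit.
        yield from _drain(batch)
```

Calling `harvest(sorted(Path("/some/dir").glob("*.json")), limit=64)` (the directory is a placeholder) would walk thousands of files without ever holding more than 64 descriptors, which is the throttling behaviour asked for above.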

Describe alternatives you've considered

I haven't been able to figure out a really viable solution. The best I have come up with is to write my own tool to read the files and then pass their contents on to Fluent Bit for the remainder of the pipeline, but ensuring that it runs stably in all circumstances is not worth the effort.

Let me know if you need more details. And thanks in advance!

patrick-stephens commented 7 months ago

FYI, exit_on_eof means FB itself exits as soon as any file hits EOF, regardless of what the others are doing, which might not be what you want. For example, if you want it to finish processing the files that are already open (but not open more), that is not guaranteed to happen, as it will just exit asap.

Jesperbk commented 7 months ago

@patrick-stephens Thanks, I guess that makes sense. That unfortunately makes Fluent Bit even less applicable to my use case. I might do some experimenting without that setting then.

patrick-stephens commented 7 months ago

An alternative might be to use the exec plugin (if not in a container) or a Lua filter with a side effect that reads the input files (you can use the dummy input as a timer and discard the dummy record, potentially replacing it with what was read).

That way you could script the chunking of files however you want, along with things like moving them afterwards or whatever.
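As a rough sketch of such a script, the Python below reads a bounded chunk of JSON files per run, prints each as one compact line on stdout, and moves the file aside afterwards. It assumes the consumer (for example an exec-style input that runs the script periodically) takes one record per line. The function name `emit_chunk`, the directory layout, and the `*.json` pattern are assumptions made for illustration and do not come from this thread.

```python
import json
import shutil
import sys
from pathlib import Path
from typing import TextIO


def emit_chunk(src: Path, done: Path, max_files: int, out: TextIO = sys.stdout) -> int:
    """Print up to `max_files` JSON files as one compact line each, then move them."""
    done.mkdir(parents=True, exist_ok=True)
    emitted = 0
    # Sorting gives a stable order; slicing caps the work done in a single run.
    for path in sorted(src.glob("*.json"))[:max_files]:
        with path.open("r", encoding="utf-8") as handle:
            record = json.load(handle)  # the handle is closed before the next file opens
        out.write(json.dumps(record, separators=(",", ":")) + "\n")
        # Move only after the line was written, so an interrupted run re-reads the file.
        shutil.move(str(path), str(done / path.name))
        emitted += 1
    out.flush()
    return emitted
```

Only one file is open at a time, so the descriptor limit is never approached. Moving files doubles as state tracking, at the cost of possible duplicates if the process dies between the write and the move. A malformed file raises `json.JSONDecodeError` and stops the run, which a real tool would need to handle.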

Jesperbk commented 7 months ago

I am considering it, but the problem is that my external script would have to keep track of its own state, i.e. which of the JSON files have been read and can be skipped on the next pass. This logic already exists in Fluent Bit, and I would like to use that rather than take on the effort and risk of coding it myself. All sorts of issues are likely to pop up that I would need to deal with, e.g. performance issues, memory consumption, filesystem issues, or abrupt termination. I would really like to delegate this to Fluent Bit, which already has tried-and-true functionality for tracking state.
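For a sense of what that state tracking involves, here is a minimal Python sketch that remembers processed file names in a small JSON state file and replaces it atomically, so an abrupt termination leaves either the old or the new state, never a half-written one. This is not how Fluent Bit tracks files. The names `load_seen`, `save_seen`, and `unread` are hypothetical, and the state file is assumed not to match the `*.json` pattern.

```python
import json
import os
from pathlib import Path
from typing import List, Set


def load_seen(state_file: Path) -> Set[str]:
    """Return the set of file names recorded as processed (empty on first run)."""
    if not state_file.exists():
        return set()
    with state_file.open("r", encoding="utf-8") as handle:
        return set(json.load(handle))


def save_seen(state_file: Path, seen: Set[str]) -> None:
    """Write the state to a temporary file, then swap it into place atomically."""
    tmp = state_file.with_name(state_file.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as handle:
        json.dump(sorted(seen), handle)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before the rename
    os.replace(tmp, state_file)  # atomic when both paths are on the same filesystem


def unread(directory: Path, seen: Set[str]) -> List[Path]:
    """List JSON files in `directory` that are not yet recorded as processed."""
    return [p for p in sorted(directory.glob("*.json")) if p.name not in seen]
```

Even this small version leaves open questions raised above, such as unbounded growth of the state file and files that are rewritten under the same name, which is part of why reusing existing, tested tracking is attractive.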

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

Jesperbk commented 4 months ago

Bump, this is still an issue for me.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

Jesperbk commented 1 month ago

Bump, this is still an issue for me.