amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License

lightweight alerting #3288

Open tcurdt opened 12 hours ago

tcurdt commented 12 hours ago

Describe the feature you would like to see

I would like to be able to monitor the logs for keywords and expose that information as a Prometheus metric.

Describe how you would like to see this feature implemented

Usually one looks at the live logs after something has happened, and often that something shows up as a keyword in the logs. I get that this isn't the focus of Dozzle, but maybe it would be possible to define some monitoring rules and just expose them as Prometheus metrics. That skips most of the notification complexity and still makes it possible to integrate some alerting.
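
A minimal sketch of what such monitoring rules could look like, in Go (the language Dozzle's backend is written in). The Rule type and MatchingRules function are hypothetical and not part of Dozzle; each rule is assumed to be a plain substring keyword with an optional container filter.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical monitoring rule: count log lines that contain
// Keyword. An empty Container means the rule applies to every container.
type Rule struct {
	Name      string
	Container string
	Keyword   string
}

// MatchingRules returns the names of the rules that match one log line
// from the given container, or nil if none match.
func MatchingRules(rules []Rule, container, line string) []string {
	var names []string
	for _, r := range rules {
		if r.Container != "" && r.Container != container {
			continue // rule is scoped to a different container
		}
		if strings.Contains(line, r.Keyword) {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	rules := []Rule{
		{Name: "errors", Keyword: "ERROR"},
		{Name: "db_timeouts", Container: "api", Keyword: "timeout"},
	}
	fmt.Println(MatchingRules(rules, "api", "ERROR: db timeout after 5s")) // [errors db_timeouts]
	fmt.Println(MatchingRules(rules, "web", "ERROR: upstream timeout"))    // [errors]
}
```

Each matched rule name would then map to one counter (or one label value on a counter) in the exported metrics.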

Describe any alternatives you've considered

Loki

amir20 commented 8 hours ago

> but maybe it would be possible to define some monitoring rules and just expose them as Prometheus metrics

What does that mean?

Monitoring is not a lightweight task. It requires processing all logs in the background, which significantly increases CPU usage. I'm not particularly concerned about the notification aspect since it can operate in plugin mode. The real technical challenge lies in processing all the logs in the background.

I had an idea to implement an option that allows users to select specific containers for log processing while the browser is open. This concept was discussed in detail in https://github.com/amir20/dozzle/issues/2614. It involved service workers, notifications, and a considerable amount of JavaScript. However, it appeared unreliable because browsers often terminate background tabs.

I believe the best solution is to handle this through the API, but, as mentioned, it's not straightforward.

I welcome any ideas or pull requests from anyone who wants to contribute.

tcurdt commented 5 hours ago

> What does that mean?

Can you be more specific about what is unclear?

> Monitoring is not a lightweight task. It requires processing all logs in the background, which significantly increases CPU usage. I'm not particularly concerned about the notification aspect since it can operate in plugin mode. The real technical challenge lies in processing all the logs in the background.

We can also call it KISS alerting instead of "lightweight".

I haven't had the chance to fully dig into the code, but the idea would be to hook into the server where all the messages are received, or into the client they are sent to. We just need to tap into the stream.
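
One way to read "tap into the stream" is a pass-through stage: every log event is handed to an observer before being forwarded unchanged. A minimal Go sketch, assuming a hypothetical LogEvent type rather than Dozzle's actual internals. Note that it only sees containers whose logs are already being streamed; covering every container continuously would still mean reading all logs in the background, which is the CPU cost raised earlier in the thread.

```go
package main

import "fmt"

// LogEvent is a hypothetical stand-in for whatever type carries a log
// line from the Docker stream to the browser.
type LogEvent struct {
	Container string
	Message   string
}

// Tap forwards every event from in unchanged, calling observe on each one
// first, so a matcher can sit in the stream without changing what the
// client receives. The returned channel is closed when in is closed.
func Tap(in <-chan LogEvent, observe func(LogEvent)) <-chan LogEvent {
	out := make(chan LogEvent)
	go func() {
		defer close(out)
		for ev := range in {
			observe(ev)
			out <- ev
		}
	}()
	return out
}

func main() {
	in := make(chan LogEvent, 2)
	in <- LogEvent{Container: "api", Message: "GET /health 200"}
	in <- LogEvent{Container: "api", Message: "Error: connection refused"}
	close(in)

	seen := 0
	for ev := range Tap(in, func(LogEvent) { seen++ }) {
		fmt.Println(ev.Container, ev.Message)
	}
	fmt.Println("observed:", seen) // observed: 2
}
```

Because the observer runs inline, a slow matcher would delay delivery to the viewer; a real integration might hand events to a separate buffered channel instead.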

But maybe I am missing something about how the client/server communication works in detail.

Of course, searching every line for a substring will increase CPU usage. But I am fairly confident that a strings.Contains(message, "Error") and a Prometheus counter.Inc(), even for every log line, will not be what brings our servers down. Done on the clients, the work would even be distributed; it would just mean more Prometheus endpoints to scrape.
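
The per-line work described here can be sketched without dependencies. The KeywordCounter below is a hypothetical, hand-rolled stand-in for a Prometheus counter vector (the official Go client, client_golang, provides a real CounterVec) and renders the Prometheus text exposition format directly. The metric name log_keyword_matches_total and the keyword label are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// labelEscaper escapes label values as the text exposition format requires.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// KeywordCounter is a hypothetical, dependency-free stand-in for a
// Prometheus counter vector with a single "keyword" label.
type KeywordCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func NewKeywordCounter() *KeywordCounter {
	return &KeywordCounter{counts: make(map[string]uint64)}
}

// Observe does the per-line work: one strings.Contains per keyword and
// one increment per hit.
func (c *KeywordCounter) Observe(line string, keywords []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, kw := range keywords {
		if strings.Contains(line, kw) {
			c.counts[kw]++
		}
	}
}

// Render returns the counters in the Prometheus text exposition format,
// sorted by keyword so the output is stable.
func (c *KeywordCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("# HELP log_keyword_matches_total Log lines containing a keyword.\n")
	b.WriteString("# TYPE log_keyword_matches_total counter\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "log_keyword_matches_total{keyword=\"%s\"} %d\n", labelEscaper.Replace(k), c.counts[k])
	}
	return b.String()
}

// ServeHTTP lets the counter be mounted at /metrics for Prometheus to scrape.
func (c *KeywordCounter) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, c.Render())
}

func main() {
	c := NewKeywordCounter()
	for _, line := range []string{"Error: disk full", "ok", "Error: retry"} {
		c.Observe(line, []string{"Error"})
	}
	fmt.Print(c.Render())
	// To serve it: http.Handle("/metrics", c); http.ListenAndServe(":9100", nil)
}
```

A single mutex keeps the sketch simple; at very high line rates, per-keyword atomic counters would avoid lock contention.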

If needed, this work could of course be spread across CPUs (using workers), but it would be good to profile before adding that complexity.
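
A worker pool along those lines, as a hedged sketch: lines are fanned out to one goroutine per CPU (runtime.NumCPU), each doing the same substring check, with the total kept in an atomic counter. CountMatches is a hypothetical helper; as the comment says, the single-goroutine version is worth profiling first.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
	"sync/atomic"
)

// CountMatches drains lines with the given number of worker goroutines
// (at least 1) and returns how many lines contain keyword. It returns once
// lines is closed and every worker has finished.
func CountMatches(lines <-chan string, keyword string, workers int) uint64 {
	var total uint64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				if strings.Contains(line, keyword) {
					atomic.AddUint64(&total, 1) // shared counter, updated without a lock
				}
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&total)
}

func main() {
	lines := make(chan string)
	go func() {
		defer close(lines)
		for i := 0; i < 1000; i++ {
			if i%10 == 0 {
				lines <- "Error: request failed"
			} else {
				lines <- "request ok"
			}
		}
	}()
	fmt.Println(CountMatches(lines, "Error", runtime.NumCPU())) // 100
}
```

Counting does not depend on line order, so any worker can take any line; rules that span several lines would instead need all lines of a container routed to the same worker.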

If fail2ban can do it, I am positive there are options.

> I had an idea to implement an option that allows users to select specific containers for log processing while the browser is open. This concept was discussed in detail in https://github.com/amir20/dozzle/issues/2614. It involved service workers, notifications, and a considerable amount of JavaScript. However, it appeared unreliable because browsers often terminate background tabs.

That proposal mentions HTML5 notifications and web workers, which leaves me puzzled about the use case. In my book, this feature should not require a single line of JavaScript. At most, the browser could be used to control the needles we are searching for.

There are, of course, ways to build a richer in-browser experience that goes beyond (Prometheus) counters. But even that would just be a list of pointers into the stream, gathered and exposed by the server.
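
A "list of pointers into the stream" could be as small as a bounded ring buffer of match records (container, rule, timestamp) that an API endpoint serializes as JSON. A minimal sketch; Match and MatchLog are hypothetical names, and the capacity must be greater than zero.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

// Match is a hypothetical pointer into the log stream: enough for the UI to
// jump to the matching line without the server storing the line itself.
type Match struct {
	Container string    `json:"container"`
	Rule      string    `json:"rule"`
	At        time.Time `json:"at"`
}

// MatchLog keeps the most recent matches in a fixed-size ring buffer.
type MatchLog struct {
	mu   sync.Mutex
	buf  []Match
	next int  // index the next Add writes to
	full bool // true once the buffer has wrapped around
}

// NewMatchLog creates a log holding up to capacity matches (capacity > 0).
func NewMatchLog(capacity int) *MatchLog {
	return &MatchLog{buf: make([]Match, capacity)}
}

// Add stores m, overwriting the oldest entry once the buffer is full.
func (l *MatchLog) Add(m Match) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.buf[l.next] = m
	l.next = (l.next + 1) % len(l.buf)
	if l.next == 0 {
		l.full = true
	}
}

// Recent returns a copy of the stored matches, oldest first.
func (l *MatchLog) Recent() []Match {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.full {
		return append([]Match(nil), l.buf[:l.next]...)
	}
	return append(append([]Match(nil), l.buf[l.next:]...), l.buf[:l.next]...)
}

func main() {
	ml := NewMatchLog(2)
	t0 := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	for i, c := range []string{"api", "web", "db"} {
		ml.Add(Match{Container: c, Rule: "errors", At: t0.Add(time.Duration(i) * time.Second)})
	}
	out, _ := json.Marshal(ml.Recent())
	fmt.Println(string(out))
}
```

With a capacity of 2, the "api" match is overwritten and only the "web" and "db" matches remain, oldest first.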