Open Jajcus opened 2 years ago
To make this manageable we have limited journal file size, so ...
So the journal files will be rotated on your system, right?
I'm not asking about the issue you currently have (I still need to understand or reproduce it before I can follow up on that)...
I'm asking because of another issue, #3396, which suggests that the systemd backend is not really suitable for fail2ban at the moment (at least in combination with journal rotation)...
So how does it work on your side at all? Do you have a newer version of Python's systemd module?
(I reproduced it (no entries read after journal rotation) with 234-3+b4 and 234-2+b1.)
I just ran across this; I added another nixos-container, and that seems to have caused fail2ban to start crashing.
lsof reports that the systemd backend is opening each container's logfiles -- there were ~1500 file descriptors open.
I applied this patch that enables the poll2 implementation:
diff --git a/fail2ban/server/asyncserver.py b/fail2ban/server/asyncserver.py
index 0c36d846..79c7cbe3 100644
--- a/fail2ban/server/asyncserver.py
+++ b/fail2ban/server/asyncserver.py
@@ -244,7 +244,7 @@ class AsyncServer(asyncore.dispatcher):
     # @param sock: socket file.
     # @param force: remove the socket file if exists.
-    def start(self, sock, force, timeout=None, use_poll=False):
+    def start(self, sock, force, timeout=None, use_poll=True):
         self.__worker = threading.current_thread()
         self.__sock = sock
         # Remove socket
... and started it. I'll update this issue with results...
I don't understand how that should help if the real error is clearly too many file descriptors opened by the systemd journal monitoring. The polling error in the asyncserver loop is surely just an aftereffect.
To solve the initial issue, one should either restrict the journal files/paths used by the systemd backend or increase the nofile limits.
Alternatively, one could use rsyslog (in parallel with the systemd journal) and switch the jail(s) backend to auto to monitor log files instead of the journal.
From the select(2) man page:
DESCRIPTION WARNING: select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for many modern applications—and this limitation will not change. All modern applications should instead use poll(2) or epoll(7), which do not suffer this limitation.
... continuing to use the select() backend will cap the number of logfiles that can be monitored to 1024.
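The limit quoted from the man page can be checked from Python directly. The following is only a minimal sketch, assuming Linux, where FD_SETSIZE is 1024 and the hard RLIMIT_NOFILE allows raising the soft limit a little above that; the helper names (dup_above, select_ok, poll_ok) are made up for this illustration and are not part of fail2ban or asyncore.

```python
import fcntl
import os
import resource
import select

FD_SETSIZE = 1024  # assumption: the usual value on Linux/glibc


def dup_above(fd, min_fd=FD_SETSIZE):
    """Duplicate fd onto a descriptor numbered >= min_fd (hypothetical helper)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = min_fd + 16
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # Raise the soft limit as far as needed, but never above the hard limit.
        new_soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    # F_DUPFD returns the lowest free descriptor that is >= min_fd.
    return fcntl.fcntl(fd, fcntl.F_DUPFD, min_fd)


def select_ok(fd):
    """Return True if select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
        return True
    except ValueError:
        # Python refuses descriptors that do not fit into an fd_set.
        return False


def poll_ok(fd):
    """Return True if poll() accepts this descriptor number."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    poller.poll(0)  # timeout 0: return immediately
    return True


if __name__ == "__main__":
    r, w = os.pipe()
    high = dup_above(r)
    print("low fd", r, "select:", select_ok(r))
    print("high fd", high, "select:", select_ok(high), "poll:", poll_ok(high))
```

Note that the cap applies to descriptor numbers rather than to a count of monitored files: once enough journal files are open, any socket opened afterwards gets a number that select() cannot handle.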
Some of the other stuff I tried:
The systemd unit has LimitNOFILE = 65536.
My initial attempts to restrict the systemd backend were ineffective; the backend still opened all of the container systemd logfiles too. Running lsof on the fail2ban process showed that it had ~1500 file descriptors open.
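For anyone who wants the same numbers without lsof, here is a rough sketch that reads /proc directly. It assumes Linux and enough permissions to read the target process's /proc entry; the function names are invented for this example.

```python
import os
import resource


def open_fd_numbers(pid="self"):
    """List the descriptor numbers a process has open, via /proc/<pid>/fd (Linux only)."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/fd"))


def count_open_fds(pid="self"):
    """Count open descriptors; for 'self' this includes the one used to list the directory."""
    return len(open_fd_numbers(pid))


def fds_out_of_select_range(pid="self", fd_setsize=1024):
    """Descriptors numbered >= fd_setsize, which select() cannot watch."""
    return [fd for fd in open_fd_numbers(pid) if fd >= fd_setsize]


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open fds:", count_open_fds(), "soft limit:", soft, "hard limit:", hard)
    print("fds beyond select() range:", fds_out_of_select_range())
```

Passing the PID of the fail2ban-server process instead of "self" should show how close it is to its limit and whether any descriptor already lies beyond what select() can handle.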
I might have made a few mistakes; I'll revisit by adding something like backend = systemd[journalfiles="/var/log/journal/*.something/system.journal"] to the jail configs.
Environment:
Custom jail configuration monitoring very active systemd journal. LimitNOFILE=10240 set for fail2ban.service
The issue:
Our system logs a lot to the systemd journal. To make this manageable we have limited the journal file size, so there are a lot of 'small' (over 200MB) journal files. fail2ban opens all of them, once for each configured jail that uses the journal. This in itself is not the problem but, as I understand it, a limitation of libsystemd (there is no way to reliably open only the current journal file).
Without adjusting LimitNOFILE, fail2ban would crash due to too many open files, as it opens more than 1024 files. Increasing the limit (to 10240) should fix the problem, but instead fail2ban crashes with a ValueError raised by select() (the message is given under 'Observed behavior').
This is because the fail2ban code forces the asyncore module to use the outdated select() call for watching file descriptors. This won't work on Linux once more than 1024 files are open.
Browsing the code and commit history suggests that the 'use_poll' setting for asyncore was considered but was disabled for modern Python versions for some reason. I guess it needs to be reconsidered. select() is outdated.
Steps to reproduce
Have more than 1024 systemd journal files and a fail2ban jail set to use journal. Or over 512 files and two such jails configured.
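The arithmetic behind these numbers can be sketched as follows. This is only a rough estimate, assuming each journal-backed jail opens every journal file once and that descriptors are allocated densely from 0; the function names and the other_fds parameter are illustrative.

```python
FD_SETSIZE = 1024  # assumption: the usual value on Linux/glibc


def journal_fds(journal_files, jails):
    """Descriptors used for journal files if every jail opens every file once."""
    return journal_files * jails


def next_fd_out_of_range(journal_files, jails, other_fds=0):
    """True if the next descriptor opened would be numbered >= FD_SETSIZE.

    other_fds stands for stdio, sockets and anything else the process holds.
    """
    return journal_fds(journal_files, jails) + other_fds >= FD_SETSIZE


if __name__ == "__main__":
    print(next_fd_out_of_range(1025, 1))  # one jail, just over 1024 files
    print(next_fd_out_of_range(513, 2))   # two jails, just over 512 files
    print(next_fd_out_of_range(200, 2))   # well below the limit
```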
Expected behavior
Everything works provided the file limit for the fail2ban-server process is set high enough.
Observed behavior
fail2ban crashes with:
ValueError: filedescriptor out of range in select()