Closed isAAAc closed 4 years ago
Hi, thank you for opening this one!
I think the problem comes from:
Found in logs:
2019-04-01 10:10:22,956: WARNING - Job for fail2ban.service failed.
2019-04-01 10:10:22,956: DEBUG - + local exit_code=1
2019-04-01 10:10:22,956: WARNING - See "systemctl status fail2ban.service" and "journalctl -xe" for details.
Can you give us the result of this command please?
sudo journalctl -u fail2ban -n10
In any case, I think a quick fix on our side would be to replace this:
systemctl reload fail2ban
by this:
systemctl reload-or-restart fail2ban
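A minimal sketch of that quick fix as a shell function. The function name reload_service and the SYSTEMCTL override variable are hypothetical, not part of the YunoHost helpers; only the systemctl reload-or-restart call comes from the suggestion above.

```shell
# Hypothetical helper: reload a unit, or restart it if it cannot be reloaded.
# SYSTEMCTL is an assumed override hook for dry runs; it defaults to systemctl.
reload_service() {
    local unit="$1"
    local ctl="${SYSTEMCTL:-systemctl}"
    if ! "$ctl" reload-or-restart "$unit"; then
        # Print the unit status so the reason for the failure ends up in the log.
        "$ctl" status "$unit" --no-pager >&2 || true
        return 1
    fi
}
```

Usage would be: reload_service fail2ban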
Hi @kay0u ;) here are the results of the requested command:
# journalctl -u fail2ban -n10
-- Logs begin at Mon 2019-04-01 10:23:55 CEST, end at Mon 2019-04-01 11:36:55 CEST. --
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: State 'stop-sigterm' timed out. Killing.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Killing process 13174 (fail2ban-server) with signal SIGKILL.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 10:41:40 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 10:41:40 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,993 fail2ban.server [24551]: INFO Starting Fail2ban v0.9.6
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,995 fail2ban.server [24551]: INFO Starting in daemon mode
avril 01 10:41:42 krashboyz systemd[1]: Started Fail2Ban Service.
The app rolled back to the previous version and is available.
Where do you think I should replace systemctl reload fail2ban
with systemctl reload-or-restart fail2ban?
Could the "Failed with result 'timeout'." come from my banned list being too large?
For information, fail2ban is running again without any action on my side:
# service fail2ban status
● fail2ban.service - Fail2Ban Service
Loaded: loaded (/lib/systemd/system/fail2ban.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-04-01 10:41:42 CEST; 1h 1min ago
Docs: man:fail2ban(1)
Process: 21513 ExecStop=/usr/bin/fail2ban-client stop (code=exited, status=255)
Process: 24549 ExecStart=/usr/bin/fail2ban-client -x start (code=exited, status=0/SUCCESS)
Main PID: 24553 (fail2ban-server)
Tasks: 27 (limit: 4915)
Memory: 100.5M
CPU: 19min 9.426s
CGroup: /system.slice/fail2ban.service
└─24553 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -x -b
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 10:41:40 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 10:41:40 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,993 fail2ban.server [24551]: INFO Starting Fail2ban v0.9.6
avril 01 10:41:40 krashboyz fail2ban-client[24549]: 2019-04-01 10:41:40,995 fail2ban.server [24551]: INFO Starting in daemon mode
avril 01 10:41:42 krashboyz systemd[1]: Started Fail2Ban Service.
Where do you think I should replace systemctl reload fail2ban
with systemctl reload-or-restart fail2ban?
Nowhere, it was on our side :-), at this place: https://github.com/YunoHost-Apps/etherpad_mypads_ynh/blob/c895bdfef14d5f3252e3989d48ab7b8c3496ed22/scripts/_common.sh#L77
Could the "Failed with result 'timeout'." come from my banned list being too large?
I don't know, but if it is, we should handle this case anyway
yep our != your, it's not my day ^^
Nowhere, it was on our side :-), at this place:
https://github.com/YunoHost/yunohost/blob/stretch-testing/data/helpers.d/backend#L421
Already fixed in the upcoming testing release, and more globally for ynh_systemd_action: https://github.com/YunoHost/yunohost/blob/stretch-testing/data/helpers.d/system#L112-L113
Anyway, I don't think the problem was the reload itself; more probably it was this timeout, which killed the service.
Also, this is not the log of the crash! @isAAAc, your log states that the crash happened at 10:31 this morning:
2019-04-01 10:31:33,187: WARNING - Job for fail2ban.service failed.
Fail2ban's log is about a crash at 10:41
avril 01 10:41:40 krashboyz systemd[1]: fail2ban.service: State 'stop-sigterm' timed out. Killing.
humm,
@maniackcrudelis, do you want me to reproduce the whole upgrade problem with the following?
yunohost tools update && yunohost tools upgrade && yunohost app upgrade && service fail2ban status
No, just remove -n10 from journalctl -u fail2ban -n10 and scroll to 10:31 this morning.
It may turn out to be the same error, but it would be interesting to know exactly what happened at that time.
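Instead of scrolling, the journal can also be narrowed to a time window. This is only a sketch: the function name journal_window and the JOURNALCTL override variable are hypothetical, and the window around 10:31 in the usage line is an assumed example.

```shell
# Hypothetical helper: show the journal of one unit between two timestamps.
# JOURNALCTL is an assumed override hook for dry runs; it defaults to journalctl.
journal_window() {
    local unit="$1"
    local since="$2"
    local until="$3"
    # --since and --until accept timestamps such as "2019-04-01 10:25:00".
    "${JOURNALCTL:-journalctl}" --no-pager -u "$unit" --since "$since" --until "$until"
}
```

Usage would be: journal_window fail2ban "2019-04-01 10:25:00" "2019-04-01 10:40:00"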
damn :/
root@krashboyz:/home/isaaac# journalctl -u fail2ban
-- No entries --
/var/log/fail2ban.log (extract) is here : https://krashboyz.org/zerobin/?7870be00a8e608ec#HpNFrfGQCtBDOE6TPxiAbjZh8mvZYy8U9eT6Z1h59fM=
I think fail2ban was reloading and had to process its banned IPs; with so many IPs, that took longer than the wait allowed before the timeout status was raised (?)
!!! I guess your ssh port is still 22; maybe you should change it to prevent so many bots from being banned by your fail2ban. Anyway, your fail2ban was indeed quite busy, but I also see errors that I think are not related to etherpad.
The first thing would be to retry the upgrade; if the failure was just because fail2ban was busy, it could work this time.
At the same time, you could tail -f the fail2ban log in another terminal, so you'll see if something happens.
When the upgrade reaches Info: [################....] > Reconfigure fail2ban,
tail -f /var/log/fail2ban.log is still showing unban actions:
2019-04-01 13:31:29,895 fail2ban.actions [24553]: NOTICE [sshd] Unban 104.236.78.228
2019-04-01 13:31:31,646 fail2ban.actions [24553]: NOTICE [sshd] Unban 104.237.230.211
2019-04-01 13:31:32,944 fail2ban.actions [24553]: NOTICE [sshd] Unban 104.238.92.100
2019-04-01 13:31:34,124 fail2ban.actions [24553]: NOTICE [sshd] Unban 104.239.173.150
2019-04-01 13:31:34,905 fail2ban.actions [24553]: NOTICE [sshd] Unban 104.248.11.46
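To get a rough idea of how many unbans fail2ban went through, the log lines above can be counted. A small sketch, assuming the log format shown in this extract; the function name count_unbans is hypothetical.

```shell
# Hypothetical helper: count "Unban" notices for one jail in a fail2ban log.
count_unbans() {
    local log_file="$1"
    local jail="${2:-sshd}"
    # -F matches the string literally, so the brackets are not a regex class.
    # grep exits with 1 when the count is 0, hence the "|| true".
    grep -c -F "NOTICE [${jail}] Unban " "$log_file" || true
}
```

Usage would be: count_unbans /var/log/fail2ban.log sshd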
Perhaps we should flush all the banned IPs before restarting fail2ban, or before the whole upgrade?
The unban of all IPs is probably an internal fail2ban process that runs before stopping. Did the upgrade fail?
Did the upgrade fail?
yep
Could you provide the full log, captured with tail -f ?
yes sure, it is still running, i'll send it asap (going to eat for now)
the full log of the tail -f /var/log/fail2ban.log : https://krashboyz.org/zerobin/?ee4f8ff93bc7a403#MLFPGGEw39+N1A0H5Ksp8Ine/xc9tZl4V4fx7y/E+qY=
the output of the cli upgrade : https://krashboyz.org/zerobin/?51d32205b1964b61#EK2BvpnSYSdVmhn6eSffosL6c9RXXHEXL5lyQh4mNZ4=
the log of the first fail during this operation : https://paste.yunohost.org/raw/egocohuven
the second one : https://paste.yunohost.org/raw/oyegovirap
the service fail2ban status (right now)
root@krashboyz:/home/isaaac# service fail2ban status
● fail2ban.service - Fail2Ban Service
Loaded: loaded (/lib/systemd/system/fail2ban.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-04-01 14:01:59 CEST; 23min ago
Docs: man:fail2ban(1)
Process: 4247 ExecStop=/usr/bin/fail2ban-client stop (code=exited, status=255)
Process: 7543 ExecStart=/usr/bin/fail2ban-client -x start (code=exited, status=0/SUCCESS)
Main PID: 7550 (fail2ban-server)
Tasks: 27 (limit: 4915)
Memory: 81.9M
CPU: 9min 18.936s
CGroup: /system.slice/fail2ban.service
└─7550 /usr/bin/python3 /usr/bin/fail2ban-server -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -x -b
avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Main process exited, code=killed, status=9/KILL
avril 01 14:01:57 krashboyz systemd[1]: Stopped Fail2Ban Service.
avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Unit entered failed state.
avril 01 14:01:57 krashboyz systemd[1]: fail2ban.service: Failed with result 'timeout'.
avril 01 14:01:57 krashboyz systemd[1]: Starting Fail2Ban Service...
avril 01 14:01:58 krashboyz fail2ban-client[7543]: 2019-04-01 14:01:58,077 fail2ban.server [7548]: INFO Starting Fail2ban v0.9.6
avril 01 14:01:58 krashboyz fail2ban-client[7543]: 2019-04-01 14:01:58,078 fail2ban.server [7548]: INFO Starting in daemon mode
avril 01 14:01:59 krashboyz systemd[1]: Started Fail2Ban Service.
Do you want me to remove all the IPs banned by fail2ban, relaunch fail2ban and start the upgrade again?
Yes, please try that. I suspect that the reload takes too long to execute on your server. It took only 4s to unban before reloading.
Maybe for this specific service, in this helper, we should stop, then start, to be sure it has all the time it needs.
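A rough sketch of that "stop, then start" idea. The function name stop_then_start, the SYSTEMCTL override variable and the 300 second default are all assumptions, not the actual helper. Note that systemctl stop already blocks until the stop job ends and is still subject to the stop timeout of the unit, so the polling loop is only a safety net.

```shell
# Hypothetical helper: stop a unit, wait until it is inactive, then start it.
# SYSTEMCTL is an assumed override hook for dry runs; it defaults to systemctl.
stop_then_start() {
    local unit="$1"
    local max_wait="${2:-300}"   # assumed default, in seconds
    local ctl="${SYSTEMCTL:-systemctl}"
    local waited=0
    "$ctl" stop "$unit" || return 1
    # is-active --quiet returns 0 while the unit is still active.
    while "$ctl" is-active --quiet "$unit"; do
        [ "$waited" -ge "$max_wait" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    "$ctl" start "$unit"
}
```

Usage would be: stop_then_start fail2ban 300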
ok, it worked,
# service fail2ban stop
# cd /var/lib/fail2ban
# sqlite3 fail2ban.sqlite3
sqlite> DELETE FROM bans ;
sqlite> .quit
# service fail2ban start
then i used : yunohost tools update && yunohost tools upgrade && yunohost app upgrade
upgrade is OK
Perhaps we should flush fail2ban.sqlite3 as the first step of the upgrade process?
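Before deleting anything, it may help to check how large the ban list actually is. A sketch only: the function name count_bans and the SQLITE3 override variable are hypothetical; the database path and the bans table are the ones used in the commands above.

```shell
# Hypothetical helper: count the rows of the bans table without modifying it.
# SQLITE3 is an assumed override hook for dry runs; it defaults to sqlite3.
count_bans() {
    local db="${1:-/var/lib/fail2ban/fail2ban.sqlite3}"
    # -readonly opens the database read-only, so nothing is flushed.
    "${SQLITE3:-sqlite3}" -readonly "$db" 'SELECT COUNT(*) FROM bans;'
}
```

Usage would be: count_bans /var/lib/fail2ban/fail2ban.sqlite3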
I'd rather give fail2ban the time to do its job. An app upgrade shouldn't remove banned IPs from fail2ban.
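One possible way to give fail2ban more time, sketched under assumptions: the "State 'stop-sigterm' timed out. Killing." lines above mean systemd's stop timeout expired, and that timeout can be raised with TimeoutStopSec in a drop-in file. The function name, the drop-in file name stop-timeout.conf and the 300 second value are hypothetical; this is not something the YunoHost helpers are known to do.

```shell
# Hypothetical helper: write a systemd drop-in that raises the stop timeout.
# The third argument lets the drop-in be written somewhere else for a dry run.
write_stop_timeout_dropin() {
    local unit="$1"
    local seconds="$2"
    local base_dir="${3:-/etc/systemd/system}"
    local dropin_dir="${base_dir}/${unit}.service.d"
    mkdir -p "$dropin_dir"
    printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "${dropin_dir}/stop-timeout.conf"
    # On a real system, run "systemctl daemon-reload" afterwards so systemd
    # picks up the new drop-in.
}
```

Usage would be: write_stop_timeout_dropin fail2ban 300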
Anyway, thanks for this bug report; we now know that this can happen with fail2ban.
Hi, etherpad_mypads_ynh fails when I try to upgrade it. There are two logs: https://paste.yunohost.org/raw/rucoqajeji https://paste.yunohost.org/raw/dihilivuxa
I don't understand what is happening.
Feel free to ask for more details, thanks for your help.