tripleee opened this issue 1 year ago
I'm guessing the number of attacks has increased, causing the log directories (including the logs inside Docker) to grow significantly. Right now the server runs out of disk space every few hours.
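For the record, a quick way to see what's actually growing (the Docker path assumes the default `json-file` logging driver):

```bash
df -h /                            # overall disk usage
sudo du -sh /var/log/* | sort -h   # biggest offenders under /var/log
sudo du -sh /var/lib/docker/containers/*/*-json.log | sort -h
```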
Created a `.forward` to improve visibility of errors; the system tries to send mail when it runs out of disk space.
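For reference, the `.forward` is just a file in the account's home directory listing the address(es) to forward local mail to; the address below is a placeholder:

```
admin@example.com
```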
Consider installing `fail2ban` to hopefully cut down on the attempts to log in over ssh and the various prods against the websocket.
In the interim, I'm compressing large logs. These can probably be reviewed and removed in a few weeks:
```
/var/log/audit/audit.log.[1-4].xz
/var/log/*-2021*.xz
/var/log/20221[01]*.xz
```
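(Compressed with something along these lines; `xz` replaces each file with a `.xz` version in place:)

```bash
sudo xz -T0 /var/log/audit/audit.log.[1-4]
sudo xz -T0 /var/log/*-2021* /var/log/20221[01]*
```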
Installed `fail2ban` in accordance with https://s3bubble.com/installing-fail2ban-on-ec2-ami-instance/ (horrible English and formatting errors, but it's possible to figure out what it's trying to say; I ignored the `nmap` stuff).
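The gist of it, assuming an Amazon Linux 2 instance (the exact commands vary by AMI):

```bash
sudo amazon-linux-extras install epel -y   # fail2ban comes from EPEL
sudo yum install -y fail2ban
sudo systemctl enable --now fail2ban
```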
Added IP addresses to the `ignoreip` list in `/etc/fail2ban/jail.local` from the output of `last -iad`, though only Double Beep and I have logged in recently.
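Roughly this in `jail.local`, where the addresses are placeholders, not the real whitelist:

```ini
[DEFAULT]
# never ban these; taken from the output of `last -iad`
ignoreip = 127.0.0.1/8 198.51.100.17 203.0.113.42
```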
As a stopgap measure, installed a `cron` job to run `./restart` every odd hour. Judging by a quick inspection of recent manual restarts, each restart seems to free up enough disk space to continue running for a couple of hours. Hopefully I can scale this down once the situation stabilizes again.
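The crontab entry is along these lines (the path to the checkout is a placeholder):

```
# run ./restart at every odd hour
0 1-23/2 * * * cd /home/ec2-user/halflife && ./restart
```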
Aggressively banning new attackers for the time being, with this script:
```bash
#!/bin/bash
# Ban everything which attacked since the previous run.
# The file "banned" remembers the newest address banned so far.
ip=$(head -n 1 banned)
if ! [[ "$ip" ]]; then
    echo "$0: banned is empty -- aborting" >&2
    exit 12
fi
# lastb lists failed logins, newest first, with the remote address
# in the last field; stop when we reach the already-banned address.
lastb -iadF |
awk -v latest="$ip" '$NF == latest { exit }
    { print $NF }' |
# Save the newest address for the next run, then ban each one.
tee >(head -n 1 >banned) |
xargs -n 1 fail2ban-client set ssh-iptables banip
```
A month ago I had to restart Halflife a number of times after the month had rolled over, and now I'm seeing the same thing again.
In brief, it seems to eat up all the disk space, then require a number of restarts before the space is properly reclaimed.
This could be a weird artifact of the Docker deployment model and/or how it works on the EC2 instance where I'm running this. Probably the deployment model should be reworked altogether.