As a server admin, i want the monitor command to not interrupt currently running backups.
Game
All ?!
Linux distro
Ubuntu 22.04
Command
command: backup
Further information
I noticed a bug when my Minecraft server got mistakenly restarted during a backup because of an automated cronjob running the monitor command (while the server was creating a backup) but essentially at the same time. This occured becasuse i have the monitor command set to run every 5 minutes and the backup command to run every day at 4:00 am.
After looking into what's happening i think i found the issue:
If you start a backup and immediatley run the monitor command after the server has already been stopped by the backup command but before the backup.lock file was created, the monitor command will pass the backup.lock filecheck and run regularly (start querying the gameserver) and restart the server because the backup.lock file didn't exist. The monitor command will send out the appropriate alert Unable to query mcserver. Game server has been restarted. and restart the server. This will possibly result in the backup not finishing becasue of a tar error: Backing up mcserver: Backup (9,7G) mcserver-2024-02-12-015407.tar.gz, in progress...tar: ././serverfiles/world: file changed as we read it FAIL This will also lead to the backup.lock file not being deleted.
Note that this will not be noticed if the backup finishes within 60 seconds of the monitor command being called, in which case the monitor command will simply assume that the server is online and thus won't restart the server.
If creating the backup takes a little longer the monitor command will continue querying and spew out errors depending how far the backup command got:
[ ./mcserver backup is already running ]
testgameserver@HostMachine:~/minecraft$ ./mcserver m
[ OK ] Monitoring mcserver: Checking session: OK
[ .... ] Monitoring mcserver: Querying port: gamedig: 192.168.1.60:25566 : 0/1: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 127.0.0.1:25566 : 15/2: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 192.168.1.60:25566 : 15/2: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 127.0.0.1:25566 : 30/3: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 192.168.1.60:25566 : 30/3: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 127.0.0.1:25566 : 45/4: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 192.168.1.60:25566 : 45/4: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ .... ] Monitoring mcserver: Querying port: gamedig: 127.0.0.1:25566 : 60/5: QUERYINGhead: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
/home/testgameserver/minecraft/lgsm/modules/command_monitor.sh: line 273: [: : integer expression expected
[ FAIL ] Monitoring mcserver: Querying port: gamedig: 127.0.0.1:25566 : 60/5: FAIL
[ OK ] Monitoring mcserver: Sending Discord alert
fetching GitHub command_restart.sh...OK
Another error that can occur depending on timing:
[ ./mcserver backup is already running ]
testgameserver@HostMachine:~/minecraft$ ./mcserver m
[ OK ] Monitoring mcserver: Checking session: OK
[ OK ] Monitoring mcserver: Querying port: gamedig: 0.0.0.0:25566 : 0/1: DELAY
head: cannot open '/home/testgameserver/minecraft/lgsm/lock/mcserver-started.lock' for reading: No such file or directory
date: invalid date ‘@’
core_exit.sh exiting with code: 0
Possible Solutions:
Create the backup.lock file earlier in the backup flow.
Re-Check the existance of lockfiles whilst querying the server in the monitor flow.
Add a gameserver-monitor.lock file that forces commands that affect uptime to wait until the lock file is removed.
Relevant log output
Feb 12 01:54:00.784 mcserver: START: PASS: Started TestServer
Feb 12 01:54:00.887 mcserver: START: PASS: Started TestServer
Feb 12 01:54:00.889 mcserver: START: INFO: Sending alert: mcserver has started
Feb 12 01:54:02.618 mcserver: START: PASS: Sending Discord alert
Feb 12 01:54:02.624 mcserver: START: INFO: LinuxGSM version: v24.1.3
Feb 12 01:54:02.625 mcserver: START: PASS: core_exit.sh exiting with code: 0
Feb 12 01:54:07.788 mcserver: BACKUP: INFO: Starting backup
Feb 12 01:54:07.890 mcserver: BACKUP: INFO: No previous backups found
Feb 12 01:54:07.995 mcserver: BACKUP: WARN: mcserver will be restarted
Feb 12 01:54:13.036 mcserver: STOP: INFO: Graceful: sending "stop"
Feb 12 01:54:14.023 mcserver: MONITOR: INFO: Checking session: CHECKING
Feb 12 01:54:14.236 mcserver: MONITOR: PASS: Checking session: OK
Feb 12 01:54:15.740 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 1 : QUERYING
Feb 12 01:54:16.844 mcserver: MONITOR: INFO: Querying port: gamedig: 0.0.0.0:25566 : 1 : DELAY
Feb 12 01:54:16.845 mcserver: MONITOR: INFO: Query bypassed: mcserver started less than 1 minutes ago
Feb 12 01:54:16.847 mcserver: MONITOR: INFO: Server started: Mo 12. Feb 01:54:00 CET 2024
Feb 12 01:54:16.848 mcserver: MONITOR: INFO: Current time: Mo 12. Feb 01:54:16 CET 2024
Feb 12 01:54:16.851 mcserver: MONITOR: INFO: LinuxGSM version: v24.1.3
Feb 12 01:54:16.852 mcserver: MONITOR: PASS: core_exit.sh exiting with code: 0
Feb 12 01:54:19.965 mcserver: MONITOR: INFO: Checking session: CHECKING
Feb 12 01:54:20.178 mcserver: MONITOR: PASS: Checking session: OK
Feb 12 01:54:20.774 mcserver: STOP: PASS: Graceful: sending "stop": OK: 6 seconds
Feb 12 01:54:20.784 mcserver: BACKUP: INFO: Backup lockfile generated
Feb 12 01:54:20.785 mcserver: BACKUP: INFO: /home/testgameserver/minecraft/lgsm/lock/backup.lock
Feb 12 01:54:20.887 mcserver: BACKUP: INFO: A total of 9,7G will be compressed: /home/testgameserver/minecraft/lgsm/backup/mcserver-2024-02-12-015407.tar.gz
Feb 12 01:54:21.389 mcserver: BACKUP: INFO: Backup 9,7G mcserver-2024-02-12-015407.tar.gz, in progress
Feb 12 01:54:21.681 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 1 : QUERYING
Feb 12 01:54:21.891 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 1 : FAIL
Feb 12 01:54:23.393 mcserver: MONITOR: INFO: Querying port: gamedig: 192.168.1.60:25566 : 1 : QUERYING
Feb 12 01:54:23.603 mcserver: MONITOR: WARN: Querying port: gamedig: 192.168.1.60:25566 : 1 : FAIL
Feb 12 01:54:41.632 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 2 : QUERYING
Feb 12 01:54:41.841 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 2 : FAIL
Feb 12 01:54:43.344 mcserver: MONITOR: INFO: Querying port: gamedig: 192.168.1.60:25566 : 2 : QUERYING
Feb 12 01:54:43.553 mcserver: MONITOR: WARN: Querying port: gamedig: 192.168.1.60:25566 : 2 : FAIL
Feb 12 01:55:01.582 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 3 : QUERYING
Feb 12 01:55:01.792 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 3 : FAIL
Feb 12 01:55:03.295 mcserver: MONITOR: INFO: Querying port: gamedig: 192.168.1.60:25566 : 3 : QUERYING
Feb 12 01:55:03.505 mcserver: MONITOR: WARN: Querying port: gamedig: 192.168.1.60:25566 : 3 : FAIL
Feb 12 01:55:21.532 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 4 : QUERYING
Feb 12 01:55:21.742 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 4 : FAIL
Feb 12 01:55:23.245 mcserver: MONITOR: INFO: Querying port: gamedig: 192.168.1.60:25566 : 4 : QUERYING
Feb 12 01:55:23.454 mcserver: MONITOR: WARN: Querying port: gamedig: 192.168.1.60:25566 : 4 : FAIL
Feb 12 01:55:41.481 mcserver: MONITOR: INFO: Querying port: gamedig: 127.0.0.1:25566 : 5 : QUERYING
Feb 12 01:55:41.691 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 5 : FAIL
Feb 12 01:55:41.894 mcserver: MONITOR: WARN: Querying port: gamedig: 127.0.0.1:25566 : 5 : FAIL
Feb 12 01:55:41.897 mcserver: MONITOR: INFO: Sending alert: Unable to query mcserver. Game server has been restarted
Feb 12 01:55:43.635 mcserver: MONITOR: PASS: Sending Discord alert
Feb 12 01:55:44.937 mcserver: STOP: INFO: TestServer is already stopped
Feb 12 01:55:46.589 mcserver: START: INFO: Rotating log files
Feb 12 01:55:47.710 mcserver: START: PASS: Started TestServer
Feb 12 01:55:47.812 mcserver: START: PASS: Started TestServer
Feb 12 01:55:47.820 mcserver: MONITOR: INFO: LinuxGSM version: v24.1.3
Feb 12 01:55:47.820 mcserver: MONITOR: PASS: core_exit.sh exiting with code: 0
Feb 12 01:57:25.220 mcserver: BACKUP: FATAL: Backup in progress: FAIL
Feb 12 01:57:25.323 mcserver: BACKUP: FATAL: Starting backup
Feb 12 01:57:27.148 mcserver: START: ERROR: TestServer is already running
Feb 12 01:57:27.725 mcserver: START: INFO: Rotating log files
Feb 12 01:55:47.710 mcserver: START: PASS: Started TestServer
Feb 12 01:55:47.812 mcserver: START: PASS: Started TestServer
Feb 12 01:55:47.820 mcserver: MONITOR: INFO: LinuxGSM version: v24.1.3
Feb 12 01:55:47.820 mcserver: MONITOR: PASS: core_exit.sh exiting with code: 0
Feb 12 01:57:25.220 mcserver: BACKUP: FATAL: Backup in progress: FAIL
Feb 12 01:57:25.323 mcserver: BACKUP: FATAL: Starting backup
Feb 12 01:57:27.148 mcserver: START: ERROR: TestServer is already running
Feb 12 01:57:27.725 mcserver: START: INFO: Rotating log files
Feb 12 01:57:28.842 mcserver: START: PASS: Started TestServer
Feb 12 01:57:28.945 mcserver: START: PASS: Started TestServer
Feb 12 01:57:28.953 mcserver: BACKUP: INFO: LinuxGSM version: v24.1.3
Feb 12 01:57:28.954 mcserver: BACKUP: PASS: core_exit.sh exiting with code: 0
Feb 12 01:58:04.750 mcserver: DETAILS: INFO: LinuxGSM version: v24.1.3
Feb 12 01:58:04.751 mcserver: DETAILS: PASS: core_exit.sh exiting with code: 0
Steps to reproduce
Install any gameserver
Enable stoponbackup
Start gameserver
Run ./gameserver backup
Wait until the backup command is trying to stop the gameserver
Run ./gameserver monitor
If monitor command notices a backup.lock file, you were too slow and the backup command has gotten too far already
If monitor command finds that the server is running, the backup command didn't stop the gameserver yet (quickly re-run the monitor command)
The monitor command should find that the server is not online and start to repeatedly query the gameserver for 60 seconds
If the backup takes less than 60 seconds, the monitor command will acept the gameserver to be online (again)
If the backup takes more than 60 seconds, the monitor will assume the server is offline and will restart the gameserver
Force this by adding a big file to the serverfiles directory
If the gameserver got restarted by the monitor command the backup can fail
Keep in mind to manually delete the backup.lock file in between trying to reproduce this bug becasue it won't be deleted sometimes.
User story
As a server admin, i want the monitor command to not interrupt currently running backups.
Game
All ?!
Linux distro
Ubuntu 22.04
Command
command: backup
Further information
I noticed a bug when my Minecraft server got mistakenly restarted during a backup because of an automated cronjob running the monitor command (while the server was creating a backup) but essentially at the same time. This occured becasuse i have the monitor command set to run every 5 minutes and the backup command to run every day at 4:00 am.
After looking into what's happening i think i found the issue:
If you start a backup and immediatley run the monitor command after the server has already been stopped by the backup command but before the
backup.lock
file was created, the monitor command will pass thebackup.lock
filecheck and run regularly (start querying the gameserver) and restart the server because thebackup.lock
file didn't exist. The monitor command will send out the appropriate alertUnable to query mcserver. Game server has been restarted.
and restart the server. This will possibly result in the backup not finishing becasue of atar
error:Backing up mcserver: Backup (9,7G) mcserver-2024-02-12-015407.tar.gz, in progress...tar: ././serverfiles/world: file changed as we read it FAIL
This will also lead to thebackup.lock
file not being deleted.Note that this will not be noticed if the backup finishes within 60 seconds of the monitor command being called, in which case the monitor command will simply assume that the server is online and thus won't restart the server.
If creating the backup takes a little longer the monitor command will continue querying and spew out errors depending how far the backup command got:
Another error that can occur depending on timing:
Possible Solutions:
backup.lock
file earlier in the backup flow.gameserver-monitor.lock
file that forces commands that affect uptime to wait until the lock file is removed.Relevant log output
Steps to reproduce
stoponbackup
./gameserver backup
./gameserver monitor
backup.lock
file, you were too slow and the backup command has gotten too far alreadyKeep in mind to manually delete the
backup.lock
file in between trying to reproduce this bug becasue it won't be deleted sometimes.