MinecraftServerControl / mscs

Powerful command-line control for UNIX and Linux powered Minecraft servers
https://minecraftservercontrol.github.io
BSD 2-Clause "Simplified" License
485 stars 62 forks source link

Add automatic crash detection and restarts #257

Closed Roflicide closed 3 years ago

Roflicide commented 4 years ago

Automatic crash detection and restarts

; Enable the option to restart the server after a crash is detected.
; 0 - Restart after crash (default).
; 1 - Do not restart after crash.
# mscs-enable-restart-after-crash=0

This is a preliminary PR for #225 that enables the option to restart worlds after a crash is detected using the mscs-enable-restart-after-crash flag (via mscs.defaults). I set the default to be 0 (true), but we can change the default to be 1 (false) if you think that's better. Additionally, we could add the functionality for it to be available on a per world basis via mscs.properties if you think that's a good idea as well.

How it works:

  1. When a world is started, the startServerMonitor() function is called. This checks if the mscs-enable-restart-after-crash flag is enabled. If so, it checks if there is another instance of the server monitor already running (which would occur if the server was restarted after a crash). To do this check, it uses flock. I tried for several hours to use the createLockFile() function, but it didn't work for me for whatever reason--not sure if it was because I was doing something wrong or if it was just an issue with the function itself. @sandain If you're able to get the createLockFile() function working instead of using flock, that'd be good so we don't have to use flock as a dependency.

  2. If the monitor is not running, the startServerMonitor() function essentially calls the nohup command on the serverMonitor() function and runs it in the background, so you can close the terminal and the crash monitor will keep running. nohup doesn't allow you to pass functions, but I managed to recreate the functionality through some research. The PID of the monitor is stored in $WORLD_DIR/monitor.pid.

  3. The serverMonitor() function is essentially a loop that detects if the server has crashed by a combination of the a) the existence of the $WORLD.pid file and b) whether the server is running via the serverRunning() function. If the PID file exists but the server isn't running, then this means the server probably crashed (since MSCS hasn't removed the PID file). Additionally, I had to modify the serverRunning() function to use echo instead of return to return values so I could call the function in the test condition of the until loop and check the return value using $() (I didn't think I could do this the way the function was, but maybe I'm wrong). I subsequently modified all of the references in the script to call serverRunning using $(). There is also a log for the monitor written to $WORLD_DIR/logs/mscs.monitor.log. This log is deleted after $LOG_DURATION to make sure it doesn't get too large. This check occurs in the startServerMonitor() function. There is also a log created called last-start-status.log that stores a message to be displayed later in the mscs status command.

  4. When the mscs stop command is called, it sends a request to kill the server monitor using the stopServerMonitor() function. At this point the monitor should be killed.

When a user runs the mscs status for the first time after a server has crashed, they will see a notice that the world has crashed:

$ mscs status
Minecraft Server Status:
  alex: running version 1.16.2 (0 of 20 users online).
    Port: 25567.
    Memory used: 1.04GB (2GB allocated).
    Server PID: 3351.
    Crash Monitor PID: 19569
    alex automatically restarted from a crash (or in-game stop command)
    on 2020-08-15_22-49-13. See
    /opt/mscs/worlds/alex/logs/mscs.monitor.log and
    /opt/mscs/worlds/alex/crash_reports/ and
    /opt/mscs/worlds/alex/logs/ for more information.

This notice goes away after the first time. The code behind this is simply a check if whether or not the last-start-status.log file exists and is not empty. If this is true, the message is displayed and the file is deleted afterwards (so the message is only displayed once).

P.S.: I tested crashes on the code by doing the /stop command in-game, I believe it should replicate what happens with an actual crash.

Please let me know of any questions or recommendations you have or bugs you find. Thanks!!

Summary of TODO:

zanix commented 4 years ago

0 should be disabled and 1 should be enabled

sandain commented 3 years ago

I agree with @zanix, 0 should be disabled and 1 enabled. It would be confusing otherwise. A per world option in addition to the global option would be best. Default should be disabled I think.

Sorry about the slow review. I've been busy.

Roflicide commented 3 years ago

I'm going to close this PR and re-open one that is up-to-date with the current code.