arkmanager / ark-server-tools

Set of server tools used to manage ARK: Survival Evolved servers on Linux
MIT License
680 stars 144 forks source link

process defunct #800

Closed PatrickBostwick closed 7 years ago

PatrickBostwick commented 7 years ago

Hi, I'm noticing after an update is applied from a cronjob that sometimes I get a defunct process and the cron job email doesn't go out.

#--CRONTAB BEGIN--
#check for updates every hour
0 0-3,5-23 * * * /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup --safe
#restart server everyday at 4am because Ark
0 4 * * * /home/steam/bin/arkmanager restart --warn --saveworld
#upgrade arkmanager tools once a day at 4:20am
20 4 * * * /home/steam/bin/arkmanager upgrade-tools
#backup once a day an hour before the daily restart
0 3 * * * /home/steam/bin/arkmanager backup
#--CRONTAB END--
[steam@Pegasus ~]$ ps x
  PID TTY      STAT   TIME COMMAND
23510 ?        S      0:00 sshd: steam@pts/2
23511 pts/2    Ss     0:00 -bash
27538 ?        Zs     0:00 [arkmanager] <defunct>
27547 ?        S      0:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t -f steam
27557 ?        S      0:00 /usr/sbin/postdrop -r
28878 ?        S      0:01 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup --safe
28880 ?        S      0:00 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup --safe
28895 ?        Rl    15:47 /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland?RCONEnabled=True?RCONPort=32330?Port=7777?QueryPort=27015?ServerPassword?ServerAdminPassword=MYPASSWORDREMOVED?MaxPlayers=75?GameMod
32529 ?        S      0:00 sleep 5
32530 pts/2    R+     0:00 ps x

This doesn't go away even though the server finished the update and restarted successfully. In order to make it go away I have to restart the server.

[steam@Pegasus ~]$ arkmanager saveworld
Running command 'saveworld' for instance 'main'
"World Saved 
 "
You have new mail in /var/spool/mail/steam
[steam@Pegasus ~]$ arkmanager restart
Running command 'restart' for instance 'main'
Stopping server; reason: restart
The server has been stopped
The server is starting...
The server is now running, and should be up within 10 minutes
[steam@Pegasus ~]$ ps x
  PID TTY      STAT   TIME COMMAND
 1174 pts/2    S      0:00 /bin/bash /home/steam/bin/arkmanager restart
 1176 pts/2    S      0:00 /bin/bash /home/steam/bin/arkmanager restart
 1191 pts/2    Rl     0:03 /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland?RCONEnabled=True?RCONPort=32330?Port=7777?QueryPort=27015?ServerPassword?ServerAdminPassword=MYPASSWORDREMOVED?MaxPlayers=75?GameMod
 1199 pts/2    S      0:00 sleep 5
 1220 pts/2    R+     0:00 ps x
23510 ?        S      0:00 sshd: steam@pts/2
23511 pts/2    Ss     0:00 -bash
PatrickBostwick commented 7 years ago

Something else to add, when the update was applied there was no warning given even though I have the --warn flag enabled. See below:

CRONJOB email pasted below (it finished sending after I restarted the server)

Running command 'update' for instance 'main'
Checking for update; PID: 27545
Downloading mod 512609530 ...  ..............Mod 512609530 downloaded
Downloading mod 523235486 ...  .................Mod 523235486 downloaded
Downloading mod 554678442 ...  ......Mod 554678442 downloaded
Downloading mod 566885854 ...  .....Mod 566885854 downloaded
Downloading mod 566887000 ...  ......Mod 566887000 downloaded
Downloading mod 569786012 ...  .....Mod 569786012 downloaded
Downloading mod 600705968 ...  .....Mod 600705968 downloaded
Downloading mod 655261420 ...  .............Mod 655261420 downloaded
Downloading mod 680481868 ...  .....Mod 680481868 downloaded
Downloading mod 693416678 ...  .....Mod 693416678 downloaded
Downloading mod 703724165 ...  .....Mod 703724165 downloaded
Downloading mod 725398419 ...  ......Mod 725398419 downloaded
Downloading mod 730794403 ...  ......Mod 730794403 downloaded
Downloading mod 731604991 ...  ......Mod 731604991 downloaded
Downloading mod 793605978 ...  .....Mod 793605978 downloaded
Downloading mod 821530042 ...  ......Mod 821530042 downloaded
Downloading mod 845855498 ...  .....Mod 845855498 downloaded
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file older than 1 minute. Delaying update.
Save file newer than 1 minute. Performing an update.
Saving world
"World Saved
 "
Stopping server; reason: update
The server has been stopped
\033[0;39m Saved arks directory is /home/steam/ARK/ShooterGame/Saved/SavedArks
 Copying ARK world file (TheIsland) ^[[68G[   OK   ]
 Copying ARK profile files
   76561197993024490.arkprofile ^[[68G[   OK   ]
   76561198004746997.arkprofile ^[[68G[   OK   ]
   76561198024927235.arkprofile ^[[68G[   OK   ]
   76561198042635162.arkprofile ^[[68G[   OK   ]
   76561198064898592.arkprofile ^[[68G[   OK   ]
 Copying ARK tribe files
   1355459685.arktribe ^[[68G[   OK   ]
 Copying ARK tribute tribe files
 Copying GameUserSettings.ini ^[[68G[   OK   ]
 Copying Game.ini ^[[68G[   OK   ]
 Compressing Backup ^[[68G[   OK   ]
 Created Backup:  2017-06-12_13.04.50.tar.bz2
Performing ARK update ...  ..................................................................................................Update to 1884206 complete
The server is starting...
The server is now running, and should be up within 10 minutes

Below is a paste from the arkmanager.log around that time and a bit before for context:

2017-06-12 04:04:06: [main] The server is starting...
2017-06-12 04:04:06: [main] The server is now running, and should be up within 10 minutes
2017-06-12 05:00:01: [main] Checking for update; PID: 20670
2017-06-12 05:01:11: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 06:00:01: [main] Checking for update; PID: 25456
2017-06-12 06:01:05: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 07:00:01: [main] Checking for update; PID: 30203
2017-06-12 07:01:17: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 08:00:01: [main] Checking for update; PID: 2545
2017-06-12 08:02:10: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 09:00:01: [main] Checking for update; PID: 7426
2017-06-12 09:01:24: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 10:00:01: [main] Checking for update; PID: 12456
2017-06-12 10:01:07: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 11:00:01: [main] Checking for update; PID: 17500
2017-06-12 11:02:05: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 12:00:01: [main] Checking for update; PID: 22625
2017-06-12 12:01:03: [main] Your server is already up to date! The most recent version is 1882062.
2017-06-12 13:00:01: [main] Checking for update; PID: 27545
2017-06-12 13:01:16: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:01:46: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:02:16: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:02:46: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:03:16: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:03:46: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:04:16: [main] Save file older than 1 minute. Delaying update.
2017-06-12 13:04:46: [main] Save file newer than 1 minute. Performing an update.
2017-06-12 13:04:46: [main] Saving world
2017-06-12 13:04:48: [main] Stopping server; reason: update
2017-06-12 13:04:50: [main] The server has been stopped
2017-06-12 13:04:54: [main] Performing ARK update
2017-06-12 13:05:43: [main] Update to 1884206 complete
2017-06-12 13:05:43: [main] The server is starting...
2017-06-12 13:05:43: [main] The server is now running, and should be up within 10 minutes
2017-06-12 14:00:01: [main] Checking for update; PID: 32683
2017-06-12 14:01:07: [main] Your server is already up to date! The most recent version is 1884206.
2017-06-12 14:01:44: [main] Stopping server; reason: restart
2017-06-12 14:01:46: [main] The server has been stopped
2017-06-12 14:01:46: stop
2017-06-12 14:01:47: restart
2017-06-12 14:01:47: [main] The server is starting...
2017-06-12 14:01:47: [main] The server is now running, and should be up within 10 minutes

Below is a snippet from the arkserver.log as well:

2017-06-11 21:06:58: [main] 2017-06-11 21:06:58: Server PID: 15784
2017-06-11 21:06:59: [main] [S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
2017-06-11 21:06:59: [main] Setting breakpad minidump AppID = 346110
2017-06-11 21:08:17: [main] 2017-06-11 21:08:17: server is up
2017-06-12 04:04:04: [main] Using binned.
2017-06-12 04:04:04: [main] 4.5.1-0+UE4 7038 3077 402 7
2017-06-12 04:04:04: [main] CtrlCHandler: Signal=2
2017-06-12 04:04:05: [main] /home/steam/bin/arkmanager: line 1028: 15784 Aborted                 "$arkserverroot/$arkserverexec" "$arkserveropts" "${arkextraopts[@]}"
2017-06-12 04:04:06: [main] 2017-06-12 04:04:06: start
2017-06-12 04:04:06: [main] 2017-06-12 04:04:06: Running /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?RCONEnabled=True\?RCONPort=32330\?Port=7777\?QueryPort=27015\?ServerPassword\?ServerAdminPassword=MYPASSWORDREMOVED\?MaxPlayers=75\?GameModIds=512609530\,845855498\,731604991\,680481868\,566885854\,566887000\,655261420\,703724165\,793605978\,554678442\,821530042\,693416678\,600705968\,523235486\,725398419\,569786012\,730794403\?listen
2017-06-12 04:04:06: [main] 2017-06-12 04:04:06: Server PID: 16848
2017-06-12 04:04:07: [main] [S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
2017-06-12 04:04:07: [main] Setting breakpad minidump AppID = 346110
2017-06-12 04:05:25: [main] 2017-06-12 04:05:25: server is up
2017-06-12 13:04:49: [main] Using binned.
2017-06-12 13:04:49: [main] 4.5.1-0+UE4 7038 3077 402 7
2017-06-12 13:04:49: [main] CtrlCHandler: Signal=2
2017-06-12 13:04:51: [main] /home/steam/bin/arkmanager: line 1028: 16848 Aborted                 "$arkserverroot/$arkserverexec" "$arkserveropts" "${arkextraopts[@]}"
2017-06-12 13:05:43: [main] 2017-06-12 13:05:43: start
2017-06-12 13:05:43: [main] 2017-06-12 13:05:43: Running /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?RCONEnabled=True\?RCONPort=32330\?Port=7777\?QueryPort=27015\?ServerPassword\?ServerAdminPassword=MYPASSWORDREMOVED\?MaxPlayers=75\?GameModIds=512609530\,845855498\,731604991\,680481868\,566885854\,566887000\,655261420\,703724165\,793605978\,554678442\,821530042\,693416678\,600705968\,523235486\,725398419\,569786012\,730794403\?listen
2017-06-12 13:05:43: [main] 2017-06-12 13:05:43: Server PID: 28895
2017-06-12 13:05:44: [main] [S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
2017-06-12 13:05:44: [main] Setting breakpad minidump AppID = 346110
2017-06-12 13:07:02: [main] 2017-06-12 13:07:02: server is up
2017-06-12 14:01:45: [main] Using binned.
2017-06-12 14:01:45: [main] 4.5.1-0+UE4 7038 3077 402 7
2017-06-12 14:01:45: [main] CtrlCHandler: Signal=2
2017-06-12 14:01:45: [main] /home/steam/bin/arkmanager: line 1028: 28895 Aborted                 "$arkserverroot/$arkserverexec" "$arkserveropts" "${arkextraopts[@]}"
2017-06-12 14:01:45: [main] 2017-06-12 14:01:45: Bad PID ''; expected '28895'
2017-06-12 14:01:45: [main] 2017-06-12 14:01:45: exited with status 0
2017-06-12 14:01:47: [main] 2017-06-12 14:01:47: start
2017-06-12 14:01:47: [main] 2017-06-12 14:01:47: Running /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?RCONEnabled=True\?RCONPort=32330\?Port=7777\?QueryPort=27015\?ServerPassword\?ServerAdminPassword=MYPASSWORDREMOVED\?MaxPlayers=75\?GameModIds=512609530\,845855498\,731604991\,680481868\,566885854\,566887000\,655261420\,703724165\,793605978\,554678442\,821530042\,693416678\,600705968\,523235486\,725398419\,569786012\,730794403\?listen
2017-06-12 14:01:47: [main] 2017-06-12 14:01:47: Server PID: 1191
2017-06-12 14:01:48: [main] [S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
2017-06-12 14:01:48: [main] Setting breakpad minidump AppID = 346110
2017-06-12 14:03:06: [main] 2017-06-12 14:03:06: server is up
klightspeed commented 7 years ago

There can only be one update condition: --warn means the players will be warned, and the update will progress when either the time expires or everyone disconnects --safe means the update will wait until the server saves the world. --ifempty means the update will progress if nobody is connected to the server.

As for the zombie process, and the email not being sent, it looks like the sendmail process hadn't finished, and the cron daemon hadn't reaped its child.

klightspeed commented 7 years ago

Note that --safe is redundant when --saveworld is used.

PatrickBostwick commented 7 years ago

So it would be acceptable then to use --warn and --saveworld together?

klightspeed commented 7 years ago

Yes. This will warn the players, and when either the time expires or everyone logs off, then it will explicitly save the world before restarting the server.

PatrickBostwick commented 7 years ago

Excellent, thank you so much for your help! I'll do a bit of troubleshooting over the next few days to see what is causing my postfix process to hang and report back.

PatrickBostwick commented 7 years ago

For troubleshooting purposes and a lack of errors in my error logs I decided to change my email server from postfix to exim. The problem still occurs but now I don't have any hanging postfix processes.

[steam@Pegasus ~]$ ps x
  PID TTY      STAT   TIME COMMAND
 9517 ?        Zs     0:00 [arkmanager] <defunct>
10528 ?        S      0:00 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup
10530 ?        S      0:00 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup
10545 ?        Sl     3:59 /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland?RCONEnabled=True?RCONPort=32330?Port=7777?QueryPort=27015?ServerPassword?ServerAdminPassword=PASSWORDREMOVED?MaxPlayers=75?GameMod
12285 ?        S      0:00 sshd: steam@pts/0
12286 pts/0    Ss     0:00 -bash
12506 ?        S      0:00 sleep 5
12507 pts/0    R+     0:00 ps x

Do you have any tips for troubleshooting this further? My crontab emails are coming through now despite the hanging process.

klightspeed commented 7 years ago

If you run ps -AHF, what is the parent process of that zombie process?

PatrickBostwick commented 7 years ago

This is a snippet of the output:

root       778     1  0 31556  1604   3 Jun13 ?        00:00:00   /usr/sbin/crond -n
root     18287   778  0 45508  6288   7 02:00 ?        00:00:00     /usr/sbin/CROND -n
steam    18289 18287  0     0     0  13 02:00 ?        00:00:00       [arkmanager] <defunct>
exim     18298 18287  0 25905  4332   9 02:00 ?        00:00:00       /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t -f steam

I guess it is still trying to send the email though in this case the process appears to be hidden in ps x but not in ps -AHF. Are there any other troubleshooting steps I can take from here?

EDIT: here is the 'ps x' output for the above 'ps -AHF' output.

  PID TTY      STAT   TIME COMMAND
10624 ?        S      0:00 sshd: steam@pts/0
10625 pts/0    Ss     0:00 -bash
18289 ?        Zs     0:00 [arkmanager] <defunct>
21624 ?        S      0:00 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup
21626 ?        S      0:00 /bin/bash /home/steam/bin/arkmanager update --update-mods --warn --saveworld --backup
21641 ?        Sl    11:01 /home/steam/ARK/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland?RCONEnabled=True?RCONPort=32330?Port=7777?QueryPort=27015?ServerPassword?ServerAdminPassword=REMOVEDPASSWORD?MaxPlayers=75?GameModIds=512609530,845855498,731604991,680481868,566885854,566887000,655261420,703724165,793605978,554678442,821530042,693416678,600705968,523235486,725398419,569786012,730794403?listen
26095 ?        S      0:00 sleep 5
26096 pts/0    R+     0:00 ps x
klightspeed commented 7 years ago

ps x only shows processes running as the user executing the command. /usr/sbin/sendmail is running as the exim user.

It does look like the crond job process isn't waiting on any child processes, and it isn't closing the pipe connected to sendmail to signal an EOF (perhaps writing a lone dot to end the email, yet using the -i option to /usr/sbin/sendmail to ignore the lone dot).

A workaround is in the second-last post of this thread: https://forums.gentoo.org/viewtopic-t-480390-start-0.html

PatrickBostwick commented 7 years ago

I've been playing around with various ways to pipe output to email programs and during my testing I've found that only the server restart command hangs. It does this even on command line when you pipe it to an email program. Please try the following command and let me know if it hangs for you as well? The cat -v is simply to make sure that the output is registered as text and not application/octet-stream.

arkmanager restart --warn --saveworld | cat -v | mailx -s 'Daily restart' steam

klightspeed commented 7 years ago

23a58e9 should fix this - the subprocess writing to the log still had stdout open.

PatrickBostwick commented 7 years ago

@klightspeed Thank you so much for all of your help! The patch seems to have fixed the problem.