SickHub / ark-server-charts

A helm chart for an ARK Survival Evolved Cluster
GNU General Public License v3.0

Liveliness probe failed 3 out of 9 maps. #26

Closed: Dracozny closed this issue 1 year ago

Dracozny commented 1 year ago

OK, I went almost for broke and currently have 6 servers running. Two of them are failing the liveness probe very frequently, and the logs show:

2023-01-31 06:11:12: Server PID: 1798
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
Setting breakpad minidump AppID = 346110
2023-01-31 06:13:23: server is up
/usr/local/bin/arkmanager: line 1314:  1798 Killed                  "$arkserverroot/$arkserverexec" "$arkserveropts" "${arkextraopts[@]}"
2023-01-31 06:14:18: Bad PID ''; expected '1798'
2023-01-31 06:14:18: exited with status 0
2023-01-31 06:14:18: restarting server

I've tried scaling the deployments to zero and back, but the problem persists on crystalisles and lostisland.

I have tried probing them manually, and I even went out of my way to install netstat, which shows the server listening on all the correct ports. But when testing from pod to pod, I get no response other than ping.
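
For anyone repeating the pod-to-pod test, a minimal sketch of a TCP connect check is below. It only tests whether a TCP connection can be opened, which is how I assume the RCON probe behaves. The host name and port in the usage comment are placeholders, not values from the chart.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage from another pod (service name and port are assumptions):
#   tcp_port_open("ark-crystalisles", 27016)
```

A listening socket in netstat combined with a False result here would suggest the process is not accepting connections, rather than the port being blocked.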

The ragnarok server is also doing this, but not at such an extreme rate. Or at least it's not reporting the issue at the same rate, despite the liveness probes being the same, at the default rate.

I'm not sure where else I should be looking for a cause.

DrPsychick commented 1 year ago

The probes use the RCON port. You could try patching your deployment to see if other ports work better (UDP or query port).

As the ARK server itself is a black box, the only other option would be some kind of probe script that checks, for example, whether the process is still running.
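
A rough sketch of such a process check, in Python: it asks the kernel whether a PID still exists by sending signal 0. The PID file path is hypothetical; where (or whether) the server PID is written inside the container is an assumption, not something the chart is known to provide.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_alive(path: str) -> bool:
    """Read a PID from a file (hypothetical location) and check it."""
    try:
        with open(path) as f:
            return pid_alive(int(f.read().strip()))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed PID file counts as "not alive".
        return False
```

An exec probe could call this and exit non-zero on False. Note that an existing PID only shows the process has not exited; it says nothing about whether the server is answering on its ports.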

Dracozny commented 1 year ago

Yeah, that's basically what I was thinking, possibly just watching the logs. In theory, the "Server Ready" line should be a reasonable trigger, from what I have seen. Or does that message actually come from the probe?
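
As a sketch of the log-watching idea, the function below scans log lines and reports the last known state. It uses the "server is up", "exited with status", and "restarting server" lines from the log excerpt above as markers. Whether those lines reliably reflect readiness is an assumption.

```python
from typing import Iterable

# Marker strings taken from the log excerpt earlier in this thread.
UP_MARKER = "server is up"
DOWN_MARKERS = ("restarting server", "exited with status")


def server_is_up(log_lines: Iterable[str]) -> bool:
    """Return True if the most recent state marker in the log says 'up'."""
    up = False
    for line in log_lines:
        if UP_MARKER in line:
            up = True
        elif any(marker in line for marker in DOWN_MARKERS):
            up = False
    return up
```

Run against the excerpt above, this reports the server as up after the 06:13:23 line and down again once the exit and restart lines appear.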

Dracozny commented 1 year ago

Maybe we should take a step back. The methodology still worked for 6 maps, even though the first server's initial startup needed to do downloads.
The issue is just that these 3 maps report "Server Started" in the log, but the ports won't accept connections at all, as if the pod itself were running a firewall, even though we know it isn't. I can exec into the pod and verify things before the probe kills it. I can even run netcat -l 27016 on the pods and use netcat from another pod to send messages, but these 3 stay locked down despite netstat -l listing the ports as open.

Dracozny commented 1 year ago

The memory limit was set too low. Crystal Isles seems to peak around 15 GB, with nominal usage closer to 11 GB.
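
For anyone hitting the same symptom, here is a minimal sketch for checking whether the container's cgroup has recorded OOM kills. It assumes the node uses cgroup v2, where memory.events carries an oom_kill counter; the default path is what I would expect inside the container, not something verified for this chart.

```python
from typing import Dict


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from a cgroup v2 memory.events file."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(path: str = "/sys/fs/cgroup/memory.events") -> int:
    """Return the oom_kill counter, or 0 if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_memory_events(f.read()).get("oom_kill", 0)
    except OSError:
        return 0
```

A non-zero count would be consistent with the "Killed" line in the log at the top of this issue, though nothing in the thread confirms that the kernel OOM killer produced that line.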