SickHub / ark-server-charts

A helm chart for an ARK Survival Evolved Cluster
GNU General Public License v3.0

chown: changing ownership of '/arkserver/ShooterGame/Saved': Operation not permitted #25

Closed bhechinger closed 1 year ago

bhechinger commented 1 year ago

Trying to deploy this into my kubernetes cluster I get this error at pod startup:

➜ kubectl -n ark logs ark-test-ark-cluster-theisland-5f87ff7798-dz52k
###########################################################################
# Ark Server -  Sun Jan 29 13:29:48 UTC 2023
###########################################################################
Ensuring correct permissions...
Shared server files in /arkserver...
chown: changing ownership of '/arkserver/ShooterGame/Saved': Operation not permitted

I have this in values.yaml:

persistence:
  enabled: true

  # game files from steam, the largest volume, includes installed mods
  game:
    storageClass: ceph-block
    accessModes:
      - ReadWriteOnce
    size: 30Gi
    mountPath: /arkserver

  # shared cluster files
  cluster:
    storageClass: ceph-block
    accessModes:
      - ReadWriteOnce
    size: 200Mi
    mountPath: /arkserver/ShooterGame/Saved/clusters

  # contains the world save game and configuration files
  # keeping a backup of this is enough to get your server back up
  save:
    volumeMode: Filesystem
    storageClass: ceph-block
    accessModes:
      - ReadWriteOnce
    size: 2Gi
    mountPath: /arkserver/ShooterGame/Saved

edit: added the d which is actually there and got lost in the copy/paste.

I have been completely unable to figure out why this is happening or how to fix it. :(

➜ helm version                                                   
version.BuildInfo{Version:"v3.11.0", GitCommit:"472c5736ab01133de504a826bd9ee12cbe4e7904", GitTreeState:"clean", GoVersion:"go1.18.10"}
➜ kubectl version --short
Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.25.2+k3s1
➜ helm -n ark list                                               
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
ark-test        ark             1               2023-01-29 13:24:31.70050347 +0000 UTC  deployed        ark-cluster-0.1.10      latest

DrPsychick commented 1 year ago

Are you missing a d here? mountPath: /arkserver/ShooterGame/Save

I know this name is awkward, but that's how it is named for ARK.

bhechinger commented 1 year ago

Ah, no, sorry. Copy-pasta error. The d is there.

edit: Plus, that would be a "not found" error if that were the case I believe.

DrPsychick commented 1 year ago

Found a bug... will fix it shortly. sudo is missing here: https://github.com/SickHub/arkserver/blob/7e0d3ed5e94db95c5627ce64db4c58ace3a328f7/run.sh#L34

bhechinger commented 1 year ago

Ah ha! I look forward to the fixed version.

bhechinger commented 1 year ago

While I've got your attention, I had a quick question. What is the reasoning behind setting the replicas to 0 on chart deployment? Does it harm things to have it set to 1?

DrPsychick commented 1 year ago

Done. Please try with the new image, and if it works, you can close the issue.

Thanks for raising this issue! This was also part of #22 I believe.

bhechinger commented 1 year ago

It's currently downloading the bits from steam, so looking good. Thanks!

One thing I see in the logs that isn't a show stopper but may be an issue is this:

touch: cannot touch '/arkserver/.startAfterUpdate-main': Permission denied

Not sure how important that is.

bhechinger commented 1 year ago

New issue. This happens after it's fetched everything.

2023-01-29 15:54:06: start
2023-01-29 15:54:06: Running /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?ServerPassword=password\?GameModIds\?MaxPlayers=10\?RCONEnabled=True\?ServerAdminPassword=password\?AltSaveDirectoryName=SavedArks\?SessionName=TheIsland\?QueryPort=27015\?Port=7777\?RCONPort=32330\?GameModIds\?listen -clusterid=arkcluster -log
/usr/local/bin/arkmanager: line 1333: /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer: No such file or directory
2023-01-29 15:54:06: Server PID: 212
2023-01-29 15:54:11: Bad PID ''; expected '212'
2023-01-29 15:54:11: exited with status 0

The log lines from the end of fetching (the container restarted) are these:

 Update state (0x61) downloading, progress: 89.92 (16928167224 / 18825081206)
 Update state (0x61) downloading, progress: 90.27 (16994272563 / 18825081206)
 Update state (0x61) downloading, progRunning command 'broadcast' for instance 'main'
[  WARN  ]      Your ARK server exec could not be found.
Error connecting to server: Connection refused at -e line 33.

DrPsychick commented 1 year ago

This is how it should look:

[...]
 Update state (0x61) downloading, progress: 33.22 (1456428304 / 4383962465)
 Update state (0x61) downloading, progress: 35.93 (1575021540 / 4383962465)
 Update state (0x41) staging, progress: 38.70 (1696640487 / 4383962465)
 Update state (0x41) staging, progress: 41.74 (1829909093 / 4383962465)
 Update state (0x41) staging, progress: 44.27 (1940625932 / 4383962465)
 Update state (0x41) staging, progress: 49.48 (2169270810 / 4383962465)
 Update state (0x41) staging, progress: 55.61 (2437708203 / 4383962465)
 Update state (0x41) staging, progress: 63.57 (2786884011 / 4383962465)
 Update state (0x41) staging, progress: 68.66 (3010230699 / 4383962465)
 Update state (0x41) staging, progress: 77.20 (3384572331 / 4383962465)
 Update state (0x41) staging, progress: 88.16 (3864998351 / 4383962465)
 Update state (0x41) staging, progress: 98.29 (4309092820 / 4383962465)
 Update state (0x81) verifying update, progress: 0.30 (13181278 / 4383962465)
 Update state (0x81) verifying update, progress: 8.09 (354592187 / 4383962465)
 Update state (0x81) verifying update, progress: 14.63 (641493812 / 4383962465)
 Update state (0x81) verifying update, progress: 21.70 (951378973 / 4383962465)
 Update state (0x81) verifying update, progress: 27.89 (1222749036 / 4383962465)
 Update state (0x81) verifying update, progress: 34.68 (1520347766 / 4383962465)
 Update state (0x81) verifying update, progress: 40.59 (1779305051 / 4383962465)
 Update state (0x81) verifying update, progress: 47.20 (2069445133 / 4383962465)
 Update state (0x81) verifying update, progress: 49.23 (2158197401 / 4383962465)
 Update state (0x81) verifying update, progress: 54.10 (2371842395 / 4383962465)
 Update state (0x81) verifying update, progress: 63.62 (2789095178 / 4383962465)
 Update state (0x81) verifying update, progress: 77.40 (3393074954 / 4383962465)
 Update state (0x81) verifying update, progress: 84.67 (3711842058 / 4383962465)
 Update state (0x81) verifying update, progress: 84.79 (3717084938 / 4383962465)
 Update state (0x81) verifying update, progress: 93.69 (4107520274 / 4383962465)
 Update state (0x101) committing, progress: 0.00 (0 / 4383962465)
 Update state (0x101) committing, progress: 0.00 (0 / 4383962465)
 Update state (0x101) committing, progress: 100.00 (4383962465 / 4383962465)
Success! App '376030' fully installed.
Update to 10405504 complete
The server is starting...

bhechinger commented 1 year ago

Hmm, I wonder what's gone wrong then. I'll try re-deploying it.

bhechinger commented 1 year ago

Nope, same behavior. It's stuck in an infinite loop of running the downloader/installer which fails with:

 Update state (0x61) downloading, progress: 88.65 (16688350988 / 18825081206)
 Update state (0x61) downloading, progress: 89.09 (16771282187 / Running command 'broadcast' for instance 'main'
[  WARN  ]      Your ARK server exec could not be found.
Error connecting to server: Connection refused at -e line 33.

The pod restarts and it fails with:

➜ kubectl -n ark logs -f ark-test-ark-cluster-theisland-577947495b-zmdg8
###########################################################################
# Ark Server -  Sun Jan 29 16:54:50 UTC 2023
###########################################################################
Ensuring correct permissions...
Shared server files in /arkserver...
Shared clusters files in /arkserver/ShooterGame/Saved/clusters...
Cleaning up any leftover arkmanager files...
Creating arkmanager.cfg from environment variables...
Creating crontab...
Starting cron service...
 * Starting periodic command scheduler cron
   ...done.
Loading crontab...
Save file validation is not enabled.
Backup on start is not enabled.
Running command 'start' for instance 'main'
[  WARN  ]      Your ARK server exec could not be found.
touch: cannot touch '/arkserver/.startAfterUpdate-main': Permission denied
Checking for updates before starting
Checking for update; PID: 47
sed: can't read /arkserver/steamapps/appmanifest_376030.acf: No such file or directory
The server is already stopped
Performing ARK updateExecuting /usr/games/steamcmd +@NoPromptForPassword 1 +force_install_dir /arkserver +login anonymous +app_update 376030 +quit
Redirecting stderr to '/home/steam/.local/share/Steam/logs/stderr.txt'
[  0%] Checking for available updates...
[----] Verifying installation...
Steam Console Client (c) Valve Corporation - version 1669935972
-- type 'quit' to exit --
Loading Steam API...OK
"@NoPromptForPassword" = "1"

Connecting anonymously to Steam Public...OK
Waiting for client config...OK
Waiting for user info...OK
 Update state (0x3) reconfiguring, progress: 0.00 (0 / 0)
 Update state (0x3) reconfiguring, progress: 0.00 (0 / 0)
 Update state (0x3) reconfiguring, progress: 0.00 (0 / 0)
 Update state (0x5) verifying install, progress: 29.24 (5503812107 / 18825081206)
Error! App '376030' state is 0x202 after update job.
Update to  complete
The server is starting...

2023-01-29 16:55:28: start
2023-01-29 16:55:28: Running /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?ServerPassword=password\?GameModIds\?MaxPlayers=10\?RCONEnabled=True\?ServerAdminPassword=password\?AltSaveDirectoryName=SavedArks\?SessionName=TheIsland\?QueryPort=27015\?Port=7777\?RCONPort=32330\?GameModIds\?listen -clusterid=arkcluster -log
/usr/local/bin/arkmanager: line 1333: /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer: No such file or directory
2023-01-29 16:55:28: Server PID: 211
2023-01-29 16:55:33: Bad PID ''; expected '211'
2023-01-29 16:55:33: exited with status 0

Restarts and goes back to the installer which fails, etc.

This is my values file: https://github.com/bhechinger/argo-helm-wrappers/blob/1a84b97fd25c5135d4be22fdd0cb8dbfe5c4fb75/charts/ark/values-raw.yaml

bhechinger commented 1 year ago

It never finishes downloading. Could a probe be restarting it because it's taking too long?

DrPsychick commented 1 year ago

Yes, that could be, depending on the download speed. Try increasing the startupProbe initialDelaySeconds. kubectl get events -w should also show whether that is the reason it's being aborted.
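
As a rough illustration of that suggestion, the probe timing could be relaxed along these lines. This is a sketch only: initialDelaySeconds, periodSeconds and failureThreshold are standard Kubernetes probe fields, but the key path under which this chart exposes them, and the specific numbers, are assumptions. Check the chart's values.yaml for the real location.

```yaml
# Hypothetical values override; the key path in this chart may differ.
startupProbe:
  # Wait 30 minutes before the first probe so the Steam download can finish.
  initialDelaySeconds: 1800
  # After the initial delay, probe every 10 seconds...
  periodSeconds: 10
  # ...and tolerate 30 consecutive failures (about 5 more minutes)
  # before the container is restarted.
  failureThreshold: 30
```

With these numbers the container gets roughly 35 minutes before a failing startup probe triggers a restart. A slower connection would need a larger initialDelaySeconds or failureThreshold.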

bhechinger commented 1 year ago

Well, that definitely fixed the download not completing. However, it's still not quite working.

 Update state (0x101) committing, progress: 99.34 (18700301218 / 18825081206)
Success! App '376030' fully installed.
Update to  complete
The server is starting...

2023-01-29 20:44:54: start
2023-01-29 20:44:54: Running /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?ServerPassword=password\?GameModIds\?MaxPlayers=10\?RCONEnabled=True\?ServerAdminPassword=password\?AltSaveDirectoryName=SavedArks\?SessionName=TheIsland\?QueryPort=27015\?Port=7777\?RCONPort=32330\?GameModIds\?listen -clusterid=arkcluster -log
/usr/local/bin/arkmanager: line 1333: /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer: No such file or directory
2023-01-29 20:44:54: Server PID: 266
2023-01-29 20:44:59: Bad PID ''; expected '266'
2023-01-29 20:44:59: exited with status 0

Then we're back to the same infinite loop as before.

DrPsychick commented 1 year ago

While the server is starting up/updating, can you open a shell in the container and check the filesystem? The file /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer must be there after the update is complete. Also check whether the permissions of /arkserver are correct (steam:steam).

bhechinger commented 1 year ago

/arkserver is root:root; /arkserver/ShooterGame is steam:steam, as is everything under it.

It restarted as I was waiting to see what would happen. 10 minutes wasn't enough to finish downloading. :(

I've set it to 30 minutes but I won't get back to this until the morning.

Thanks for all your help!

bhechinger commented 1 year ago

Actually got a chance to watch this before tomorrow. It's never creating /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer

DrPsychick commented 1 year ago

What I can offer is that we look at it together. Half an hour spent in a call might lead to a faster solution than chatting over days. Poke me on slack -at- drsick.net to get an invite to my slack space or to set up a meeting directly.

DrPsychick commented 1 year ago

We found that the root folder of the mounted PVC (ceph) needs to be explicitly mounted as the steam user:

securityContext:
  fsGroup: 1000
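
For context, here is a sketch of where that setting sits in a plain Kubernetes Deployment. fsGroup is a pod-level field, so it belongs under the pod spec rather than under a container. The use of fsGroupChangePolicy is an optional addition that was not discussed in this thread, and how the chart exposes either field through its values is an assumption to verify.

```yaml
# Fragment of a Deployment manifest (not the chart's values file).
spec:
  template:
    spec:
      securityContext:
        # GID applied to mounted volumes that support ownership management;
        # 1000 is the group used for the steam user in this thread.
        fsGroup: 1000
        # Optional, not from the thread: only change ownership when the
        # volume root does not already match, which avoids a slow recursive
        # pass over the large game volume on every start.
        fsGroupChangePolicy: "OnRootMismatch"
```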

Dracozny commented 1 year ago

I actually cheated on that when I was using ceph: I used a pod to mount the volume as root and then explicitly chown it to 1000:1000. It's odd that fsGroup didn't work for me, but I suspect the cause was in the deployment stack in my case.
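
One way to automate that kind of workaround is an init container that runs as root and fixes ownership before the server starts. The sketch below is an assumption-laden illustration, not the pod actually used above: the container name, the busybox image tag and the volume name are all hypothetical, and the volume name must match whatever the Deployment really defines.

```yaml
# Fragment of a pod spec; names and image are hypothetical.
initContainers:
  - name: fix-arkserver-permissions
    image: busybox:1.36
    # Only the volume root needs fixing here: earlier in this thread
    # /arkserver was root:root while everything below was already steam:steam.
    command: ["sh", "-c", "chown 1000:1000 /arkserver"]
    securityContext:
      runAsUser: 0             # chown requires root
    volumeMounts:
      - name: game             # assumed volume name for the game files PVC
        mountPath: /arkserver
```

A non-recursive chown is used deliberately, since a recursive pass over the full game install would slow down every pod start.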