SickHub / ark-server-charts

A helm chart for an ARK Survival Evolved Cluster
GNU General Public License v3.0

Broken mods on clustered servers using this chart #14

Closed hyperbolic2346 closed 2 years ago

hyperbolic2346 commented 2 years ago

I've seen an issue a few times now where a random subset of mods goes missing for both servers in a cluster. Originally, I dismissed it. After thinking about it more, I believe it is related to one server updating while the other loads. Both servers complain about the same mods, but one downloads those mods and the other simply continues without them.

The biggest issue here is that the servers load without the mods, and then everything that used those mods is deleted. This was very problematic the time that S+ went missing, as all buildings on the server were nuked. I then have to backtrack through backups to find the last good one. I think the updates need some sort of communication or, at the least, the servers need to be staggered for updating/rebooting.

Here is the server that does the updates:

###########################################################################
# Ark Server -  Mon Aug 8 17:30:27 UTC 2022
###########################################################################
Ensuring correct permissions...
Shared server files in /arkserver...
Shared clusters files in /arkserver/ShooterGame/Saved/clusters...
Cleaning up any leftover arkmanager files...
Creating arkmanager.cfg from environment variables...
Creating crontab...
Starting cron service...
 * Starting periodic command scheduler cron
   ...done.
Loading crontab...
Save file validation is not enabled.
[No Backup On Start]
./arkserver.sh: line 149: [: -eq: unary operator expected
./arkserver.sh: line 197: [: -eq: unary operator expected
Running command 'start' for instance 'main'
[  ERROR  ]     Mod 1404697612 is requested but not installed.  Run 'arkmanager installmod 1404697612' to install this mod.
[  ERROR  ]     Mod 1967741708 is requested but not installed.  Run 'arkmanager installmod 1967741708' to install this mod.
Checking for updates before starting
Checking for update; PID: 56
The server is already stopped
Updating mod 1404697612
Mod 1404697612 updated
Updating mod 1967741708
Mod 1967741708 updated
The server is starting...

2022-08-08 17:31:06: start
2022-08-08 17:31:06: Running /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer TheIsland\?RCONPort=32330\?SessionName=Minion\ Island\?GameModIds=543394681\,2724167243\,1967741708\,731604991\,1404697612\,2007400172\,2007461356\,2007447056\,2007441758\,2007430597\,2007418454\,2007411835\?MaxPlayers=20\?ServerPVE=True\?QueryPort=27018\?AltSaveDirectoryName=SavedArks\?ServerPassword=[redacted]\?ServerAdminPassword=[redacted]\?RCONEnabled=True\?Port=7870\?listen -AllowFlyerCarryPvE=True -OverrideStructurePlatformPrevention=True -clusterid=[redacted] -DisableStructureDecayPvE=True -ForceAllowCaveFlyers=True -log
2022-08-08 17:31:06: Server PID: 542
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
Setting breakpad minidump AppID = 346110
2022-08-08 17:33:50: server is up

And here is the other server, which doesn't auto-update on start:

###########################################################################
# Ark Server -  Mon Aug 8 17:30:20 UTC 2022
###########################################################################
Ensuring correct permissions...
Shared server files in /arkserver...
Shared clusters files in /arkserver/ShooterGame/Saved/clusters...
Cleaning up any leftover arkmanager files...
Creating arkmanager.cfg from environment variables...
Creating crontab...
Starting cron service...
 * Starting periodic command scheduler cron
   ...done.
Loading crontab...
Save file validation is not enabled.
./arkserver.sh: line 149: [: -eq: unary operator expected
[No Backup On Start]
./arkserver.sh: line 197: [: -eq: unary operator expected
Running command 'start' for instance 'main'
[  ERROR  ]     Mod 1404697612 is requested but not installed.  Run 'arkmanager installmod 1404697612' to install this mod.
[  ERROR  ]     Mod 1967741708 is requested but not installed.  Run 'arkmanager installmod 1967741708' to install this mod.
The server is starting...

2022-08-08 17:30:29: start
2022-08-08 17:30:30: Running /arkserver/ShooterGame/Binaries/Linux/ShooterGameServer Ragnarok\?RCONPort=32330\?SessionName=Minion\ Ragnarok\?GameModIds=543394681\,2724167243\,1967741708\,731604991\,1404697612\,2007400172\,2007461356\,2007447056\,2007441758\,2007430597\,2007418454\,2007411835\?MaxPlayers=20\?ServerPVE=True\?QueryPort=27017\?AltSaveDirectoryName=SavedArks\?ServerPassword=[redacted]\?ServerAdminPassword=[redacted]\?RCONEnabled=True\?Port=7880\?listen -AllowFlyerCarryPvE=True -OverrideStructurePlatformPrevention=True -clusterid=[redacted] -DisableStructureDecayPvE=True -ForceAllowCaveFlyers=True -log
2022-08-08 17:30:30: Server PID: 89
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
Setting breakpad minidump AppID = 346110
2022-08-08 17:32:37: server is up

I'm thinking the startup script should include an arkmanager checkmodupdate call and compare the return code to 1. If it isn't 1, the non-updating servers should sleep and try again. I realize this is probably a fix at the container level, but you don't allow issues on your container build with the changes necessary for clustering.
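
A minimal sketch of that sleep-and-retry idea, assuming the check command exits with 0 once it is safe to start. `retry_until_ok` is a hypothetical helper, not part of arkmanager, and which exit code arkmanager actually uses for "mods are up to date" would need to be confirmed before wiring it in:

```shell
# Hypothetical helper: rerun a check command until it succeeds
# or the attempt budget is used up.
# Usage: retry_until_ok <max_attempts> <sleep_seconds> <command> [args...]
retry_until_ok() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Check failed (attempt $attempt/$max_attempts), retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

A non-updating server could call this with its update check as the command and only start the game server once it returns 0. The attempt limit is a design choice so that a permanently failing check does not block the pod forever.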

DrPsychick commented 2 years ago

You are right, there is a major flaw: servers that are not responsible for updating mods start without the mods. Give me a few days to take a deeper look.

hyperbolic2346 commented 2 years ago

# Quoted so an empty UPDATEONSTART does not collapse the test into `[ -eq 0 ]`
if [ "$UPDATEONSTART" -eq 0 ]; then
    arkmanager start -noautoupdate --no-background --verbose &
    arkmanpid=$!
    wait "$arkmanpid"
else
    # wait for mods/install to complete
    until arkmanager checkupdate
    do
        echo "Waiting for ark server to update"
        sleep 10
    done
    until arkmanager checkmodupdate --revstatus
    do
        echo "Waiting for ark mods to update"
        sleep 10
    done
    arkmanager start --no-background --verbose &
    arkmanpid=$!
    wait "$arkmanpid"
fi

I'm going to test this. The only issue I see with it is the race on the shared filesystem. If a mod is in the middle of being updated, I'm unsure whether the check will already report it as updated and pass.

DrPsychick commented 2 years ago

Great, I had the exact same thought :) Trying to get this testable/tested, but it does require actually running a server for the test.

hyperbolic2346 commented 2 years ago

Which I am doing, so I can report back once I notice mods have updated.

DrPsychick commented 2 years ago

Is there a use case where you would want to run a server with outdated mods on purpose? This change would make that impossible, right?

hyperbolic2346 commented 2 years ago

I don't think that would be something you can do easily with Steam, but ... maybe? We could make it a configuration option and give it a default.

if [ "$WAITONUPDATE" = "true" ]; then
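
One way such an option could be read, sketched under assumptions: the default of "true" when the variable is unset is my guess, not something settled in this thread, and `wait_on_update_enabled` is a hypothetical name.

```shell
# Assumption: WAITONUPDATE defaults to "true" when unset or empty.
# Returns 0 when the server should wait for updates, 1 otherwise.
wait_on_update_enabled() {
  case "${WAITONUPDATE:-true}" in
    true|TRUE|1|yes) return 0 ;;
    *) return 1 ;;
  esac
}
```

The startup script could then branch on `if wait_on_update_enabled; then ...`, which keeps running with outdated mods possible for anyone who sets the variable to false.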

DrPsychick commented 2 years ago

I pushed a branch with a test script function - it's commented out because it downloads the server and needs manual shutdown of the server after the test. https://github.com/DrPsychick/arkserver/tree/wait-for-mods

DrPsychick commented 2 years ago

There is a "race" condition: when the error "/home/steam/Steam/steamapps/workshop does not exist" occurs, the checkmodupdate fails. I invited you to the repo so you can push directly to the branch if you want to.

DrPsychick commented 2 years ago

PS: @hyperbolic2346 send me a message to slack -(at)- drsick.net and we can chat in Slack.

hyperbolic2346 commented 2 years ago

Yeah, I didn't really like the "deploy one server, wait, then deploy the second" setup too much. I like gitops and use flux to define my k8s cluster, and this felt very... manual. You'd need a way to verify the different stages of the server download though, or at least a way to recognize a completed server download. It may be enough to look for the mod directory and wait for it to exist.
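
A rough sketch of that "wait for the mod directory" idea. It assumes each installed mod has a directory named after its ID under the mods directory, and `missing_mods` is a hypothetical helper. A directory that exists is not proof that the download has finished, so this only covers the "mod is absent" case seen in the logs above.

```shell
# Print every mod ID from a comma-separated list that has no directory
# under <mods_dir>. Assumes each installed mod lives in <mods_dir>/<id>.
# Usage: missing_mods <mods_dir> <id1,id2,...>
missing_mods() {
  mods_dir="$1"
  old_ifs="$IFS"
  IFS=','
  for id in $2; do
    [ -d "$mods_dir/$id" ] || echo "$id"
  done
  IFS="$old_ifs"
}
```

A waiting server could poll until the output is empty, for example `until [ -z "$(missing_mods "$mods_dir" "$mod_ids")" ]; do sleep 10; done`, where both variables are placeholders for the real mods path and the GameModIds list.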

hyperbolic2346 commented 2 years ago

> there is a "race" condition, when Error: /home/steam/Steam/steamapps/workshop does not exist occurs, the checkmodupdate fails.

Oh, I see. I'm hitting that now in my tests. So is the plan to have each instance download its own mods and leave the main server files shared, with only one instance responsible for updating them? How will the second server get restarted?

hyperbolic2346 commented 2 years ago

OK, I have investigated this some more. Arkmanager needs this directory to get the mod name, but the name is only used in messages. Unless this directory is shared, I don't see how we can use arkmanager to check the update status. We could use a .lock file somewhere, but I'm unsure of the best route currently.

We could have each server in the cluster responsible for mods itself and map a volume in for the workshop directory. This doesn't solve the issue of server updates though and that solution could apply for mods as well.
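
For the .lock idea, a minimal sketch using a lock directory on the shared volume. The helper names and the lock path are hypothetical and nothing here is part of arkmanager. It relies on `mkdir` either creating the directory or failing, which is an assumption worth verifying on whatever storage backs the shared volume.

```shell
# Hypothetical lock helpers for the shared volume.
# mkdir either creates the directory or fails, so only one caller wins.
acquire_update_lock() {
  mkdir "$1" 2>/dev/null
}

release_update_lock() {
  rmdir "$1" 2>/dev/null
}

# Poll until no updater holds the lock.
# Usage: wait_while_locked <lock_dir> <sleep_seconds> <max_polls>
# Returns 1 if the lock is still held after <max_polls> polls.
wait_while_locked() {
  lock_dir="$1"
  delay="$2"
  max_polls="$3"
  polls=0
  while [ -d "$lock_dir" ]; do
    polls=$((polls + 1))
    if [ "$polls" -gt "$max_polls" ]; then
      return 1
    fi
    sleep "$delay"
  done
  return 0
}
```

The updating server would take the lock before touching server or mod files and release it afterwards, while the other servers call `wait_while_locked` before starting. A crashed updater would leave the lock behind, so a real version would need some form of stale-lock cleanup.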

DrPsychick commented 2 years ago

Thanks for opening that PR with arkmanager 👍, looks like a good solution!

DrPsychick commented 2 years ago

For future reference: to test it with the master branch of arkmanager I had to

Dockerfile: add -- --unstable

RUN curl -sL "https://raw.githubusercontent.com/arkmanager/ark-server-tools/$AMG_VERSION/netinstall.sh" | bash -s steam -- --unstable

add --build-arg AMG_BUILD=versioned --build-arg AMG_VERSION=master to the docker build command in test/run-test.sh.

DrPsychick commented 2 years ago

So I decided to split from the original docker image, because the maintainer stopped responding and did not merge my PR.

You can try this image, if that works for you: drpsychick/arkserver:latest-master. The new repo auto-builds multiple versions: https://github.com/SickHub/arkserver

DrPsychick commented 2 years ago

I tested the new image by deleting mods from ShooterGame/Content/Mods and using the tag latest-master. The second server waited patiently until the one with updateOnStart=true had updated the mods.

https://github.com/SickHub/arkserver/pull/2

Once you confirm, @hyperbolic2346, I think we can close this issue. I'm watching arkmanager for the next release to update the default version in the image.

DrPsychick commented 2 years ago

The new image build is significantly smaller, and servers waiting for mod updates tested successfully, so I'll close this. Feel free to reopen it if it still causes issues for you.

hyperbolic2346 commented 2 years ago

I have tested this lightly and agree it is fixed. Thank you for all your help. I will reopen if something goes wrong.