Closed RIKIKU closed 2 years ago
Note: This is my first Python script, so please point out anything that doesn't seem right. ihavenoideawhatimdoing.jpg
Awesome, looks good. I suppose you tested it? If possible, can you squash the 3 commits into one with `git rebase -i HEAD~3`, keeping one commit and setting the other 2 to squash or fixup, and then force-push afterwards?
I've tested it as best I know how. I'm running this commit at the moment, and the health endpoint is definitely working: it reports healthy while the server is responding, and it reports unhealthy if I modify the script to hit a port the UT service isn't listening on.
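For readers who want to picture the kind of check being described, here is a minimal sketch, not the script from this PR. It assumes the server answers a UDP status query; the port `7778`, the `\status\` payload, and the function name `is_healthy` are illustrative assumptions rather than details confirmed in this thread.

```python
import socket


def is_healthy(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bool:
    """Return True if the server sends any UDP reply before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts as well as "connection refused/reset" errors.
            return False
    return len(data) > 0


# Hypothetical usage inside a health check script (port and payload are assumptions):
#   import sys
#   sys.exit(0 if is_healthy("127.0.0.1", 7778, b"\\status\\") else 1)
```

Exiting with 0 for healthy and 1 for unhealthy is what a Docker `HEALTHCHECK` command expects. Pointing the check at a port nothing listens on, as described above, should make it return `False`.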
The only problem here is that Docker will not restart the container when it becomes unhealthy. There's a long-running issue here that discusses why restarting on custom healthchecks hasn't been implemented in Docker. It should work fine in an orchestrator though. I might test the restart behaviour on my k3s cluster if I can get the time to upload the image somewhere.
Regarding the commits, I'll squash those if required, however, would a Squash Merge do the job?
Is there maybe a possibility that you could crash the container somehow when it is unhealthy (never tried that) so that it would automatically restart? I prefer to let a PR creator squash however he wants but I can also do it while merging of course.
Edit: What are the causes of it being unhealthy? Probably the ut process (`/ut-server/ucc`) is still running, so could your script just kill it in order to crash the container?
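The "kill it when unhealthy" idea could be sketched roughly as follows. This is an illustration under assumptions, not something the PR implements: the helper name `kill_if_unhealthy` is hypothetical, and how the script would find the server's PID is deliberately left out.

```python
import os
import signal
from typing import Callable


def kill_if_unhealthy(check: Callable[[], bool], pid: int,
                      sig: int = signal.SIGTERM) -> bool:
    """Send `sig` to `pid` when `check()` reports unhealthy.

    Returns True if a signal was sent, False if the process was left alone.
    """
    if check():
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process is already gone, so there is nothing to kill.
        return False
    return True
```

A hung process may not react to `SIGTERM`, so `signal.SIGKILL` might be the more reliable choice here. Note also that a process running as PID 1 inside a container is treated specially by the kernel for signals sent from within that container, so this approach would need testing in the actual image.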
I'll see what I can do with crashing the container. The only real repeatable way I have of doing that is waiting for people to play on the server for a few maps. I'm not sure why it is crashing; the logs aren't very helpful. This PR isn't really meant to be a solution to #15. Health checks are just something I think containers should do, as they allow for auto healing as an option, and I had a pretty good idea of how to implement that for UT.
I did try exec'ing into the container and killing the ucc-bin process and the whole container just died immediately. The trick to testing this is keeping the process running, but making it hang and stop responding to queries. I'm not sure how I can do that though. If you like, I can just leave this running in my environment for a few days to see if a group of people join the server and crash it.
I managed to get the server to crash! I just connected a player and let it sit there for a while rotating through the maps. It crashed and Docker has detected it as unhealthy. So I guess we can say that the health check is working. 😁
Regarding the commit squash, I've given it a go, but I really don't understand how to do it. It seems a fair bit more complex than just squashing during the PR merge. Can we just do a squash merge from the PR?
@Roemer were there any other changes you were waiting on me for with this PR?
No, it's actually fine. I merged it and a new image is being built. Thanks again!
Added health check to detect when the UT server has stopped responding and restart automatically. #15