dgkanatsios / azuregameserversscalingkubernetes

Scaling Dedicated Game Servers on Azure Kubernetes Service
MIT License

DedicatedGameServer status - what to do if failed? #56

Closed dgkanatsios closed 5 years ago

dgkanatsios commented 6 years ago

A DedicatedGameServer can signal to our API Server that it has Failed. Currently, the only thing we do in response is mark the entire DedicatedGameServerCollection as Failed. We should investigate whether to do something else in this case:

We could have the user select what to do via an extra flag on the DedicatedGameServerCollection.

dgkanatsios commented 6 years ago

We should also investigate what to do if DedicatedGameServers and/or their corresponding Pods are created and then fail. If we opt to create more DedicatedGameServers and/or Pods, maybe we should set a threshold of some kind, e.g. if 30% or more of the DedicatedGameServerCollection has failed (again, from either a DedicatedGameServer or a Pod perspective), we should stop creating more DedicatedGameServers/Pods.
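A minimal sketch of such a threshold check, to make the idea concrete. The function name `shouldStopCreating` and its parameters are hypothetical and not part of the project; the 30% figure is only the example value mentioned above.

```go
package main

import "fmt"

// shouldStopCreating reports whether the share of failed members in a
// collection has reached thresholdPercent (e.g. 30 for 30%).
// Hypothetical helper: the name and parameters are assumptions.
func shouldStopCreating(failed, total, thresholdPercent int) bool {
	if total <= 0 {
		// Nothing has been created yet, so there is no ratio to check.
		return false
	}
	// Integer arithmetic avoids floating-point rounding at the boundary:
	// failed/total >= thresholdPercent/100
	return failed*100 >= total*thresholdPercent
}

func main() {
	fmt.Println(shouldStopCreating(2, 10, 30)) // 20% failed
	fmt.Println(shouldStopCreating(3, 10, 30)) // 30% failed
}
```

Whether `failed` counts DedicatedGameServers, Pods, or both is left open here, as it is in the discussion.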

dgkanatsios commented 5 years ago

We will create an enumeration on the DedicatedGameServerCollection object that defines the behavior on subsequent failure of a DedicatedGameServer. When a DGS fails, the two options would be to either delete it or remove it from the collection. Let's call this enumeration 'DGSFailBehavior', with two available options: 'Remove' and 'Delete'.

Default value should be 'Remove'.

We should also add a 'DGSMaxFailures' integer variable on the DedicatedGameServerCollection. If the number of failures is bigger than this threshold, we should set the Collection to a 'Failed' state and not 'Delete' or 'Remove' any more DGS. The default value for 'DGSMaxFailures' should be 0, i.e. if any DGS fails, set the Collection to an unhealthy state and take no action. We will keep the number of failures for each DGSCollection in a variable called 'DGSTimesFailed'.
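The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual types: the `Collection` struct, its `State` field, and the `HandleFailure` method are hypothetical, and only the names 'DGSFailBehavior', 'DGSMaxFailures', 'DGSTimesFailed', 'Remove' and 'Delete' come from the discussion.

```go
package main

import "fmt"

// DGSFailBehavior selects what happens to a DGS that reports failure.
type DGSFailBehavior string

const (
	Remove DGSFailBehavior = "Remove" // remove the DGS from the collection
	Delete DGSFailBehavior = "Delete" // delete the DGS
)

// Collection is a hypothetical stand-in for DedicatedGameServerCollection.
type Collection struct {
	DGSFailBehavior DGSFailBehavior
	DGSMaxFailures  int    // threshold; the zero value matches the proposed default of 0
	DGSTimesFailed  int    // running count of DGS failures
	State           string // hypothetical field, set to "Failed" past the threshold
}

// HandleFailure records one DGS failure. It returns the behavior to apply
// and true, or false when the collection has been marked Failed and no
// further 'Remove' or 'Delete' should happen.
func (c *Collection) HandleFailure() (DGSFailBehavior, bool) {
	c.DGSTimesFailed++
	if c.DGSTimesFailed > c.DGSMaxFailures {
		c.State = "Failed"
		return "", false
	}
	if c.DGSFailBehavior == "" {
		return Remove, true // proposed default behavior
	}
	return c.DGSFailBehavior, true
}

func main() {
	c := &Collection{DGSFailBehavior: Delete, DGSMaxFailures: 1}
	fmt.Println(c.HandleFailure()) // first failure: still within the threshold
	fmt.Println(c.HandleFailure()) // second failure: exceeds DGSMaxFailures
	fmt.Println(c.State)
}
```

With the default 'DGSMaxFailures' of 0, the very first failure exceeds the threshold, so the collection is marked as failed and nothing is removed or deleted, which matches the default described above.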

dgkanatsios commented 5 years ago

Documentation is here