PlayFab / thundernetes

Thundernetes makes it easy to run your game servers on Kubernetes
https://playfab.github.io/thundernetes
Apache License 2.0

Log properly if Pod has been evicted #385

Closed: dgkanatsios closed this 2 years ago

dgkanatsios commented 2 years ago

Currently, the GameServer controller does not properly track when a Pod has been evicted (e.g. due to resource pressure on the Node). When the Pod is evicted, the container status is Failed and the Reason is Evicted. We should log this appropriately and emit a metric.

Relevant code: https://github.com/PlayFab/thundernetes/blob/544eca21b35a387692fd50ad45ae52e226dfa96c/pkg/operator/controllers/gameserver_controller.go#L178
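
A minimal sketch of the check described above, assuming an eviction surfaces as a status of Failed with reason Evicted. `PodStatus` here is a hypothetical local stand-in for the `Phase` and `Reason` fields of Kubernetes' `v1.PodStatus`, so the snippet compiles without dependencies, and `isEvicted` is an illustrative name, not a function from the thundernetes codebase.

```go
package main

// PodStatus is a hypothetical, trimmed-down stand-in for the Phase and Reason
// fields of v1.PodStatus (k8s.io/api/core/v1). It exists only so this sketch
// has no external dependencies.
type PodStatus struct {
	Phase  string
	Reason string
}

// isEvicted reports whether the status looks like an eviction: the Pod has
// failed and the status-level reason is "Evicted".
func isEvicted(s PodStatus) bool {
	return s.Phase == "Failed" && s.Reason == "Evicted"
}
```

A controller could call such a helper before its generic "process exited" handling, then log and increment a metric when it returns true. A status with Phase "Succeeded" and an empty Reason would not match this check.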

dgkanatsios commented 2 years ago

Unfortunately this doesn't seem to be the case. Trying manual evictions gives the following:

12s         Normal    GameServerProcessExited   gameserver/gameserverbuild-sample-netcore-42kwd   GameServer process exited with code 0 and reason Completed v1.PodStatus{Phase:"Succeeded", Conditions:[]v1.PodCondition{v1.PodCondition{Type:"Initialized", Status:"True", LastProbeTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), LastTransitionTime:time.Date(2022, time.September, 25, 3, 34, 3, 0, time.Local), Reason:"PodCompleted", Message:""}, v1.PodCondition{Type:"Ready", Status:"False", LastProbeTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), LastTransitionTime:time.Date(2022, time.September, 25, 3, 36, 11, 0, time.Local), Reason:"PodCompleted", Message:""}, v1.PodCondition{Type:"ContainersReady", Status:"False", LastProbeTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), LastTransitionTime:time.Date(2022, time.September, 25, 3, 36, 11, 0, time.Local), Reason:"PodCompleted", Message:""}, v1.PodCondition{Type:"PodScheduled", Status:"True", LastProbeTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), LastTransitionTime:time.Date(2022, time.September, 25, 3, 34, 2, 0, time.Local), Reason:"", Message:""}}, Message:"", Reason:"", NominatedNodeName:"", HostIP:"172.18.0.2", PodIP:"10.244.3.33", PodIPs:[]v1.PodIP{v1.PodIP{IP:"10.244.3.33"}}, StartTime:time.Date(2022, time.September, 25, 3, 34, 2, 0, time.Local), InitContainerStatuses:[]v1.ContainerStatus{v1.ContainerStatus{Name:"initcontainer", State:v1.ContainerState{Waiting:(*v1.ContainerStateWaiting)(nil), Running:(*v1.ContainerStateRunning)(nil), Terminated:(*v1.ContainerStateTerminated)(0xc00031a850)}, LastTerminationState:v1.ContainerState{Waiting:(*v1.ContainerStateWaiting)(nil), Running:(*v1.ContainerStateRunning)(nil), Terminated:(*v1.ContainerStateTerminated)(nil)}, Ready:true, RestartCount:0, Image:"docker.io/library/thundernetes-initcontainer:36d21df", ImageID:"sha256:d681c5a6342f01a1686b83c10fb28e7da73c297ae874eb557f06e15cf4e3e955", 
ContainerID:"containerd://a6732cef26cf19d36ce5ee0ac4053e0506dc07ca4fb20f48bea29fd3ff7759d8", Started:(*bool)(nil)}}, ContainerStatuses:[]v1.ContainerStatus{v1.ContainerStatus{Name:"thundernetes-sample-netcore", State:v1.ContainerState{Waiting:(*v1.ContainerStateWaiting)(nil), Running:(*v1.ContainerStateRunning)(nil), Terminated:(*v1.ContainerStateTerminated)(0xc00031a8c0)}, LastTerminationState:v1.ContainerState{Waiting:(*v1.ContainerStateWaiting)(nil), Running:(*v1.ContainerStateRunning)(nil), Terminated:(*v1.ContainerStateTerminated)(nil)}, Ready:false, RestartCount:0, Image:"ghcr.io/playfab/thundernetes-netcore:0.5.0", ImageID:"ghcr.io/playfab/thundernetes-netcore@sha256:a65af58caec93940e263cf85e669a4925e110bc2b3d1e5565ec2ab13643e3fe4", ContainerID:"containerd://8a30b30a42fda5a7ae4859d1b25c9f9a97c3607249359b6e8837179e11ba4f46", Started:(*bool)(0xc000ce3445)}}, QOSClass:"Burstable", EphemeralContainerStatuses:[]v1.ContainerStatus(nil)}

"Evicted" is listed in the kubectl get events but not sure how to parse it using the API. At the same time, controller detects that the Pod is not ready so it will delete it so impact is only on the reporting side. Closing till we find a better way.