Closed dsmith111 closed 1 year ago
@dgkanatsios In the Excel file, the description for this issue was:
We should add some performance metrics to controllers, like how much time it takes to reconcile, how many servers were processed etc. We should also include the relevant Geneva charts.
Did you mean Grafana charts, or should this connect to an existing (or to be built) Geneva/Jarvis system?
@dsmith111 thank you for the PR! The charts were meant to be Grafana, which aligns with your PR, thanks!
A couple of comments, then I think we're good to merge! Appreciate all the work and the communication, thank you!
Also FYI @abbasahmed and @ghov since they have #414 open with changes to the Grafana dashboards. You'll need to rebase after we merge this one.
@dgkanatsios I believe that's all of the current comments wrapped up
Really appreciate all the effort and the discussion, will merge as soon as tests pass! Thank you!
Problem: Currently we do not have any metrics in place to track the performance of controlling/reconciling GameServers: #361
Solution: This PR adds two new Prometheus metrics:
GameServerReachedInitializingDuration: The 5-minute average time for new GameServers to reach the Initializing state.
GameServerReachedStandingByDuration: The 5-minute average time for new GameServers to reach the StandingBy state.
These new metrics can potentially show any issues in the controller itself (time taken to begin GameServer creation) as well as issues related to the servers themselves, or performance of the cluster (time taken to complete GameServer initialization).
Testing used to verify the change: I created a temporary custom Docker image to build the thundernetes controller manager; using the netcore sample GameServerBuild, I proceeded to:
All of these events were monitored within the modified Grafana dashboard: