Closed craftyc0der closed 2 years ago
That is a very good question. Let me break the answer down:
The controller is already configured with a high QPS, which allows far more requests to the K8S API than usual. I am following the same approach that applications like Prometheus use when communicating with the K8S API. However, I haven't had the chance to test the ingress controller with a huge Fleet of GS; unfortunately, I don't have the cloud resources available. What I can say is that I would not expect scalability issues, given the nature of this application. The reconcile process is pretty simple and only creates a couple of resources. There is not much business logic involved, and there are no calls to DBs, APIs, or other external services. That said, I have a few tricks up my sleeve to improve the Reconcile process a bit: splitting OnAdd and OnUpdate into 2 different queues/channels, with 2 types of workers handling the messages. Additionally, Agones itself is an example of another application that follows a similar pattern. It runs a single controller and expects K8S to guarantee that this replica is always up and running.
Changing the Deployment manifest of the gameserver ingress controller to run more than 1 replica will not make much difference. Instead, it will put extra load on the K8S API, because each controller replica has its own internal cache and watcher subscription.
I hope that all makes sense. Let me know if you need any other information.
Feel free to close the issue if that answers your question.
It makes perfect sense. I've been hacking on this code base for a while. I am quite familiar with it as a result.
Appreciate the thoughts. Happy Turkey Day!
If we want to rely on the service being online and have k8s replace it if it's down, should we consider adding a very effective health check?
Great suggestion. I can add the health checks.
It will be part of release 0.1.5.
I have implemented LeaderElection. I'll test it in my cloud environment next week. It is working nicely in Minikube. Pretty straightforward, really. My complications were in getting RBAC just right for my rather complicated CI/CD pipeline.
Cool, I can also give it a try this weekend using the built-in support from Controller Runtime.
Let's say I wanted to deploy multiple replicas of the controller for scalability and redundancy. What design would you use for this? The watcher is going to alert the GameServer handler in every replica. Do we create a simple mutex to prevent duplicated effort? This seems like a standard problem. Is there a standard solution?