Open jianhaiqing opened 4 years ago
The orchestrator controller reconciles all clusters every `reconcileTimePeriod` seconds, see this code. To enqueue all those clusters, we keep a list of clusters that is updated on `create` and `delete` events.
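A minimal sketch of that pattern, using only the standard library. The `registry` type, the `clusterKey` type, and the `onCreate`, `onDelete`, and `enqueueAll` helpers are hypothetical names chosen for illustration and are not taken from the operator's code. In the real controller, a timer firing every `reconcileTimePeriod` seconds would presumably trigger the enqueue step; here it is called once by hand.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// clusterKey identifies a cluster as "namespace/name" (hypothetical type).
type clusterKey string

// registry is the list of known clusters, kept up to date by
// create and delete events.
type registry struct {
	mu       sync.Mutex
	clusters map[clusterKey]struct{}
}

func newRegistry() *registry {
	return &registry{clusters: make(map[clusterKey]struct{})}
}

// onCreate records a cluster when a create event is observed.
func (r *registry) onCreate(k clusterKey) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.clusters[k] = struct{}{}
}

// onDelete forgets a cluster when a delete event is observed.
func (r *registry) onDelete(k clusterKey) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.clusters, k)
}

// enqueueAll sends every known cluster to the events channel in sorted
// order and returns how many were sent. The keys are copied under the
// lock so that sending never blocks other registry updates.
func (r *registry) enqueueAll(events chan<- clusterKey) int {
	r.mu.Lock()
	keys := make([]clusterKey, 0, len(r.clusters))
	for k := range r.clusters {
		keys = append(keys, k)
	}
	r.mu.Unlock()

	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		events <- k
	}
	return len(keys)
}

func main() {
	reg := newRegistry()
	reg.onCreate("default/db-a")
	reg.onCreate("default/db-b")
	reg.onDelete("default/db-a")

	// Buffered so this single-goroutine demo does not block on send.
	events := make(chan clusterKey, 8)
	n := reg.enqueueAll(events) // one simulated timer tick
	close(events)

	for k := range events {
		fmt.Println("reconcile", k)
	}
	fmt.Println("enqueued", n)
}
```

The channel is what lets a timer, rather than a Kubernetes watch event, push work into the controller's queue.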
This is done to sync the state (cluster status) from Orchestrator into k8s and to enforce the desired state on the MySQL cluster (e.g. the cluster read-only state). Otherwise we would end up with two sources of truth to watch. Anyway, we aren't too far from that :smile:
OK, that's clear. One more question: why not use the following code? It would re-enqueue the reconcile routine as well. Is there any difference you have found?
reconcileTime := 5
reconcile.Result{RequeueAfter: time.Duration(reconcileTime) * time.Second}, nil
I have checked the reconciling process: if RequeueAfter is set, each cluster is reconciled again every RequeueAfter interval. So both approaches meet our expectations. Is there any performance consideration?
How would you test the performance of the operator? Do you have any pointers on performance testing and on the key metrics of the operator? So far I can get the Prometheus metrics from the operator, but I'm not sure what to make of them. I assume Grafana should be used to visualize the metrics.
What's your suggestion?
That logic is old, and I'm not sure why we chose to do it that way. Yes, it's possible to do it as you suggested.
We've run it with around 300 clusters, but it didn't do that well. It worked, but it was slow in reconciling the status and pod labels. I think Orchestrator was the bottleneck. May I ask how many clusters you plan to manage with a single MySQL Operator?
I plan to run a stress test to see how many clusters a single operator and Orchestrator can handle within our tolerance, for example failover completing within 15 seconds when the master service stops working properly.
https://github.com/presslabs/mysql-operator/blob/4d88be22e4b30c65c1a0fd60794533cbd4fe50e7/pkg/controller/orchestrator/orchestrator_controller.go#L132
It's hard to figure out why we need the event source channel here. Are there any tips that would explain the code? I would really appreciate an explanation.