bitpoke / mysql-operator

Asynchronous MySQL Replication on Kubernetes using Percona Server and Openark's Orchestrator.
https://www.bitpoke.io/docs/mysql-operator/getting-started/
Apache License 2.0

where is the consumer of chSource #574

Open jianhaiqing opened 4 years ago

jianhaiqing commented 4 years ago

https://github.com/presslabs/mysql-operator/blob/4d88be22e4b30c65c1a0fd60794533cbd4fe50e7/pkg/controller/orchestrator/orchestrator_controller.go#L132

It's hard to figure out why we need the event source channel here. Are there any tips to explain this code? I'd really appreciate an explanation.

AMecea commented 4 years ago

The orchestrator controller reconciles all clusters every reconcileTimePeriod seconds, see this code. In order to enqueue all those clusters, we use a list of clusters that is updated on create and delete events.

This is done in order to sync the state (cluster status) from Orchestrator into Kubernetes and to enforce the desired state on the MySQL cluster (e.g. the cluster's read-only state). Otherwise, we would end up having two sources of truth to watch; anyway, we aren't too far from that :smile:

jianhaiqing commented 4 years ago

Ok, that's clear now. One more question: why not use the following code instead? It would requeue the reconcile routine as well. Is there any difference you have found?

```go
reconcileTime := 5
return reconcile.Result{RequeueAfter: time.Duration(reconcileTime) * time.Second}, nil
```
jianhaiqing commented 4 years ago

I have checked the reconciling process: if RequeueAfter is set, each cluster will be reconciled every RequeueAfter. So both approaches can satisfy our expectation; is there any performance consideration?

How would you test the performance of the operator? Do you have any pointers on performance testing and the key metrics of the operator? So far, I can get the Prometheus metrics from the operator, but I'm not clear on what they mean. Grafana should also be involved to visualize the metrics.

What's your suggestion ?

AMecea commented 4 years ago

That logic is old, and I'm not sure why we chose to do it that way. Yes, it's possible to do it as you suggested.

We've run it with around 300 clusters, but it didn't do that well. It worked, but it was slow at reconciling the status and pod labels. I think Orchestrator was the bottleneck. May I ask how many clusters you plan to manage with a single MySQL operator?

jianhaiqing commented 4 years ago

I will run a stress test to see how many clusters a single operator and Orchestrator can handle within our tolerance expectations, e.g. failover completing within 15 seconds when the master service stops working properly.