GoogleCloudPlatform / metacontroller

Lightweight Kubernetes controllers as a service
https://metacontroller.app/
Apache License 2.0

[feature request] multiple metacontroller with object sharding strategy #190

Open hypergig opened 5 years ago

hypergig commented 5 years ago

Moving the Slack discussion over here to continue in a more formal manner.

Obligatory

Metacontroller is freaking great, thank you for enabling us to build custom controllers in a matter of days.

The Problem

As our cluster grows in scale, we are noticing Metacontroller isn't able to keep up during large-volume and/or volatile events such as a major deployment or a new cluster provision. Metacontroller is responsible for about 2500 objects at this point, and the time it takes for all the "update loops" to resolve can be about 20-30 minutes. This is especially problematic for parents whose children may be conditional on the state of other children and/or objects, as these would require at least two update loops. Metacontroller and the webhooks are in no way resource constrained, never going above 200m CPU, and memory usage is negligible.

~~The~~ A Solution

Clearly we don't want to break Metacontroller's simple interactions with the cluster and users. Cluster-scoped controller objects and backwards compatibility are really important. In essence, there is a cluster-scoped pool of work, and the idea is to safely parallelize the processing of that pool across an arbitrary number of workers. Keeping that in mind, I propose the following:
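For illustration only (this isn't the proposal text itself), here is a minimal Go sketch of one way such deterministic sharding could look, assuming each replica knows its own shard index (e.g. from its StatefulSet ordinal) and the total shard count, and that objects are assigned by hashing their UID; all names here are hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor deterministically maps an object's UID to one of numShards workers.
// Any stable identifier works; the UID is used here because it never changes
// for the lifetime of the object.
func shardFor(uid string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(uid))
	return int(h.Sum32()) % numShards
}

func main() {
	// Each replica would be started with its own shard index and would only
	// sync objects whose UID hashes to that index.
	uids := []string{
		"8f5c2a1e-0b6d-4c9a-9f3e-111111111111",
		"2d7b9c4a-5e1f-4a8b-8c2d-222222222222",
		"c3a1e5f7-9b2d-4d6c-a8e4-333333333333",
	}
	const numShards = 3
	for _, uid := range uids {
		fmt.Printf("object %s -> shard %d\n", uid, shardFor(uid, numShards))
	}
}
```

The nice property of hashing a stable identifier is that replicas never need to coordinate about who owns what until the shard count itself changes.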

Challenges

Other nice side effects

arielb135 commented 5 years ago

Hi @hypergig, very good suggestion! I'd try to move the request to https://github.com/AmitKumarDas/metac - it's a fork, as this project is no longer maintained.

About HPA with StatefulSets: it's possible, but graceful termination is required to free resources and actually clean up the state well (for example, perform removal from the cluster, or de-provision a volume). There's a nice article on graceful scaledown: https://medium.com/@marko.luksa/graceful-scaledown-of-stateful-apps-in-kubernetes-2205fc556ba9

If you don't need to scale down a specific shard (let's say you have 3 shards, but now shard #1 is irrelevant), then a graceful shutdown is not really necessary and regular HPA is fine.
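To make that concrete, here is a rough Go sketch of the kind of termination hook the article describes, assuming the actual cleanup is whatever the shard needs to do before it disappears (the deprovision function here is a hypothetical placeholder, not a Metacontroller API):

```go
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// deprovision stands in for whatever cleanup a shard needs before it goes away:
// removing itself from cluster membership, releasing volumes, etc.
func deprovision(ctx context.Context) error {
	log.Println("cleaning up shard state...")
	time.Sleep(2 * time.Second) // simulate work
	return nil
}

func main() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	log.Println("shard running; waiting for termination signal")
	<-stop

	// Kubernetes gives the pod terminationGracePeriodSeconds to finish;
	// bound the cleanup so we exit before the kubelet sends SIGKILL.
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	if err := deprovision(ctx); err != nil {
		log.Printf("cleanup failed: %v", err)
		os.Exit(1)
	}
	log.Println("cleanup done, exiting")
}
```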

About challenge #1, three thoughts:

  1. Why does this matter? Even if you have 2 replicas that handle the same object, only one of the replicas should grab it and perform the update (I'm not 100% sure though). If not, just off the top of my head, a shared queue per shard could be used so each replica can read from it; worst case (if 2 replicas are working on the same object), only one will catch the update.

  2. Maybe it doesn't really matter, as you're usually syncing to the end state of the object. I don't really see how 2 instances could grab 2 different versions of the same object, as you'd need to deploy twice super fast (while scaling the controller).

  3. Because you're switching to a StatefulSet, you can basically notify all shards about the new number of replicas while a new shard is initializing, before it starts to process. Just open an API that shard-3 calls on (shard-0, shard-1, shard-2) to notify them about the change in deployment size. Each one that is called stops processing; when all shards are updated, notify them that they can continue. It's basically some kind of "resharding" strategy (a rough sketch follows below). You can also have some timeout in case of errors to roll back, etc. There might be better approaches (https://medium.com/harmony-one/understanding-harmonys-cuckoo-rule-for-resharding-215766f4ca50), but you have freedom there because you control and know exactly when a scale-up happens.
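For illustration, a minimal Go sketch of the pause/resume coordination from point 3, assuming each shard exposes hypothetical /pause and /resume endpoints and that the new pod drives the resharding before it starts syncing:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// reshard implements the pause-update-resume idea: stop every existing shard,
// tell it the new total shard count, then let all of them continue. The /pause
// and /resume?shards=N endpoints are hypothetical; each shard would have to
// expose something like them itself.
func reshard(shardURLs []string, newShardCount int) error {
	client := &http.Client{Timeout: 5 * time.Second}

	// Phase 1: pause every shard so no two shards sync the same object
	// while ownership is being recomputed.
	for _, u := range shardURLs {
		resp, err := client.Post(u+"/pause", "text/plain", nil)
		if err != nil {
			return fmt.Errorf("pausing %s: %w", u, err)
		}
		resp.Body.Close()
	}

	// Phase 2: once all shards are paused, announce the new shard count
	// and let them resume with the recomputed assignment.
	for _, u := range shardURLs {
		resp, err := client.Post(fmt.Sprintf("%s/resume?shards=%d", u, newShardCount), "text/plain", nil)
		if err != nil {
			return fmt.Errorf("resuming %s: %w", u, err)
		}
		resp.Body.Close()
	}
	return nil
}

func main() {
	// The new pod (shard-3) would run something like this against the
	// existing shards before it starts processing.
	existing := []string{
		"http://shard-0.metacontroller:8080",
		"http://shard-1.metacontroller:8080",
		"http://shard-2.metacontroller:8080",
	}
	if err := reshard(existing, 4); err != nil {
		fmt.Println("reshard failed:", err)
	}
}
```

In practice you'd also want the timeout/rollback mentioned above, plus some guard so a shard that never receives the resume call doesn't stay paused forever.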