Closed sschepens closed 1 month ago
We do split the total resources into multiple Work objects: https://github.com/Azure/fleet/blob/9d2d0ea03deec6535322d654fccfbd4f480f2132/pkg/controllers/workgenerator/controller.go#L316
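To illustrate the idea (this is a minimal sketch, not the actual algorithm in the linked `workgenerator` controller): serialized manifests can be packed into multiple Work payloads so each payload stays under a byte budget, since etcd caps object sizes (roughly 1.5 MiB by default). The function name `chunkBySize` and the budget value are hypothetical.

```go
package main

import "fmt"

// chunkBySize is an illustrative sketch of splitting serialized manifests
// into multiple Work payloads so that each stays under maxBytes. A single
// manifest larger than maxBytes still gets its own chunk, since an
// individual manifest cannot be split further.
func chunkBySize(manifests [][]byte, maxBytes int) [][][]byte {
	var works [][][]byte // each element is the manifest list for one Work
	var current [][]byte
	size := 0
	for _, m := range manifests {
		// Flush the current chunk if adding this manifest would exceed the budget.
		if size+len(m) > maxBytes && len(current) > 0 {
			works = append(works, current)
			current = nil
			size = 0
		}
		current = append(current, m)
		size += len(m)
	}
	if len(current) > 0 {
		works = append(works, current)
	}
	return works
}

func main() {
	// Three 600-byte manifests against a 1000-byte budget: no two fit
	// together, so each lands in its own Work payload.
	manifests := [][]byte{
		make([]byte, 600),
		make([]byte, 600),
		make([]byte, 600),
	}
	works := chunkBySize(manifests, 1000)
	fmt.Println(len(works)) // prints 3
}
```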
I wonder if you have observed that we only create one work even when there are many resourceSnapshots?
Yes, there are duplications, and there can be more than 3 copies, but they all serve different purposes. The snapshot is needed to allow users to have a planned rollout of any change instead of blasting the latest version to every cluster. It also allows the system to enforce the "IgnoreDuringExecution" rule: we don't move a resource around when it changes. For example, if you have a deployment and you change a parameter of one of its containers, we won't move it to a different set of clusters.
The Work object is a per-cluster copy rather than a cluster-scoped object, which means the number of copies is proportional to the number of clusters you want to place resources on. There are several reasons for this approach. One is to support https://github.com/Azure/fleet/tree/main/docs/concepts/Override so that each member cluster can get a customized version. Another is https://github.com/Azure/fleet/blob/main/docs/concepts/ClusterResourcePlacement/README.md#envelope-object.
With that said, I wonder if you have a use case where etcd space is a real concern?
Describe the bug
When creating a `ClusterResourcePlacement` selecting a Namespace, a single `Work` object is created with all the resources of the Namespace, and this can cause its size to exceed Kubernetes limits. `ClusterResourceSnapshot` currently bypasses this by creating several resources; why isn't this done for `Work` as well?

On the other hand, it would seem that there is a lot of overhead currently in `fleet`: `ClusterResourceSnapshots` store the whole manifests of the selected resources, then `Work` stores them again, which increases storage usage by 3x. So a couple of questions come to my mind: