aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0
2.39k stars 253 forks

Need a convenient way to reconcile resource #1905

Open gecube opened 1 year ago

gecube commented 1 year ago

Hello!

Right now I have the following issue. I have a bunch of resources such as VPCs, Subnets, and RouteTables. The dependencies between them are expressed with vpcRef, subnetRef, and other kinds of references. So the pipeline looks like this: apply all the resources in one batch and wait until the controller(s) reconcile them. Unfortunately, the whole reconciliation can take as long as 1h, particularly if you hit AWS limits (such as exceeding the quota for some resource). Then you file a ticket with Amazon asking for a quota increase... and wait... and wait... until the controller(s) reconcile again.

So I am looking for a way to tell the controller(s) that resources must be reconciled right now. Decreasing the interval would probably help too, but I don't want to overload the Amazon API with a bunch of requests. I also wonder whether the controller(s) really watch all changes in the Status fields of the relevant objects.

a-hilaly commented 1 year ago

Hello @gecube !

It is possible to tweak the default reconciliation period for all the resources and for specific resources using something like:

# in the helm chart values.yaml
reconcile:
  defaultResyncPeriod: 36000 # 10 Hours
  resourceResyncPeriods:
    bucket: 18000 # override the default value for a specific resource

More docs here: https://aws-controllers-k8s.github.io/community/docs/user-docs/drift-recovery/

a-hilaly commented 1 year ago

@gecube Another way to trigger an instant reconciliation would be to restart the pod that is running the controller.

gecube commented 1 year ago

@a-hilaly Thanks for the swift reply. I don't like either option. The second one needs direct access to the cluster, which is not very appropriate. The first one sounds reasonable, but I believe it can put excessive load on the Amazon API. I think we need a better alternative. Any suggestions?
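
To put a rough number on the load concern, here is a back-of-the-envelope sketch. It assumes every periodic resync costs a fixed number of AWS API calls per resource; `calls_per_reconcile` and the resource count of 200 are made-up illustration values, not figures documented by ACK.

```python
def resync_calls_per_hour(resource_count, resync_period_seconds, calls_per_reconcile=1):
    """Estimate AWS API calls per hour caused by periodic resyncs alone.

    calls_per_reconcile is an assumed figure for illustration only.
    """
    if resync_period_seconds <= 0:
        raise ValueError("resync period must be positive")
    reconciles_per_hour = resource_count * 3600 / resync_period_seconds
    return reconciles_per_hour * calls_per_reconcile


if __name__ == "__main__":
    # Compare the 10-hour default with two shorter, hypothetical periods.
    for period in (36000, 3600, 300):
        print(period, resync_calls_per_hour(200, period))
```

Under these assumptions the call volume grows inversely with the period, so shortening the period by a factor of ten multiplies the periodic load by ten.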

a-hilaly commented 1 year ago

@gecube The only remaining option would be to edit the CR (maybe by adding a dummy annotation). However, I'm not aware of any other controller that supports "triggering one reconciliation only". Do you know of any? Happy to jump in and see if it's something we could support.
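
A minimal sketch of the dummy-annotation idea, for illustration only. The annotation key `example.com/reconcile-requested-at` is hypothetical (ACK does not define it), and the resource name `vpcs.ec2.services.k8s.aws`, object name, and namespace are assumptions. The sketch only builds a JSON merge patch and the matching `kubectl patch` command line; it relies on the metadata update producing a watch event for the controller.

```python
import json
import shlex
from datetime import datetime, timezone

# Hypothetical key: not defined by ACK. Only the fact that the value changes matters.
ANNOTATION_KEY = "example.com/reconcile-requested-at"


def build_touch_patch(now=None):
    """Return a JSON merge patch that sets a timestamp annotation on a CR."""
    now = now or datetime.now(timezone.utc)
    return {"metadata": {"annotations": {ANNOTATION_KEY: now.isoformat()}}}


def kubectl_patch_command(resource, name, namespace, patch):
    """Build the argv for `kubectl patch` using a merge patch."""
    return [
        "kubectl", "patch", resource, name,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Resource, object name, and namespace below are placeholders.
    cmd = kubectl_patch_command(
        "vpcs.ec2.services.k8s.aws", "my-vpc", "default", build_touch_patch()
    )
    print(shlex.join(cmd))
```

Because the change lives in the manifest metadata, the same annotation could also be committed to Git and applied by a GitOps tool instead of being run by hand.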

gecube commented 1 year ago

@a-hilaly If we take HelmRelease from FluxCD as an example, there are two options.

  1. use CLI utility like flux reconcile helmrelease <name>
  2. or I just patch values: in kind: HelmRelease object and controller watches the changes and reapplies object asap.

So, in short: there is an option to "triggering one reconciliation only"

a-hilaly commented 1 year ago

> use CLI utility like flux reconcile helmrelease <name>

@gecube Looks like the flux reconcile CLI just patches an annotation onto the resources you want reconciled immediately. Same for flux reconcile helmrelease (This is the annotation format).

Are you asking for a new ackctl command that does this for you, similar to how flux reconcile works?

> or I just patch values: in kind: HelmRelease object and controller watches the changes and reapplies object asap.

If you're talking about the helm release of the controller, I believe the only possible one is setting deployment.replicas to 0 and then back to 1, which causes the controller to restart and reconcile all the resources under its management.
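
As a rough sketch of the scale-down/scale-up restart: the snippet below uses `kubectl scale` directly instead of changing the chart's deployment.replicas value, which is a substitution made here for brevity. The deployment name `ack-ec2-controller` and namespace `ack-system` are assumptions; use whatever your install created.

```python
import shlex


def restart_by_scaling(deployment, namespace):
    """Return the two kubectl invocations that scale a Deployment to 0 and back to 1."""
    base = ["kubectl", "scale", f"deployment/{deployment}", "-n", namespace]
    return [base + ["--replicas=0"], base + ["--replicas=1"]]


if __name__ == "__main__":
    # Deployment name and namespace are placeholders, not values from this thread.
    for cmd in restart_by_scaling("ack-ec2-controller", "ack-system"):
        print(shlex.join(cmd))
```

Note that this restarts reconciliation for every resource the controller manages, and it requires direct cluster access.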

> there is an option to "triggering one reconciliation only"

Natively in the Kubernetes world, I believe it's not possible. But you can hack something by causing the controller to restart or by editing the resources you want to reconcile. QQ: are you using helm to package the CRs and deploy them to the cluster?

a-hilaly commented 1 year ago

@gecube We host a weekly community meeting that is open to all users/contributors, feel free to jump in to share feedback and ask more questions to the ACK team: https://github.com/aws-controllers-k8s/community#details

gecube commented 1 year ago

@a-hilaly

> Looks like the flux reconcile CLI just patches an annotation onto the resources you want reconciled immediately. Same for flux reconcile helmrelease (This is the annotation format).

That is a possible scenario.

> If you're talking about the helm release of the controller, I believe the only possible one is setting deployment.replicas to 0 and then back to 1, which causes the controller to restart and reconcile all the resources under its management.

Nope. I am talking about the HelmRelease object itself, i.e. FluxCD issues. Sometimes a HelmRelease gets stuck after several unsuccessful retries. Then there are two options, as I said: either use the flux reconcile CLI, or patch the HelmRelease somehow so that the controller (the helm controller from FluxCD) picks up the change and re-applies the resource.

> Natively in the Kubernetes world, I believe it's not possible. But you can hack something by causing the controller to restart or by editing the resources you want to reconcile. QQ: are you using helm to package the CRs and deploy them to the cluster?

Agree. But the world is much more complex than the k8s mental model, so sometimes we need a convenient way to push changes. That contradicts the gitops approach, but not everything can be described as gitops. Take a DB backup, for instance: operators like crunchy and others take some very odd steps to let the user perform imperative actions through declarative k8s manifests.

Regarding the CRs: I am using FluxCD + kustomization, so all ACK CRs are packed into directories with a kustomization.yaml. The next step will be either to pack them into Helm charts with proper cross-references between the objects, or to split them into separate Kustomizations with dependencies between them. I am using a dedicated cluster for management purposes so I won't overload it with many helm releases or kustomizations. Another option is to build OCI-compliant bundles of manifests with the help of something like timoni.

gecube commented 1 year ago

Also, my question is not about drift remediation but about controller behaviour when it does not pick up changes (for whatever reason). But I understand that there is probably no better category for it.

ack-bot commented 6 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

gecube commented 6 months ago

/remove-lifecycle stale
