Open gecube opened 1 year ago
Hello @gecube !
It is possible to tweak the default reconciliation period for all the resources and for specific resources using something like:
# in the helm chart values.yaml
reconcile:
  defaultResyncPeriod: 36000 # 10 Hours
  resourceResyncPeriods:
    bucket: 18000 # override the default value for a specific resource
More docs here: https://aws-controllers-k8s.github.io/community/docs/user-docs/drift-recovery/
@gecube Another way to trigger instant reconciliation would be by restarting the pod that is running the controller
@a-hilaly Thanks for the swift reply. I don't like either option. The second one needs direct access to the cluster, which is not very appropriate. The first one sounds reasonable, but it could create excessive load on the Amazon API (I believe so). I think we need a better alternative. Any suggestions?
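To put a rough number on the load concern, here is a back-of-the-envelope sketch. It assumes each periodic resync costs about one read-style AWS API call per resource; that is an assumption for illustration, not a documented ACK figure. The function name and the count of 200 resources are hypothetical.

```python
def resyncs_per_hour(resource_count: int, resync_period_seconds: int) -> float:
    """Average number of periodic reconciliations per hour across all resources."""
    if resync_period_seconds <= 0:
        raise ValueError("resync_period_seconds must be positive")
    return resource_count * 3600 / resync_period_seconds


if __name__ == "__main__":
    # 200 resources is a made-up fleet size; the periods are in seconds.
    for period in (36000, 18000, 600):
        print(f"period={period}s -> {resyncs_per_hour(200, period):.1f} resyncs/hour")
```

Under that assumption the rate grows inversely with the period, so halving the period doubles the number of periodic reconciliations.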
@gecube The only option left would be to edit the CR (maybe adding a dummy annotation). However, I'm not aware of any other controller that supports "triggering one reconciliation only". Do you know any? Happy to jump in and see if it's something we could support.
@a-hilaly if we can use HelmRelease from FluxCD as an example - there are two options.
1. use CLI utility like flux reconcile helmrelease <name>
2. or I just patch values: in kind: HelmRelease object and controller watches the changes and reapplies object asap.
So shortly - there is an option to "triggering one reconciliation only"
@gecube Looks like flux reconcile CLI is just patching an annotation to the resources you're asking for their instant reconciliation. And same thing for flux reconcile helmrelease (This is the annotation format).
Are you requesting to introduce a new ackctl command that does this for you, in a similar way flux reconcile works?
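For illustration, a minimal sketch of the annotation-patch approach described above. Flux uses the annotation key reconcile.fluxcd.io/requestedAt with a timestamp value; the helper names below are hypothetical, and whether an ACK controller would react to a dummy annotation built the same way is an assumption taken from the suggestion earlier in this thread, not verified behaviour.

```python
import json
from datetime import datetime, timezone

# Annotation key the Flux controllers watch for reconcile requests.
FLUX_RECONCILE_ANNOTATION = "reconcile.fluxcd.io/requestedAt"


def reconcile_request_patch(key=FLUX_RECONCILE_ANNOTATION, now=None):
    """Build a JSON merge patch that sets a timestamp annotation on an object."""
    now = now or datetime.now(timezone.utc)
    return {"metadata": {"annotations": {key: now.isoformat()}}}


def kubectl_patch_args(kind, name, namespace, patch):
    """Return the kubectl argv that would apply the merge patch (not executed here)."""
    return ["kubectl", "patch", kind, name, "-n", namespace,
            "--type", "merge", "-p", json.dumps(patch)]


if __name__ == "__main__":
    # "my-release" and "default" are placeholder names.
    patch = reconcile_request_patch()
    print(" ".join(kubectl_patch_args("helmrelease", "my-release", "default", patch)))
```

Passing a different key (for example a made-up one such as example.com/reconcile-nonce) gives the "dummy annotation" variant for any other custom resource.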
or I just patch values: in kind: HelmRelease object and controller watches the changes and reapplies object asap.
If you're talking about the helm release of the controller. I believe the only possible one is deployment.replicas to 0 then back to 1 which causes the controller to restart and reconcile all the resources under its management.
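A small sketch of that scale-down/scale-up restart, expressed as the kubectl commands it would take. The deployment name and namespace are placeholders (they depend on how the controller was installed), and running the commands requires the direct cluster access mentioned earlier in the thread.

```python
import shlex


def restart_commands(deployment, namespace):
    """Return the two kubectl invocations that scale a deployment to 0 and back to 1."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "scale", target, "--replicas=0", "-n", namespace],
        ["kubectl", "scale", target, "--replicas=1", "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the ones from your installation.
    for argv in restart_commands("ack-s3-controller", "ack-system"):
        print(shlex.join(argv))
```

The sketch only prints the commands; pass each list to subprocess.run if you want to execute them.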
there is an option to "triggering one reconciliation only"
Natively in the Kubernetes world, I believe it's not possible. But you can hack something causing the controller to restart or to edit the resources you want to reconcile. QQ: are you using helm to package the CRs and deploying them to the cluster?
@gecube We host a weekly community meeting that is open to all users/contributors, feel free to jump in to share feedback and ask more questions to the ACK team: https://github.com/aws-controllers-k8s/community#details
@a-hilaly
Looks like flux reconcile CLI is just patching an annotation to the resources you're asking for their instant reconciliation. And same thing for flux reconcile helmrelease (This is the annotation format).
That is a possible scenario.
If you're talking about the helm release of the controller. I believe the only possible one is deployment.replicas to 0 then back to 1 which causes the controller to restart and reconcile all the resources under its management.
Nope. I am talking about the HelmRelease object itself, and about FluxCD issues. Sometimes a HelmRelease gets stuck after several unsuccessful retries. Then two options exist. As I said: either use the flux reconcile CLI, or patch the HelmRelease somehow, and the controller (the helm controller from FluxCD) will pick up the changes and re-apply the resource.
Natively in the Kubernetes world, I believe it's not possible. But you can hack something causing the controller to restart or to edit the resources you want to reconcile. QQ: are you using helm to package the CRs and deploying them to the cluster?
Agree. But the world is much more complex than the mental model of k8s, so sometimes we need a convenient way to push changes. That contradicts the GitOps approach, but not everything can be described in a GitOps way; a DB backup, for instance. So operators like Crunchy and others take some very weird steps to let the user perform imperative actions through declarative k8s manifests.
Regarding CRs: I am using FluxCD + kustomization, so all ACK CRs are packed into directories with a kustomization.yaml. The next step will be either to pack them into Helm charts with proper cross-references between the objects, or to split them into different Kustomizations with dependencies between them. I am using a dedicated cluster for management purposes, so I won't overload it with many Helm releases or Kustomizations. Another option is to build OCI-compliant bundles of manifests with the help of something like timoni.
Also, my question is not about drift remediation but rather about controller behaviour when it's not picking up the changes (for any reason). But I understand that there is probably no better category for it.
Issues go stale after 180d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 60d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
/remove-lifecycle stale
Hello!
Right now I have the following issue. I have a bunch of resources like VPCs, Subnets, and RouteTables. The dependencies between resources are expressed with vpcRef, subnetRef, and other types of references. So the pipeline looks like this: apply all resources in one batch and wait until the controller(s) reconcile them. Unfortunately, the whole reconciliation can take as much as 1h, particularly if you hit Amazon cloud limits (like exceeding the quota on some resources). Then you write a ticket to Amazon requesting a quota increase.... and wait.... wait.... wait... until the controller(s) reconcile.
So I am looking for some way to tell the controller(s) that resources must be reconciled right now. Decreasing the interval could probably help too, but I don't want to overload the Amazon API with a bunch of requests. I also wonder whether the controller(s) really watch all changes in the Status fields of the relevant objects.
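While waiting, one way to see which resources in the batch are still pending is to inspect their status conditions. The sketch below assumes each object has already been fetched as a dict (for example from kubectl get ... -o json) and that the controller reports a condition of type ACK.ResourceSynced; the helper names and sample objects are hypothetical.

```python
def is_synced(resource):
    """True if the object carries an ACK.ResourceSynced condition with status "True"."""
    conditions = (resource.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == "ACK.ResourceSynced" and c.get("status") == "True"
        for c in conditions
    )


def unsynced(resources):
    """List "Kind/name" for every object that is not reported as synced yet."""
    return [
        f'{r["kind"]}/{r["metadata"]["name"]}'
        for r in resources
        if not is_synced(r)
    ]


if __name__ == "__main__":
    # Made-up sample objects standing in for real cluster state.
    sample = [
        {"kind": "VPC", "metadata": {"name": "demo-vpc"},
         "status": {"conditions": [{"type": "ACK.ResourceSynced", "status": "True"}]}},
        {"kind": "Subnet", "metadata": {"name": "demo-subnet"},
         "status": {"conditions": [{"type": "ACK.ResourceSynced", "status": "False"}]}},
    ]
    print(unsynced(sample))
```

This only reports state; it does not make the controller reconcile any sooner.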