aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0
2.41k stars 253 forks source link

`Status.Conditions` not being cleared inside AdoptionReconciler #1168

Open vijtrip2 opened 2 years ago

vijtrip2 commented 2 years ago

ACK Reconciler inside reconciler.go clears all the Status.Conditions at the start of reconciliation loop and calculates those Conditions again. This is the correct approach but missing from adoption_reconciler.go.

To fix this AdoptionReconciler should also clear all Conditions at the beginning of adoption reconciliation loop and if there was any reconciliation logic dependency on existing Conditions(ex: ACK.Adopted), find another way to execute that logic. Ex: Annotations or Checking adopted Custom Resource etc etc

jaypipes commented 2 years ago

I don't quite understand what the problem is here. Please update the title of this issue to reflect what the actual issue is.

If I understand correctly, what SageMaker team is looking for is the historical record of a resource being adopted by ACK. As mentioned elsewhere, Status.Conditions is not a historical record of all state transitions for a resource. Instead, Status.Conditions is the set of conditions that were observed by the controller in the last reconciliation run (which is the reason why the Status.Conditions is cleared at the beginning of each reconciliation loop).

vijtrip2 commented 2 years ago

Status.Conditions is the set of conditions that were observed by the controller in the last reconciliation run (which is the reason why the Status.Conditions is cleared at the beginning of each reconciliation loop).

Exactly and this is missing from AdoptionReconciler. AdoptionReconciler does not clear the conditions at beginning of reconciler loop.

surajkota commented 2 years ago

Since the conditions will be cleared on every reconcile loop in future, what is the new mechanism for a user to know if the resource adoption is complete? and the AdoptedResource CR can be cleaned up

(not visible to user) adopted resource reconciler also needs a new implementation to determine if this reconcile loop is a no-op or it needs to create the target resource

I propose to introduce a field under status.

jaypipes commented 2 years ago

Since the conditions will be cleared on every reconcile loop in future, what is the new mechanism for a user to know if the resource adoption is complete? and the AdoptedResource CR can be cleaned up

@surajkota the mechanism is actually the same as it used to be: the custom resource exists with the referred-to AWS identifier. @RedbackThomson do you think we should add something to the adopted resource reconciler that deletes the AdoptedResource CR when it successfully creates the target "real" CR?

(not visible to user) adopted resource reconciler also needs a new implementation to determine if this reconcile loop is a no-op or it needs to create the target resource

I propose to introduce a field under status.

@surajkota you mean a new field in the AdoptedResource's Status, right?

surajkota commented 2 years ago

you mean a new field in the AdoptedResource's Status, right?

yes

the mechanism is actually the same as it used to be: the custom resource exists with the referred-to AWS identifier.

previously, we relied on ACK.Adopted condition to be set to true before checking if target CR exists. I think this mechanism is more user friendly because AdoptedResource reconciler would show an error if user specified a conflicting name for target CR(i.e. a CR with same name already exists). We can change it to some other field under AdoptedResource's Status

RedbackThomson commented 2 years ago

previously, we relied on ACK.Adopted condition to be set to true before checking if target CR exists

I think this is still the way to go. We can calculate this on every reconciliation loop by searching for the target CR and checking that it has the adoption annotation (so that we know it was not created manually).

@RedbackThomson do you think we should add something to the adopted resource reconciler that deletes the AdoptedResource CR when it successfully creates the target "real" CR?

I don't see a whole lot of value to it? AdoptedResource acts as the declaration that our controller should adopted an existing resource. It's status shows whether we are in that state (ACK.Adopted = true) or whether there was an error achieving that (ACK.Adopted = true && ACK.Recoverable = <something>). If the customer doesn't want something to be adopted anymore, it's up to them to remove the declaration?

I do understand, though, that adoption will probably be used imperatively ie. a customer will adopt a resource manually, once, after creating the cluster. So maybe there is value if we view it like that?

ack-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Provide feedback via https://github.com/aws-controllers-k8s/community. /close

ack-bot commented 2 years ago

@ack-bot: Closing this issue.

In response to [this](https://github.com/aws-controllers-k8s/community/issues/1168#issuecomment-1186016020): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen`. >Provide feedback via https://github.com/aws-controllers-k8s/community. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
rahtr commented 1 year ago

do understand, though, that adoption will probably be used imperatively ie. a customer will adopt a resource manually, once, after creating the cluster. So maybe there is value if we view it like that?

Is there any advantage of keeping the AdoptedResource object? Cleaning it automatically looks like a good solution. Right now we are migration 1000s of resourcing taking benefit of the AdoptedResource, however we have to build a solution to delete the AdoptedResource CR once the ACK.Adopted = true which adds a lot of overhead.

/reopen

a-hilaly commented 1 year ago

Is there any advantage of keeping the AdoptedResource object?

@rahtr Not that I can think of :). IMO, if it doesn't impact your workload then there is no need to keep it in the cluster... maybe only for historical purposes?

rahtr commented 1 year ago

@A-Hilaly exactly. To avoid putting pressure on the API server, I believe the best is to give an ability to the user to delete the AdoptedResource CR automatically once the resource has been adopted successfully. Maybe a flag like ‘deleteOnSuccess’ in the CR itself.

ack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 1 year ago

Stale issues rot after 60d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 60d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot commented 12 months ago

Rotten issues close after 60d of inactivity. Reopen the issue with /reopen. Provide feedback via https://github.com/aws-controllers-k8s/community. /close

ack-prow[bot] commented 12 months ago

@ack-bot: Closing this issue.

In response to [this](https://github.com/aws-controllers-k8s/community/issues/1168#issuecomment-1780191898): >Rotten issues close after 60d of inactivity. >Reopen the issue with `/reopen`. >Provide feedback via https://github.com/aws-controllers-k8s/community. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.