crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0

[Bug]: RDS instances not syncing due to wrong AZ in spec #1379

Open bobdanek opened 6 days ago

bobdanek commented 6 days ago

Affected Resource(s)

Resource MRs required to reproduce the bug

exampledb.yml.txt mysqlinstanceandservice.yml.txt xmysqlinstance.yml.txt xmysqlinstanceandservice.yml.txt

Steps to Reproduce

I don't have a way to reproduce this outside our environment, but here's an approximation:

What happened?

Expected: RDS instance created successfully, and the instance managed resource stays synced. The AZ in which the instance was created matches the AZ that appears in the spec.

Actual behavior: The RDS instance is created successfully, but at some point the Synced condition on the Instance becomes False because a different availability zone appears in the spec, which makes the plan call for replacing the instance. Replacement is blocked (thankfully) by "prevent_destroy":true, but any unrelated changes we want to make are blocked along with it.

Relevant Error Output Snippet

conditions:
  - lastTransitionTime: "2024-06-25T16:41:47Z"
    message: 'observe failed: cannot run plan: plan failed: Instance cannot be destroyed:
      Resource aws_db_instance.example-12345-abcde has lifecycle.prevent_destroy
      set, but the plan calls for this resource to be destroyed. To avoid this error
      and continue with the plan, either disable lifecycle.prevent_destroy or reduce
      the scope of the plan using the -target flag.'
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2024-02-07T13:40:20Z"
    reason: Finished
    status: "True"
    type: AsyncOperation
  - lastTransitionTime: "2023-11-21T19:16:10Z"
    reason: Available
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-02-07T13:48:19Z"
    message: 'apply failed: Instance cannot be destroyed: Resource aws_db_instance.example-12345-abcde
      has lifecycle.prevent_destroy set, but the plan calls for this resource to be
      destroyed. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy
      or reduce the scope of the plan using the -target flag.'
    reason: ApplyFailure
    status: "False"
    type: LastAsyncOperation

Crossplane Version

1.14.9

Provider Version

0.40.102

Kubernetes Version

v1.28.9-eks-036c24b

Kubernetes Distribution

EKS

Additional Info

bobdanek commented 4 days ago

I found a workaround that will unblock me, though it doesn't explain why things got into this state to begin with:

If I delete availabilityZone from spec.forProvider, the resource gets updated a few seconds later, adding back availabilityZone but with the correct/expected value.

Example with kubectl:

kubectl patch instances.rds.aws.upbound.io example-db --type json -p '[{ "op": "remove", "path": "/spec/forProvider/availabilityZone" }]'
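
For anyone who needs to apply this to more than one instance, here is a rough sketch of the same workaround wrapped in a check, so the patch is only sent when the AZ in the spec actually differs from the observed one. The helper names az_patch and fix_instance_az are made up for illustration, and it assumes the provider reports the real AZ at status.atProvider.availabilityZone; verify that on your own resources before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: az_patch and fix_instance_az are hypothetical helper names.

# Print the JSON patch from the workaround when the AZ in the spec differs
# from the observed AZ. Print nothing when they match or either one is empty.
az_patch() {
  local desired="$1" observed="$2"
  if [ -n "$desired" ] && [ -n "$observed" ] && [ "$desired" != "$observed" ]; then
    printf '%s\n' '[{ "op": "remove", "path": "/spec/forProvider/availabilityZone" }]'
  fi
}

# Compare spec and status for one Instance and patch only on a mismatch.
# Assumption: the observed AZ is exposed at status.atProvider.availabilityZone.
fix_instance_az() {
  local name="$1" desired observed patch
  desired=$(kubectl get instances.rds.aws.upbound.io "$name" \
    -o jsonpath='{.spec.forProvider.availabilityZone}')
  observed=$(kubectl get instances.rds.aws.upbound.io "$name" \
    -o jsonpath='{.status.atProvider.availabilityZone}')
  patch=$(az_patch "$desired" "$observed")
  if [ -n "$patch" ]; then
    echo "spec AZ '$desired' differs from observed AZ '$observed'; removing it from the spec"
    kubectl patch instances.rds.aws.upbound.io "$name" --type json -p "$patch"
  else
    echo "no AZ mismatch detected on $name; nothing to do"
  fi
}
```

After sourcing the file, `fix_instance_az example-db` would run the check against the instance from the example above. Nothing is patched when the two values already agree or when the status has no AZ yet.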