Open oliabent opened 2 weeks ago
I can reproduce the issue by running uptest. It requires manually running kubectl delete on the MultiRegionAccessPoint, because it otherwise gets stuck due to #1363. We should probably handle both issues together, as they seem likely to have related solutions.
For each managed resource I delete, the provider pod restarts once, then successfully deletes the external resource, resolves the finalizer, and completes deletion of the managed resource. This makes me think that the reason for the panic is somehow related to the erroneous reconcile loop described in #1363, and that when the provider restarts with the managed resource already having a deletion timestamp (but not yet completely deleted), it doesn't trigger that reconcile loop, and instead successfully deletes the external resource.
Looking at the code referenced in the stack trace, I've figured out what's happening.
In
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Delete(0xc00bf28540, {0x195b9bc0, 0xc000b28540}, {0xc00c4af560?, 0xc0097e6cc0?})
github.com/crossplane/upjet@v1.4.1/pkg/controller/external_tfpluginsdk.go:714 +0x146
upjet sets n.instanceDiff.Destroy = true, then invokes
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).Apply(0xc0045795e0, {0x195b9bc0, 0xc000b28540}, 0xc0025852b0, 0xc00a512780, {0x167944c0, 0xc006528820})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.33.0/helper/schema/resource.go:909 +0xa89
That Apply method is designed to handle both a "delete only" request and a delete-and-recreate request. The delete implementation contains this code:
// If we're only destroying, and not creating, then return
// now since we're done!
if !d.RequiresNew() {
	return nil, diags
}
which in this case doesn't trigger, because the observed instance diff has RequiresNew = true due to #1363. The function then proceeds to attempt to recreate the desired state, which is (presumably; I didn't verify this) invalid and causes a panic due to some unexpected input. This also explains why there's a call to Create inside a Delete method.
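To illustrate the control flow described above, here is a minimal, self-contained Go sketch. It does not use the real terraform-plugin-sdk types: attrDiff, instanceDiff, and apply are simplified stand-ins I made up for illustration, and the attribute name "details" is just a placeholder. It only models the branch in question: a destroy request returns early unless some attribute diff is flagged RequiresNew.

```go
package main

import "fmt"

// attrDiff and instanceDiff are simplified stand-ins (assumptions) for the
// SDK's attribute diff and instance diff; only the relevant fields are kept.
type attrDiff struct {
	RequiresNew bool
}

type instanceDiff struct {
	Destroy    bool
	Attributes map[string]*attrDiff
}

// requiresNew reports whether any attribute diff forces a replacement.
func (d *instanceDiff) requiresNew() bool {
	for _, a := range d.Attributes {
		if a != nil && a.RequiresNew {
			return true
		}
	}
	return false
}

// apply mimics the branch structure described in the issue and returns the
// operations that would run, in order.
func apply(d *instanceDiff) []string {
	var ops []string
	if d.Destroy {
		ops = append(ops, "delete")
		// Destroy-only request: stop here unless a replacement is required.
		if !d.requiresNew() {
			return ops
		}
	}
	// Falls through to creation when the diff still says RequiresNew.
	ops = append(ops, "create")
	return ops
}

func main() {
	clean := &instanceDiff{Destroy: true}
	stale := &instanceDiff{
		Destroy:    true,
		Attributes: map[string]*attrDiff{"details": {RequiresNew: true}},
	}
	fmt.Println(apply(clean)) // [delete]
	fmt.Println(apply(stale)) // [delete create]
}
```

The second case corresponds to what the stack trace suggests: a delete request that carries a leftover RequiresNew attribute continues into the create path.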
I think we should update upjet to explicitly set RequiresNew to false when deleting SDK resources, as this could otherwise produce unexpected results even if it doesn't cause a panic.
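One possible shape for that change, sketched with the same kind of simplified stand-in types rather than the real SDK or upjet code: before handing the diff to Apply for a delete, build a copy with Destroy set and every attribute's RequiresNew cleared. The helper name prepareForDelete and the types here are hypothetical, and the actual fix in upjet may look quite different.

```go
package main

import "fmt"

// Simplified stand-ins (assumptions) for the SDK diff types.
type attrDiff struct {
	RequiresNew bool
}

type instanceDiff struct {
	Destroy    bool
	Attributes map[string]*attrDiff
}

// requiresNew reports whether any attribute diff forces a replacement.
func (d *instanceDiff) requiresNew() bool {
	for _, a := range d.Attributes {
		if a != nil && a.RequiresNew {
			return true
		}
	}
	return false
}

// prepareForDelete is a hypothetical helper: it returns a copy of the diff
// marked for destroy, with RequiresNew cleared on every attribute, so a
// delete request cannot be interpreted as delete-and-recreate.
func prepareForDelete(d *instanceDiff) *instanceDiff {
	out := &instanceDiff{Destroy: true, Attributes: map[string]*attrDiff{}}
	if d == nil {
		return out
	}
	for name, a := range d.Attributes {
		if a == nil {
			continue
		}
		c := *a // copy so the observed diff is left untouched
		c.RequiresNew = false
		out.Attributes[name] = &c
	}
	return out
}

func main() {
	observed := &instanceDiff{
		Attributes: map[string]*attrDiff{"details": {RequiresNew: true}},
	}
	del := prepareForDelete(observed)
	fmt.Println(observed.requiresNew()) // true
	fmt.Println(del.Destroy, del.requiresNew()) // true false
}
```

Copying rather than mutating keeps the observed diff intact in case it is reused elsewhere; whether that matters in upjet is something I haven't checked.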
Is there an existing issue for this?
Affected Resource(s)
xpkg.upbound.io/upbound/provider-aws-s3control:v1.6.0
Resource MRs required to reproduce the bug
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-s3control
spec:
  package: xpkg.upbound.io/upbound/provider-aws-s3control:v1.6.0
---
apiVersion: s3control.aws.upbound.io/v1beta1
kind: MultiRegionAccessPoint
metadata:
  name: {{ $multiRegionAccessPointName }}
  annotations:
    gotemplating.fn.crossplane.io/composition-resource-name: multiregionaccesspoint
spec:
  forProvider:
    details:
Steps to Reproduce
request deletion of resource:
kubectl delete...
What happened?
pod crashes and restarts
Relevant Error Output Snippet
Crossplane Version
v1.14.3
Provider Version
v1.6.0
Kubernetes Version
v1.29.4
Kubernetes Distribution
EKS
Additional Info
No response