With this PR, panic is logged and reconciliation continues, without the pod crashing.
2024-08-22T12:20:22+03:00 DEBUG provider-aws Async create ended. {"trackerUID": "b1a58d9b-2236-4740-9a90-c169b5d2e538", "resourceName": "test-async-panic-handler-securitygroupingressrule", "gvk": "ec2.aws.upbound.io/v1beta1, Kind=SecurityGroupIngressRule", "error": "async create failed: panic: runtime error: index out of range [0] with length 0 [recovered]"}
There is one behavioral difference compared to the sync external client panic handling: Upon panicking during resource creation, async external client continues reconciling the resource and panicking each time, whereas sync external client stops reconciliation because of the existence of external-create-failed annotation on the resource. The async behavior is different, because Crossplane runtime is not async aware. Async clients let Crossplane runtime set external-create-succeeded annotation in order to be able to do the creation asynchronously. Having external-create-succeeded annotation prevents reconciliation from stopping even in the existence of external-create-failed annotation.
[x] Run make reviewable to ensure this PR is ready for review.
[ ] ~Added backport release-x.y labels to auto-backport this PR if necessary.~
How has this code been tested
Thanks to @haarchri's discovery, I triggered a panic using the manifest below. SecurityGroupIngressRule is a Terraform Plugin Framework resource, whereas SecurityGroup and VPC are Plugin SDKv2 resources.
E0822 12:19:21.286186 84654 runtime.go:79] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
Description of your changes
This PR implements a panic recovery mechanism for async external clients. Controller runtime can be configured to recover from panics, which Crossplane runtime adopted in https://github.com/crossplane/crossplane-runtime/pull/493. Unfortunately, that recovery mechanism doesn't apply to the Goroutines started by async external clients (an example). Therefore, panics in async code would crash pods as reported in https://github.com/crossplane-contrib/provider-upjet-aws/issues/1242.
With this PR, panic is logged and reconciliation continues, without the pod crashing.
There is one behavioral difference compared to the sync external client panic handling: Upon panicking during resource creation, async external client continues reconciling the resource and panicking each time, whereas sync external client stops reconciliation because of the existence of
external-create-failed
annotation on the resource. The async behavior is different, because Crossplane runtime is not async aware. Async clients let Crossplane runtime setexternal-create-succeeded
annotation in order to be able to do the creation asynchronously. Havingexternal-create-succeeded
annotation prevents reconciliation from stopping even in the existence ofexternal-create-failed
annotation.I have:
make reviewable
to ensure this PR is ready for review.backport release-x.y
labels to auto-backport this PR if necessary.~How has this code been tested
Thanks to @haarchri's discovery, I triggered a panic using the manifest below.
SecurityGroupIngressRule
is a Terraform Plugin Framework resource, whereasSecurityGroup
andVPC
are Plugin SDKv2 resources.Terraform provider panics with message: