**vibe** opened 3 months ago
I wanted to call out that this behavior also shows up for ManagedPrefixLists and Routes, causing reconciliation errors that are tedious to recover from, since they require manual intervention.
@vibe Thanks for the bug report. Anything that causes the provider pod to crash is concerning.
Could you provide a bit more detail? In particular, it would be useful to see the output of `kubectl get -o yaml` for the relevant resources at each step. You can redact any IP addresses, AWS account IDs, or anything else you deem sensitive; we don't need that information for debugging.
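Something like the sketch below would work for capturing that state after each step (the resource names here are assumptions taken from the manifests in the report; adjust to match your cluster):

```shell
# Dump the security group and the ingress rule as seen by the cluster.
kubectl get securitygroup.ec2.aws.upbound.io sample-security-group -o yaml > sg.yaml
kubectl get securitygroupingressrule.ec2.aws.upbound.io some-rule -o yaml > rule.yaml

# Optional: mask anything that looks like a 12-digit AWS account ID before posting.
sed -i 's/[0-9]\{12\}/REDACTED_ACCOUNT_ID/g' sg.yaml rule.yaml
```

(`sed -i` takes a backup suffix argument on macOS/BSD, e.g. `sed -i ''`.)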
I'm particularly interested to know what precisely you are doing with

> Update policy to resolve always on ingress rule

and what you mean by

> SecurityGroupId is also missing from the securitygroupingress rule spec after changing policy to "always". (suspecting this causes the crash)
I think I know what you mean, but a YAML manifest is the clearest way to explain it without any ambiguity.
Also, I'm curious to know what happens if you add "either wait for 10 minutes or make some edit to any annotation on the SecurityGroupIngressRule" to your steps to reproduce, in between recreating the security group and setting the policy to required. I think that might trigger the `Ref` field to resolve to the new SecurityGroup, although that may end up producing a different invalid state. It would be good to know what happens regardless.
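The "edit any annotation" step above can be done in one command; the annotation key here is arbitrary (any key works, since the point is only to nudge the reconciler):

```shell
# Touch an arbitrary annotation on the ingress rule to trigger a reconcile.
# "some-rule" is the example name from the manifests in this issue.
kubectl annotate securitygroupingressrule.ec2.aws.upbound.io some-rule \
  debug/reconcile-bump="$(date +%s)" --overwrite
```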
Finally, please put your YAML manifests inside triple backticks (```), so that GitHub will preserve the whitespace. Otherwise they become very difficult to read.
My first impression is that the panic is a bug in the Terraform AWS provider, but perhaps one that we can avoid triggering through better validation logic.
Is there an existing issue for this?
Affected Resource(s)
- ec2.aws.upbound.io/v1beta1 - SecurityGroupIngressRule
- ec2.aws.upbound.io/v1beta1 - SecurityGroupEgressRule
- ec2.aws.upbound.io/v1beta1 - SecurityGroup
Resource MRs required to reproduce the bug
```yaml
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroup
metadata:
  labels:
    id: some-id
    region: us-west-2
  name: sample-security-group
spec:
  deletionPolicy: Delete
  forProvider:
    vpcIdSelector:
      matchControllerRef: true
      matchLabels:
        region: us-west-2
  managementPolicies:
    - '*'
```
```yaml
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroupIngressRule
metadata:
  name: some-rule
spec:
  deletionPolicy: Delete
  forProvider:
    fromPort: 8078
    ipProtocol: tcp
    referencedSecurityGroupIdSelector:
      matchControllerRef: true
      matchLabels:
        region: us-west-2
    region: us-east-1
    securityGroupId: sg-0eb2d04c577f1db47 # this gets autofilled on cluster after first resolution but never updates if parent security group changes
    securityGroupIdRef:
      name: sample-security-group # this gets autofilled on cluster after first resolution but never updates if parent security group changes
    securityGroupIdSelector:
      matchControllerRef: true
      matchLabels:
        id: some-id
        region: us-west-2
    toPort: 8078
  initProvider: {}
  managementPolicies:
```
Steps to Reproduce
1. Create the SecurityGroup
2. Create the SecurityGroupIngressRule
3. Recreate the SecurityGroup (so the security group ID changes)
4. The SecurityGroupIngressRule still points to the old ID
5. Update the reference policy to `resolve: Always` on the ingress rule
6. The EC2 provider pod crashes
7. There is no way out of this other than deleting the SecurityGroupIngressRule resources
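The policy change in step 5 looks roughly like the fragment below (a sketch; the `policy.resolve` field follows the standard Crossplane reference-policy schema, and the resource name is the example name used in this issue):

```yaml
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroupIngressRule
metadata:
  name: some-rule
spec:
  forProvider:
    securityGroupIdRef:
      name: sample-security-group
      policy:
        resolve: Always   # re-resolve the reference on every reconcile
```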
What happened?
The security group rules should get recreated against the new security group. Instead, `securityGroupId` is also missing from the SecurityGroupIngressRule spec after changing the policy to "always"; I suspect this is what causes the crash.
Relevant Error Output Snippet
Crossplane Version
1.15.1
Provider Version
1.2.1
Kubernetes Version
1.29
Kubernetes Distribution
EKS
Additional Info
No response