aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0
2.39k stars 253 forks source link

EC2 SecurityGroup v0.18 fails when ingressRules are defined using `userIDGroupPairs` and `groupID` #1443

Closed apmclean closed 1 year ago

apmclean commented 2 years ago

Describe the bug Using release .18 of EC2 SecurityGroup where an IngressRule is defined to use an existing Security Group via groupID - the SecurityGroup creation is continuously retried even though it exists. No IngressRules are present.

Steps to reproduce

1) Create a non-Default VPC, and note the VPC ID.
2) Create a Security Group within the VPC created in step 1. Note the Security Group ID.

Use the following definition with substitutions for vpcID and groupID:

apiVersion: ec2.services.k8s.aws/v1alpha1
kind: SecurityGroup
metadata:
  name: security-group-with-ingress-sg
spec:
  name: security-group-with-ingress
  description: SecurityGroup with an Ingress Rule on Non Default VPC
  ingressRules:
    - fromPort: 5432
      toPort: 5432
      ipProtocol: tcp
      userIDGroupPairs:
        - groupID: sg-[existing-id]
          description: Source Security Group
  vpcID: vpc-[existing-id]
  tags:
    - key: Name
      value: security-group-with-ingress

Creation will fail with message:

  Conditions:
    Message:               InvalidGroup.Duplicate: The security group 'security-group-with-ingress' already exists for VPC 'vpc-[...]'
                           status code: 400, request id: [...]
    Status:                True
    Type:                  ACK.Recoverable
    Last Transition Time:  2022-08-23T01:59:10Z
    Message:               Unable to determine if desired resource state matches latest observed state
    Reason:                InvalidGroup.Duplicate: The security group 'security-group-with-ingress' already exists for VPC 'vpc-[...]'
                           status code: 400, request id: [...]
    Status:                Unknown
    Type:                  ACK.ResourceSynced
Events:                    <none>

The security group has been created by ACK (Confirmed via CloudTrail, and via Tags on resource), however the ingress rules are not present. NOTE If ipRanges are used instead of userIDGroupPairs the groups and ingress rules will create successfully.

Expected outcome

The Security Group should be created permitting ingress from the noted Security Group ID. Similar to Example (Shared001) in the GoLang Documentation for AuthorizeSecurityGroupIngress.

Environment

brycahta commented 2 years ago

@apmclean I can't seem to reproduce the issue locally. I tried using default vpc for vpcID and default security group for userIDGroupPairs.groupID and saw the ingress rule created as expected. Per your instruction, I created a new vpc and security group and used those IDs, but again saw expected behavior.

Let me check with some members of the team to see if they can repro

apmclean commented 2 years ago

The only other nuance is my default VPC is deleted. Don't think that will have any bearing though. I also upgraded from .17. Let me tear it all down and re-try from fresh.

apmclean commented 2 years ago

Apologies, my reproduction steps were incomplete. When I encounter the issue I'm using groupName and not groupID.

Same steps as above, but instead use the groupName for the security group you created within that VPC.

ie:

  ingressRules:
    - fromPort: 5432
      toPort: 5432
      ipProtocol: tcp
      userIDGroupPairs:
        - groupName: [group-name-of-existing-security-group]
          description: Source Security Group
brycahta commented 2 years ago

From the UserIdGroupPair docs:

GroupName (request), groupName (response) The name of the security group. In a request, use this parameter for a security group in EC2-Classic or a default VPC only. For a security group in a nondefault VPC, use the security group ID. For a referenced security group in another VPC, this value is not returned if the referenced security group is deleted.

Is there a particular reason you're using groupName? Given the documentation above, it seems like you should use groupID:

  1. Your repro steps use a non-default VPC so groupID should be used
  2. EC2-Classic is retiring as of August 15, 2022
apmclean commented 2 years ago

Ah - I think I miss-understood the ACK API Documentation. I didn't resolve mentally that EC2 Classic would be even considered given its age. I had understood / expected that ACK would look the group name up for me as needed to fulfill the request. Working with a group-name is far more friendly than needing to look the ID up every request for automation.

ie (using CLI):

aws ec2 describe-security-groups --filters Name=group-name,Values=sg-name Name=vpc-id,Values=vpc-id

Which would resolve the Security Group ID for me.

I'd suggest:

Thanks!

brycahta commented 2 years ago

I had understood / expected that ACK would look the group name up for me as needed to fulfill the request. Working with a group-name is far more friendly than needing to look the ID up every request for automation.

Depending on the use case, I believe resource references could help bridge this gap. This feature is implemented for some ec2 resources already such as subnet. This allows the Subnet's vpcID field to be populated via a VPC resource that may not exist yet. An example of using this feature is in the resource test file subnet_ref.yaml. Here you will see it refers to vpc by Name and given the config in the first link, it will extract the vpcID from the vpc's status. Note, this Name is the name of the k8s custom resource.

Once resource references is implemented for SecurityGroup resource, then you should be able to give a SecurityGroup resource a deterministic Name and refer to that in other resources (note, this can be different from the group-name). However, I say 'depending on use case' because I don't think this will work for resources (or security groups) that are not managed by the controller. @jaypipes @A-Hilaly Feel free to chime in if I missed something.

Appreciate the suggestions!

Documentation to tag anything that only works for EC2-Classic in the API.

When first creating the controller, we made the decision to support EC2-VPC only given the deprecation date of EC2-Classic. I will ensure this is visible and clear in the controller docs.

Error handling in the controller.

Agreed.

jaypipes commented 1 year ago

@brycahta I agree that using resource references would be the best user experience here. Potentially also removing the groupName field entirely since it's only used for EC2 Classic...

ack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 1 year ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot commented 1 year ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Provide feedback via https://github.com/aws-controllers-k8s/community. /close

ack-prow[bot] commented 1 year ago

@ack-bot: Closing this issue.

In response to [this](https://github.com/aws-controllers-k8s/community/issues/1443#issuecomment-1435117156): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen`. >Provide feedback via https://github.com/aws-controllers-k8s/community. >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.