aws / amazon-vpc-resource-controller-k8s

Controller for managing Trunk & Branch Network Interfaces on EKS Cluster using Security Group For Pod feature and IPv4 Addresses for Windows Node.
Apache License 2.0
79 stars 53 forks source link

Support more than 5 SecurityGroups on one CRD #413

Open changhyuni opened 5 months ago

changhyuni commented 5 months ago

Issue #, if available: Modify the maxitems in SecurityGroups. According to "Amazon VPC Quotas," it should be able to use up to 16.

One CRD must support at least 10 security groups If I need more than 5 security groups, it's hard to manage because I need more than one crd. I don't know why one crd doesn't support more than 10 security groups

Description of changes: We need to use more than 10 security groups. Increase the maxitems to the maximum number of allowable (16), and modify the descritions and README. https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html#vpc-limits-security-groups

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sushrk commented 5 months ago

The default number of security groups per Network Interface is 5. Updating the MaxItems=16 could lead to customers erroneously adding more than 5 SGs in the security group policy prior to increasing the service quota.

The recommended way is to define more than 1 security group if you need more than 5 SGs to be specified, as documented in the README.

Can you help me understand if any issues with multiple CRDs? We understand this is harder to manage than 1 CRD, but hopefully it is a one-time setup effort for SGP CRDs.

changhyuni commented 5 months ago

@sushrk Hi. Thank you for your comment. In our case, we need to apply 14 SecurityGroups, so we are managing 3 crds. We are splitting what could be managed in one crd into 3. This becomes increasingly complex as the system grows larger.

It seems like a reasonable approach would be to guide users to specify only as many SecurityGroups as they need what do you think? Is there any advantage to enforcing this through maxitems? If I apply it without increasing the qutos, won't it be rejected by the vpc qutos anyway?

Or, how about making it possible to dynamically modify maxitems? I can help you that

sushrk commented 5 months ago

If I apply it without increasing the qutos, won't it be rejected by the vpc qutos anyway?

I need to check on this.

Is there any advantage to enforcing this through maxitems?

Yeah, as mentioned before this avoids customers from specifying more than the default number(5) of security groups in the CRD.

how about making it possible to dynamically modify maxitems?

Hm, this might be possible if we set maxItems in CRD based on the service quota value. Let me see if it is feasible.

changhyuni commented 5 months ago

@sushrk How about using servicequotas to look up the vpc service qutos(securitygroup) value to set maxitems when vpc cni is deployed? https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/servicequotas Let me see if it is feasible too.

GnatorX commented 5 months ago

We would love this feature also. We have increased service quotas and currently rely on a fork of the VPC RC to get around this

changhyuni commented 5 months ago

@sushrk Hi! How's it going? I'm curious because I haven't seen any updates.

sushrk commented 5 months ago

Sorry for the late reply. Thinking about this, it makes sense to me to set the maxItems to absolute maximum of 16 in CRD definition and internally check that the number of SGs specified is lower than the service quota value at CRD creation.

We'd need to document this more clearly though, especially when customers want to lower their quota so it does not break existing workloads.

I'll need to confirm the customer experience with PM, will post an update tomorrow.

@changhyuni are you planning to add the support for this?

changhyuni commented 4 months ago

@sushrk Hi. Thank you for your feedback. I think I can create a webhook to check the number of security groups in the CRD. What do you think? I'v tried to make webhook and will add test code soon. Can I use the aws-go-sdk-v2? I looked at your other code and it doesn't seem to use it, it uses an api package.

sushrk commented 4 months ago

I think I can create a webhook to check the number of security groups in the CRD.

Can you explain why we are adding a webhook to check the number of SGs? Does this validate the CRD at creation/update/both? Also, we need to handle the case where more than one SGP CRD can be applied to a pod, probably will not be sufficient to validate individual CRD objects. We should add this validation at pod creation.

Can I use the aws-go-sdk-v2? I looked at your other code and it doesn't seem to use it, it uses an api package.

We use aws-go-sdk for EC2 operations, but through a wrapper and helper functions. We haven't yet migrated to v2.

changhyuni commented 4 months ago

@sushrk Hi I had some time and made a few additions. Created an API via ec2 wrapper and added logic to validate on pod webhooks

Also, we need to handle the case where more than one SGP CRD can be applied to a pod, probably will not be sufficient to validate individual CRD objects. We should add this validation at pod creation.

Right, I think we need to add it to the pod webhook or utils package

changhyuni commented 3 months ago

@sushrk Could you check this out when you have a time?

haouc commented 3 months ago

We need avoid adding upstream service support check in pod mutating webhook. The webhook need to be light and respond quickly. Always calling upstream API will be more and more expensive for most of users who don't have the need to support more than 5 security groups. Since users should be aware that their service quotas before applying the setting to the CRD, I think fail a pod creation and send a pod event as a message should be good enough.

changhyuni commented 3 weeks ago

@haouc Hello, How is the requirement coming along? I'd like to assist if possible.