cncf / demo

Demo of CNCF technologies
https://cncf.io
Apache License 2.0
Kubernetes AWS problems with multiple security groups due to tags #144

Closed (namliz closed this issue 7 years ago)

namliz commented 7 years ago

https://github.com/kubernetes/kubernetes/issues/23339, https://github.com/kubernetes/kubernetes/issues/26787

The Kubernetes controller manager manages AWS resources by filtering on AWS resource tags such as KubernetesCluster:ClusterName. Unfortunately, it applies this filter inconsistently across different kinds of resources.
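
To make the tag filtering concrete, here is a minimal sketch of what such a filter looks like in the shape the EC2 Describe* APIs accept (for example boto3's `describe_security_groups(Filters=...)`). The helper names and the cluster name `cncfdemo` are assumptions for illustration (the name is inferred from the security group names in this thread); this is not the controller's own code.

```python
def cluster_tag_filter(cluster_name, tag_key="KubernetesCluster"):
    """Build an EC2 filter list that matches resources tagged for one cluster."""
    return [{"Name": f"tag:{tag_key}", "Values": [cluster_name]}]


def matches_cluster(resource_tags, cluster_name, tag_key="KubernetesCluster"):
    """Client-side equivalent: True if the resource carries the cluster tag."""
    return resource_tags.get(tag_key) == cluster_name


# Hypothetical cluster name, assumed from the k8s-*-cncfdemo group names.
filters = cluster_tag_filter("cncfdemo")
print(filters)
```

Any resource that carries the tag matches, which is why tagging more than one security group with the same key and value can matter later on.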

8527    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
3961    2292 aws_loadbalancer.go:191] Deleting removed load balancer listeners
4035    2292 log_handler.go:33] AWS request: elasticloadbalancing DeleteLoadBalancerListeners
1501    2292 aws_loadbalancer.go:203] Creating added load balancer listeners
1592    2292 log_handler.go:33] AWS request: elasticloadbalancing CreateLoadBalancerListeners
3129    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancerAttributes
3214    2292 log_handler.go:33] AWS request: elasticloadbalancing ModifyLoadBalancerAttributes
4591    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
9882    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
1322    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
8421    2292 aws.go:2731] Error opening ingress rules for the load balancer to the instances: Multiple tagged security groups found for instance i-04bd9c4c8aa; ensure only the k8s security group is tagged
8469    2292 servicecontroller.go:754] Failed to process service. Retrying in 5m0s: Failed to create load balancer for service default/pushgateway: Multiple tagged security groups found for instance i-04bd9c4c8aa36270e; ensure only the k8s security group is tagged
8480    2292 servicecontroller.go:724] Finished syncing service "default/pushgateway" (419.263237ms)

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2783

// Returns the first security group for an instance, or nil
// We only create instances with one security group, so we don't expect multiple security groups.
// However, if there are multiple security groups, we will choose the one tagged with our cluster filter.
// Otherwise we will return an error.

The security groups in my case are:

k8s-minions-cncfdemo, k8s-masters-cncfdemo

They are both tagged with the cluster filter. Not expecting multiple security groups seems like a wrong (not to mention undocumented!) assumption.
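
The selection rule described in the quoted comment can be sketched roughly as follows. This is a simplified Python model written from the comment and the error message in the log, not the actual Go implementation in aws.go. The function name, the exception class, and the group IDs `sg-minions` / `sg-masters` are hypothetical, and it assumes (as the error message suggests) that the instance ends up associated with both tagged groups.

```python
class MultipleTaggedGroupsError(Exception):
    """Raised when more than one of an instance's groups carries the cluster tag."""


def find_security_group_for_instance(instance_id, instance_groups, tagged_group_ids):
    """Pick the security group the load balancer rules should target.

    instance_groups: list of (group_id, group_name) attached to the instance.
    tagged_group_ids: set of group IDs that matched the cluster tag filter.
    """
    if not instance_groups:
        return None

    tagged = [g for g in instance_groups if g[0] in tagged_group_ids]
    if len(tagged) == 1:
        return tagged[0]
    if len(tagged) > 1:
        raise MultipleTaggedGroupsError(
            f"Multiple tagged security groups found for instance {instance_id}; "
            "ensure only the k8s security group is tagged"
        )

    # No tagged group: only unambiguous if the instance has exactly one group.
    if len(instance_groups) == 1:
        return instance_groups[0]
    raise ValueError(f"Cannot choose a security group for instance {instance_id}")


# Hypothetical IDs; only the group names come from this thread.
groups = [("sg-minions", "k8s-minions-cncfdemo"), ("sg-masters", "k8s-masters-cncfdemo")]

try:
    find_security_group_for_instance("i-example", groups, {"sg-minions", "sg-masters"})
    both_tagged_failed = False
except MultipleTaggedGroupsError:
    both_tagged_failed = True

only_minions = find_security_group_for_instance("i-example", groups, {"sg-minions"})
```

With both groups tagged the lookup fails; with only one tagged it resolves to that group.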

Bit of a head scratcher.

namliz commented 7 years ago

Untagging k8s-masters-cncfdemo triggers the following events:

Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.741311    2292 aws.go:2928] Adding rule for traffic from the load balancer (sg-4c8ee935) to instances (sg-fcbdd985)
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.741361    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.793955    2292 aws.go:2002] Existing security group ingress: sg-fcbdd985 [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-d28becab",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fabdd983",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fcbdd985",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: } {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: FromPort: 22,
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "tcp",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpRanges: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: CidrIp: "0.0.0.0/0"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }],
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: ToPort: 22
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794013    2292 aws.go:1874] Comparing {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-4c8ee935"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: } to {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-d28becab",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fabdd983",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fcbdd985",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794051    2292 aws.go:1904] Comparing sg-4c8ee935 to sg-d28becab
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794057    2292 aws.go:1904] Comparing sg-4c8ee935 to sg-fabdd983
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794061    2292 aws.go:1904] Comparing sg-4c8ee935 to sg-fcbdd985
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794067    2292 aws.go:2030] Adding security group ingress: sg-fcbdd985 [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-4c8ee935"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794147    2292 log_handler.go:33] AWS request: ec2 AuthorizeSecurityGroupIngress
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.897583    2292 aws.go:3146] Returning cached instances for map[ip-172-20-0-127.us-west-2.compute.internal:{} ip-172-20-0-231.us-west-2.compute.internal:{} ip-172-20-0-232.us-west-2.compute.internal:{}]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.897657    2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.929252    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:56 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:56.100143    2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:56 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:56.374814    2292 reflector.go:284] pkg/controller/endpoint/endpoints_controller.go:157: forcing resync
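
The log above shows the controller comparing the rule it wants (all traffic from the load balancer group sg-4c8ee935) against the existing ingress on sg-fcbdd985, finding no match, and then calling AuthorizeSecurityGroupIngress. A rough sketch of that check-before-add step, using the values from the log; the function name is hypothetical and the comparison is simplified to protocol and source group only:

```python
def has_group_ingress(permissions, protocol, source_group_id):
    """Return True if an existing permission with this protocol already
    admits traffic from source_group_id."""
    for permission in permissions:
        if permission.get("IpProtocol") != protocol:
            continue
        for pair in permission.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == source_group_id:
                return True
    return False


# Existing ingress on sg-fcbdd985, transcribed from the log above.
existing = [
    {
        "IpProtocol": "-1",
        "UserIdGroupPairs": [
            {"GroupId": "sg-d28becab", "UserId": "750548967590"},
            {"GroupId": "sg-fabdd983", "UserId": "750548967590"},
            {"GroupId": "sg-fcbdd985", "UserId": "750548967590"},
        ],
    },
    {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    },
]

# The load balancer's group is not among the existing sources, so a rule is needed.
rule_was_missing = not has_group_ingress(existing, "-1", "sg-4c8ee935")
if rule_was_missing:
    # Stand-in for the AuthorizeSecurityGroupIngress call seen in the log.
    existing.append({"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": "sg-4c8ee935"}]})
```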

namliz commented 7 years ago

This solves the problem (!): the ELBs picked up the instances because the tag filtering no longer got confused, and there are now external routes added to some services I deployed.

This should really be documented somewhere!
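
For anyone wanting to script the workaround, a minimal sketch follows. It assumes a boto3-style EC2 client, whose `delete_tags(Resources=..., Tags=...)` call removes a tag by key when no value is given. The function name, the `RecordingClient` stand-in, and the group ID `sg-example` are hypothetical; substitute the real ID of the k8s-masters-cncfdemo group.

```python
def untag_security_group(ec2_client, group_id, tag_key="KubernetesCluster"):
    """Remove the cluster tag from one security group so it no longer
    matches the controller's tag filter."""
    # Omitting "Value" deletes the tag whatever its current value is.
    return ec2_client.delete_tags(Resources=[group_id], Tags=[{"Key": tag_key}])


class RecordingClient:
    """Stand-in for an EC2 client that only records calls (no AWS access)."""

    def __init__(self):
        self.calls = []

    def delete_tags(self, **kwargs):
        self.calls.append(kwargs)
        return {}


# Dry run against the stand-in; with real credentials one would pass
# boto3.client("ec2") instead.
client = RecordingClient()
untag_security_group(client, "sg-example")
print(client.calls)
```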

The other question is whether a cluster would still stand up cleanly, because some other tag filtering code might do things slightly differently and actually need the k8s-masters-cncfdemo security group to be tagged.

Finally, I strongly think this is incorrect behaviour.