SovereignCloudStack / k8s-cluster-api-provider

Automation to use the OpenStack Kubernetes API Provider on SCS
https://scs.community/

endless port+floating-ip allocation (LB for kube-api) #179

Closed garloff closed 2 years ago

garloff commented 2 years ago

The load balancer in front of the Kubernetes API server(s) gets a floating IP (FIP) address assigned. Occasionally, k8s does not seem to consider this assignment successful. When that happens, it allocates a new FIP, tries to assign it to the LB VIP port, and fails (because that port already has a FIP from the same external network assigned). It repeats this cycle without cleaning up the newly allocated FIP, so this continues until we hit the FIP quota limit. I see at least two bugs here:

  1. The first FIP assignment actually succeeded, so it should not be considered failed.
  2. The retry loop should clean up FIPs that were created but could not be associated (and at that point it might finally notice that the first FIP assignment exists).

It's unclear to me at this point whether the issue is with capo or occm.
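The two fixes listed above can be sketched as an idempotent reconcile step: look for a FIP on the port before allocating, and delete a freshly created FIP if association fails. This is a minimal illustration only, not the actual capo or occm code; the `fipAPI` interface, the `fakeNeutron` type, and all method names are hypothetical stand-ins for the real Neutron calls.

```go
package main

import (
	"errors"
	"fmt"
)

// FloatingIP is a simplified stand-in for a Neutron floating IP record.
type FloatingIP struct {
	ID     string
	PortID string // empty while the FIP is not associated
}

// fipAPI is a hypothetical, minimal view of the Neutron operations involved.
type fipAPI interface {
	ListByPort(portID string) []FloatingIP
	Create() (FloatingIP, error)
	Associate(fipID, portID string) error
	Delete(fipID string) error
}

var errAlreadyAssociated = errors.New("409 FloatingIPPortAlreadyAssociated")

// ensureFIP returns the FIP attached to portID and allocates one only if none exists.
func ensureFIP(api fipAPI, portID string) (FloatingIP, error) {
	// Fix 1: an existing association counts as success, so nothing new is allocated.
	if existing := api.ListByPort(portID); len(existing) > 0 {
		return existing[0], nil
	}
	fip, err := api.Create()
	if err != nil {
		return FloatingIP{}, fmt.Errorf("create floating IP: %w", err)
	}
	if err := api.Associate(fip.ID, portID); err != nil {
		// Fix 2: release the FIP we just allocated instead of leaking it.
		if delErr := api.Delete(fip.ID); delErr != nil {
			return FloatingIP{}, fmt.Errorf("associate failed (%v); cleanup failed: %w", err, delErr)
		}
		return FloatingIP{}, fmt.Errorf("associate floating IP %s: %w", fip.ID, err)
	}
	fip.PortID = portID
	return fip, nil
}

// fakeNeutron is an in-memory fake used only to exercise ensureFIP.
type fakeNeutron struct {
	fips      map[string]*FloatingIP
	nextID    int
	staleList bool // when true, ListByPort misses existing associations
}

func newFakeNeutron() *fakeNeutron {
	return &fakeNeutron{fips: map[string]*FloatingIP{}}
}

func (f *fakeNeutron) ListByPort(portID string) []FloatingIP {
	var out []FloatingIP
	if f.staleList {
		return out
	}
	for _, fip := range f.fips {
		if fip.PortID == portID {
			out = append(out, *fip)
		}
	}
	return out
}

func (f *fakeNeutron) Create() (FloatingIP, error) {
	f.nextID++
	fip := &FloatingIP{ID: fmt.Sprintf("fip-%d", f.nextID)}
	f.fips[fip.ID] = fip
	return *fip, nil
}

func (f *fakeNeutron) Associate(fipID, portID string) error {
	// Mimic the conflict from the issue: the port already has a FIP.
	for _, fip := range f.fips {
		if fip.PortID == portID {
			return errAlreadyAssociated
		}
	}
	fip, ok := f.fips[fipID]
	if !ok {
		return errors.New("floating IP not found")
	}
	fip.PortID = portID
	return nil
}

func (f *fakeNeutron) Delete(fipID string) error {
	delete(f.fips, fipID)
	return nil
}

func main() {
	api := newFakeNeutron()
	for i := 0; i < 3; i++ {
		fip, err := ensureFIP(api, "lb-vip-port")
		fmt.Println(fip.ID, err, "allocated:", len(api.fips))
	}
}
```

Repeated reconciles against the fake should keep reporting the same FIP with a single allocation. Setting `staleList` simulates the case where the existing association is missed: the association attempt then conflicts, and the spare FIP is deleted rather than left to eat into the quota.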

garloff commented 2 years ago
ubuntu@capi2-mgmtcluster:~/cluster-defaults [0]$ openstack loadbalancer list
+--------------------------------------+----------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| id                                   | name                                               | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+----------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| 596bb3f3-5284-4eec-bca0-adf476c62e0f | k8s-clusterapi-cluster-default-testcluster-kubeapi | b19cdc2339aa4d0e81b72e0f388ca4eb | 10.8.1.250  | ACTIVE              | amphora  |
+--------------------------------------+----------------------------------------------------+----------------------------------+-------------+---------------------+----------+
ubuntu@capi2-mgmtcluster:~/cluster-defaults [0]$ openstack floating ip list | grep -v None
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port                                 | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| 240eb91a-44b1-492d-a0ba-65909c604b29 | 213.131.230.222     | 10.8.1.250       | f524aadd-37be-4b89-9b91-969a1d999c94 | a882b33a-f52e-4e0e-872d-140606e16930 | b19cdc2339aa4d0e81b72e0f388ca4eb |
| e7c311ec-468f-40d2-8ab3-45529e14b298 | 213.131.230.230     | 10.0.0.193       | 28957b1c-48ee-4a11-91e6-d37d60a267b3 | a882b33a-f52e-4e0e-872d-140606e16930 | b19cdc2339aa4d0e81b72e0f388ca4eb |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+

(There are 18 more FIPs that are not assigned to any port.) Here is some of the capo log:

I0317 12:45:14.464915       1 openstackmachine_controller.go:275] controller/openstackmachine "msg"="Cluster infrastructure is not ready yet, requeuing machine" "cluster"="testcluster" "machine"="testcluster-md-0-genw1-76cc7c78d7-rxq69" "name"="k8s-clusterapi-testcluster-md-0-genw1-c86kp" "namespace"="default" "openStackCluster"="testcluster" "openStackMachine"="k8s-clusterapi-testcluster-md-0-genw1-c86kp" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackMachine"
I0317 12:45:15.651149       1 floatingip.go:122] controller/openstackcluster "msg"="Associating floating IP" "cluster"="testcluster" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "id"="8ceab080-f78f-4525-8809-879f3b0861f4" "ip"="213.131.230.143"
I0317 12:45:15.652194       1 recorder.go:103] events "msg"="Normal"  "message"="Created floating IP 213.131.230.143 with id 8ceab080-f78f-4525-8809-879f3b0861f4" "object"={"kind":"OpenStackCluster","namespace":"default","name":"testcluster","uid":"81ba607e-6bbc-4251-b619-e0a4635d7a0d","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha4","resourceVersion":"1623"} "reason"="SuccessfulCreateFloatingIP"
I0317 12:45:15.895972       1 recorder.go:103] events "msg"="Warning"  "message"="Failed to associate floating IP 213.131.230.143 with port f524aadd-37be-4b89-9b91-969a1d999c94: Expected HTTP response code [200] when accessing [PUT https://api.gx-scs.sovereignit.cloud:9696/v2.0/floatingips/8ceab080-f78f-4525-8809-879f3b0861f4], but got 409 instead\n{\"NeutronError\": {\"type\": \"FloatingIPPortAlreadyAssociated\", \"message\": \"Cannot associate floating IP 213.131.230.143 (8ceab080-f78f-4525-8809-879f3b0861f4) with port f524aadd-37be-4b89-9b91-969a1d999c94 using fixed IP 10.8.1.250, as that fixed IP already has a floating IP on external network a882b33a-f52e-4e0e-872d-140606e16930.\", \"detail\": \"\"}}" "object"={"kind":"OpenStackCluster","namespace":"default","name":"testcluster","uid":"81ba607e-6bbc-4251-b619-e0a4635d7a0d","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha4","resourceVersion":"1623"} "reason"="FailedAssociateFloatingIP"
E0317 12:45:15.918426       1 controller.go:317] controller/openstackcluster "msg"="Reconciler error" "error"="failed to reconcile load balancer: Expected HTTP response code [200] when accessing [PUT https://api.gx-scs.sovereignit.cloud:9696/v2.0/floatingips/8ceab080-f78f-4525-8809-879f3b0861f4], but got 409 instead\n{\"NeutronError\": {\"type\": \"FloatingIPPortAlreadyAssociated\", \"message\": \"Cannot associate floating IP 213.131.230.143 (8ceab080-f78f-4525-8809-879f3b0861f4) with port f524aadd-37be-4b89-9b91-969a1d999c94 using fixed IP 10.8.1.250, as that fixed IP already has a floating IP on external network a882b33a-f52e-4e0e-872d-140606e16930.\", \"detail\": \"\"}}" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster"
I0317 12:45:15.920093       1 openstackcluster_controller.go:239] controller/openstackcluster "msg"="Reconciling Cluster" "cluster"="testcluster" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster"
I0317 12:45:16.398215       1 openstackcluster_controller.go:374] controller/openstackcluster "msg"="Reconciling network components" "cluster"="testcluster" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster"
I0317 12:45:16.931768       1 network.go:85] controller/openstackcluster "msg"="External network found" "cluster"="testcluster" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "network id"="a882b33a-f52e-4e0e-872d-140606e16930"
I0317 12:45:16.931829       1 network.go:93] controller/openstackcluster "msg"="Reconciling network" "cluster"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "name"="k8s-clusterapi-cluster-default-testcluster"
I0317 12:45:17.046099       1 network.go:177] controller/openstackcluster "msg"="Reconciling subnet" "cluster"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "name"="k8s-clusterapi-cluster-default-testcluster"
I0317 12:45:17.121505       1 router.go:48] controller/openstackcluster "msg"="Reconciling router" "cluster"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "name"="k8s-clusterapi-cluster-default-testcluster"
I0317 12:45:17.425516       1 securitygroups.go:60] controller/openstackcluster "msg"="Reconciling security groups" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "cluster"="default-testcluster"
I0317 12:45:17.837891       1 loadbalancer.go:45] controller/openstackcluster "msg"="Reconciling load balancer" "cluster"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "name"="k8s-clusterapi-cluster-default-testcluster-kubeapi"
I0317 12:45:19.308126       1 floatingip.go:122] controller/openstackcluster "msg"="Associating floating IP" "cluster"="testcluster" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster" "id"="32009eb6-e749-4b87-9136-cb43c346b1a0" "ip"="213.131.230.121"
I0317 12:45:19.314308       1 recorder.go:103] events "msg"="Normal"  "message"="Created floating IP 213.131.230.121 with id 32009eb6-e749-4b87-9136-cb43c346b1a0" "object"={"kind":"OpenStackCluster","namespace":"default","name":"testcluster","uid":"81ba607e-6bbc-4251-b619-e0a4635d7a0d","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha4","resourceVersion":"1623"} "reason"="SuccessfulCreateFloatingIP"
I0317 12:45:19.548112       1 recorder.go:103] events "msg"="Warning"  "message"="Failed to associate floating IP 213.131.230.121 with port f524aadd-37be-4b89-9b91-969a1d999c94: Expected HTTP response code [200] when accessing [PUT https://api.gx-scs.sovereignit.cloud:9696/v2.0/floatingips/32009eb6-e749-4b87-9136-cb43c346b1a0], but got 409 instead\n{\"NeutronError\": {\"type\": \"FloatingIPPortAlreadyAssociated\", \"message\": \"Cannot associate floating IP 213.131.230.121 (32009eb6-e749-4b87-9136-cb43c346b1a0) with port f524aadd-37be-4b89-9b91-969a1d999c94 using fixed IP 10.8.1.250, as that fixed IP already has a floating IP on external network a882b33a-f52e-4e0e-872d-140606e16930.\", \"detail\": \"\"}}" "object"={"kind":"OpenStackCluster","namespace":"default","name":"testcluster","uid":"81ba607e-6bbc-4251-b619-e0a4635d7a0d","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha4","resourceVersion":"1623"} "reason"="FailedAssociateFloatingIP"
[...]
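
The 409 responses in the log above carry a structured Neutron error body, so a client could in principle recognise this specific conflict rather than treating it as a generic failure. The following is a sketch under that assumption, not code taken from capo; `neutronErrorBody` and `isAlreadyAssociated` are hypothetical names, and the JSON field names are copied from the logged response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// neutronErrorBody mirrors the JSON error shape seen in the log above.
type neutronErrorBody struct {
	NeutronError struct {
		Type    string `json:"type"`
		Message string `json:"message"`
		Detail  string `json:"detail"`
	} `json:"NeutronError"`
}

// isAlreadyAssociated reports whether an HTTP response is the
// "port already has a floating IP" conflict (hypothetical helper).
func isAlreadyAssociated(status int, body []byte) bool {
	if status != 409 {
		return false
	}
	var parsed neutronErrorBody
	if err := json.Unmarshal(body, &parsed); err != nil {
		return false
	}
	return parsed.NeutronError.Type == "FloatingIPPortAlreadyAssociated"
}

func main() {
	// Message text abbreviated; the type value is taken from the log.
	body := []byte(`{"NeutronError": {"type": "FloatingIPPortAlreadyAssociated", "message": "...", "detail": ""}}`)
	fmt.Println(isAlreadyAssociated(409, body))
}
```

A caller that sees this conflict would still need to look up the FIP that is already on the port and delete the spare one it just created; detecting the error type only tells it that retrying with yet another new FIP cannot succeed.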
garloff commented 2 years ago

Probably related to https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1164, which is fixed in cluster-api-provider-openstack 0.5.3.

garloff commented 2 years ago

This issue no longer seems to occur with capo-0.5.3.

mxmxchere commented 1 year ago

I see the same or very similar behaviour when I patch the testcluster-cloud-config secret too late. If I patch the secret immediately after creation, the cluster is created fine, including one load balancer with one associated floating IP. I am running capo-controller-manager version 0.7.3.