F5Networks / k8s-bigip-ctlr

Repository for F5 Container Ingress Services for Kubernetes & OpenShift.
Apache License 2.0

Multi-cluster: Services in blue-green deployments don't get updated #3326

Closed alonsocamaro closed 5 months ago

alonsocamaro commented 7 months ago

Setup Details

CIS Version: 2.15.1, BuildInfo: azure-5488-12ccf0c8f7714ed8d5cf399a1c773c26b1643337, Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: BIG-IP 17.1.0.1 Build 0.0.4 Point Release 1
AS3 Version: 3.45.0
Agent Mode: AS3
Orchestration: K8S
Orchestration Version:
Pool Mode: NodePort
Additional Setup details: OpenShift 4.12 with OVNKubernetes, but using NodePort

Description

I'm using an A/B setup to perform per-application migration, where the main backend is only in OCP1 and the alternate backend is only in OCP2, using the following manifest:

apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: route-a
  namespace: openshift-ingress
  labels:
    f5cr: "true"
spec:
  host: www.migration.com
  virtualServerAddress: "10.1.10.106"
  hostGroup: migration.com
  tlsProfileName: reencrypt-tls
  profileMultiplex: "/Common/oneconnect-32"
  pools:
  - path: /
    service: router-default-route-a-ocp1
    servicePort: 443
    weight: 50
    alternateBackends:
    - service: router-default-route-a-ocp2
      weight: 50
    monitor:
      type: https
      name: /Common/www.migration.com
      reference: bigip

When creating and deleting the service router-default-route-a-ocp2 (which exists only in OCP2), I get inconsistent behaviour in which CIS fails to trigger an AS3 reconfiguration and post an updated declaration.

Note: I'm using the following demo to reproduce this issue, together with the attached scripts: https://github.com/f5devcentral/f5-bd-cis-demo/tree/main/crds/demo-mc-twotier-haproxy-noshards

Note: the .sh scripts mentioned are attached here repro-scripts.tar.gz

Note: I can share a UDF environment with this repro

Steps To Reproduce -- Service creation case

1) Create L7 routes in both clusters:

routes-bigip]$ ./create-route-a-bigip.sh ocp1
routes-bigip]$ ./create-route-a-bigip.sh ocp2

2) Create the services for the alternate backend in OCP2 referenced in the L7 route:

    alternateBackends:
    - service: router-default-route-a-ocp2
      weight: 50

with the following command

routes-bigip]$ ./create-service-route-a-bigip.sh ocp2
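For reference, the Service that the script applies looks roughly as follows. This is a sketch reconstructed from the kubectl last-applied-configuration annotation visible in the debug log below, so the actual script may differ slightly:

apiVersion: v1
kind: Service
metadata:
  # hypothetical reconstruction of the Service created by create-service-route-a-bigip.sh
  name: router-default-route-a-ocp2
  namespace: openshift-ingress
spec:
  type: NodePort
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    port: 443
    protocol: TCP
    targetPort: https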

The CIS logs in DEBUG mode show the following:

2024/03/08 15:02:49 [DEBUG] Enqueueing Service: &Service{ObjectMeta:{router-default-route-a-ocp2  openshift-ingress  27966a10-8c2e-444a-93aa-d40a6b742a56 44142840 0 2024-03-08 15:02:49 +0000 UTC <nil> <nil> map[] map[kubectl.kubernetes.io/last-applied-configuration:{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"router-default-route-a-ocp2","namespace":"openshift-ingress"},"spec":{"ports":[{"name":"http","port":80,"protocol":"TCP","targetPort":"http"},{"name":"https","port":443,"protocol":"TCP","targetPort":"https"}],"selector":{"ingresscontroller.operator.openshift.io/deployment-ingresscontroller":"default"},"type":"NodePort"}}
] [] []  [{kubectl-client-side-apply Update v1 2024-03-08 15:02:49 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":443,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}}}]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:80,TargetPort:{1 0 http},NodePort:31888,AppProtocol:nil,},ServicePort{Name:https,Protocol:TCP,Port:443,TargetPort:{1 0 https},NodePort:31926,AppProtocol:nil,},},Selector:map[string]string{ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default,},ClusterIP:172.31.50.102,Type:NodePort,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Cluster,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,TopologyKeys:[],IPFamilyPolicy:*SingleStack,ClusterIPs:[172.31.50.102],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},} from cluster: ocp2
2024/03/08 15:02:49 [DEBUG] Processing Key: &{openshift-ingress Service router-default-route-a-ocp2 0xc00028d680 Create ocp2 false}
[cloud-user@ocp-provisioner routes-bigip]$ 2024/03/08 15:02:51 [DEBUG] [2024-03-08 15:02:51,890 __main__ DEBUG] config handler woken for reset
2024/03/08 15:02:51 [DEBUG] [2024-03-08 15:02:51,890 __main__ DEBUG] loaded configuration file successfully
2024/03/08 15:02:51 [DEBUG] [2024-03-08 15:02:51,890 __main__ DEBUG] NET Config: {}
2024/03/08 15:02:51 [DEBUG] [2024-03-08 15:02:51,891 __main__ DEBUG] loaded configuration file successfully
2024/03/08 15:02:51 [DEBUG] [2024-03-08 15:02:51,891 __main__ DEBUG] updating tasks finished, took 0.0009188652038574219 seconds

Notice that there is no AS3 reconfiguration or post, hence the pools remain unpopulated as shown next (unexpected).

Screenshot 2024-03-08 at 15 53 17
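One way to double-check from the BIG-IP side that nothing was posted is to fetch the declaration currently deployed for the tenant through the AS3 declare endpoint already visible in the logs. The credentials below are placeholders, not part of the original repro:

# retrieve the currently deployed AS3 declaration for the mc-twotier tenant (placeholder credentials)
curl -sk -u admin:<password> https://10.1.1.7/mgmt/shared/appsvcs/declare/mc-twotier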

3) Create the services for the main backend referenced in the L7 route:

routes-bigip]$ ./create-service-route-a-bigip.sh ocp1

The following is shown in the logs

2024/03/08 15:04:13 [DEBUG] [AS3] posting request to https://10.1.1.7/mgmt/shared/appsvcs/declare/mc-twotier
2024/03/08 15:04:13 [INFO] [Request: 1][AS3] posting request to https://10.1.1.7 for mc-twotier tenants
2024/03/08 15:04:21 [DEBUG] [2024-03-08 15:04:21,892 __main__ DEBUG] config handler woken for reset
2024/03/08 15:04:21 [DEBUG] [2024-03-08 15:04:21,892 __main__ DEBUG] loaded configuration file successfully
2024/03/08 15:04:21 [DEBUG] [2024-03-08 15:04:21,893 __main__ DEBUG] NET Config: {}
2024/03/08 15:04:21 [DEBUG] [2024-03-08 15:04:21,893 __main__ DEBUG] loaded configuration file successfully
2024/03/08 15:04:21 [DEBUG] [2024-03-08 15:04:21,893 __main__ DEBUG] updating tasks finished, took 0.0013589859008789062 seconds
2024/03/08 15:04:25 [INFO] [Request: 1][AS3] post resulted in SUCCESS
2024/03/08 15:04:25 [DEBUG] [AS3] Response from BIG-IP: code: 200 --- tenant:mc-twotier --- message: success
2024/03/08 15:04:25 [DEBUG] Updating VirtualServer Status with {10.1.10.106 Ok} for resource name:route-a , namespace: openshift-ingress

Both the main and alternate pools get populated, as shown next.

Screenshot 2024-03-08 at 16 04 43

Steps To Reproduce -- Service deletion case

Continue from the previous reproduction and perform the following additional steps:

1) Delete the services for the main backend in OCP1

routes-bigip]$ ./delete-service-route-a-bigip.sh ocp1

An AS3 post is done and the pools are updated appropriately as shown next

Screenshot 2024-03-08 at 16 08 58

The pool members for the alternate backend in OCP2 are left in place, as expected.

2) Delete the services for the alternate backend in OCP2

routes-bigip]$ ./delete-service-route-a-bigip.sh ocp2
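Presumably the script just removes the Service from the second cluster, roughly equivalent to the following (the ocp2 context name is an assumption):

# delete the alternate-backend Service in OCP2 (context name is hypothetical)
oc --context ocp2 -n openshift-ingress delete service router-default-route-a-ocp2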

The following is shown in the logs:

2024/03/08 15:10:48 [DEBUG] Enqueueing Service: &Service{ObjectMeta:{router-default-route-a-ocp2  openshift-ingress  27966a10-8c2e-444a-93aa-d40a6b742a56 44145811 0 2024-03-08 15:02:49 +0000 UTC <nil> <nil> map[] map[kubectl.kubernetes.io/last-applied-configuration:{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"router-default-route-a-ocp2","namespace":"openshift-ingress"},"spec":{"ports":[{"name":"http","port":80,"protocol":"TCP","targetPort":"http"},{"name":"https","port":443,"protocol":"TCP","targetPort":"https"}],"selector":{"ingresscontroller.operator.openshift.io/deployment-ingresscontroller":"default"},"type":"NodePort"}}
] [] []  [{kubectl-client-side-apply Update v1 2024-03-08 15:02:49 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":443,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}}}]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:80,TargetPort:{1 0 http},NodePort:31888,AppProtocol:nil,},ServicePort{Name:https,Protocol:TCP,Port:443,TargetPort:{1 0 https},NodePort:31926,AppProtocol:nil,},},Selector:map[string]string{ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default,},ClusterIP:172.31.50.102,Type:NodePort,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Cluster,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,TopologyKeys:[],IPFamilyPolicy:*SingleStack,ClusterIPs:[172.31.50.102],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},} from cluster: ocp2
2024/03/08 15:10:48 [DEBUG] Processing Key: &{openshift-ingress Service router-default-route-a-ocp2 0xc0004e5180 Delete ocp2 false}

Note that no AS3 reconfiguration/post happens and hence no update to the pool occurs (unexpected).

Screenshot 2024-03-08 at 16 12 11

Expected Result

In the Service creation case, in step 2, the pool of the alternate backend should have been populated. In the Service deletion case, in step 2, the pool of the alternate backend should have been deleted.

Actual Result

The services for the alternate backend in OCP2 are not updated unless a specific sequence of operations is followed, so the configuration does not behave declaratively.

trinaths commented 7 months ago

Created [CONTCNTR-4637] for internal tracking.

alonsocamaro commented 6 months ago

Confirmed that the expected behaviour is the behaviour observed with Routes and HAProxy.

trinaths commented 5 months ago

Fixed in 2.16.1