F5Networks / k8s-bigip-ctlr

Repository for F5 Container Ingress Services for Kubernetes & OpenShift.
Apache License 2.0

Multi-cluster TS fails to create when 2 of them point to the same service #3538

Open avinashchundu9 opened 2 months ago

avinashchundu9 commented 2 months ago

Setup Details

CIS Version : 2.17.1
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: Big IP v16.1.3.1
AS3 Version: 3.46
Agent Mode: AS3
Orchestration: K8S
Orchestration Version: v1.28.10
Pool Mode: Nodeport

Description

Multi-cluster TS fails to create when 2 of them point to the same service

Steps To Reproduce

1) Create a deployment and a service.
2) Create a TransportServer and point it to that service.
3) Create another TransportServer and point it to the same service.

Expected Result

Both Transport servers should be working

Actual Result

The second TransportServer does not work.

trinaths commented 2 months ago

Created [CONTCNTR-4852] for internal tracking.

arzzon commented 2 months ago

Hi @avinashchundu9, to make sure we are on the same page, could you please confirm whether my understanding of your current configuration is correct?

Setup details:

CIS Version: 2.17.1
Pool Mode: NodePort
CIS is running in a multi-cluster environment using one of these modes: active-active, active-standby, or ratio.

I assume you are facing the problem while creating TransportServer CRs like the following:

TS 1

apiVersion: cis.f5.com/v1
kind: TransportServer
metadata:
  labels:
    f5cr: "true"
  name: ts1
  namespace: ns1
spec:
  mode: standard
  pool:
    service: svc1
    servicePort: 1344
  snat: auto
  type: tcp
  virtualServerAddress: 10.1.1.1
  virtualServerPort: 1344

TS 2

apiVersion: cis.f5.com/v1
kind: TransportServer
metadata:
  labels:
    f5cr: "true"
  name: ts2
  namespace: ns1
spec:
  mode: standard
  pool:
    service: svc1
    servicePort: 1344
  snat: auto
  type: tcp
  virtualServerAddress: 10.2.2.2
  virtualServerPort: 1344

If there are any differences or additional details, please let me know. This will help me replicate the issue and work towards a solution.

avinashchundu9 commented 2 months ago

My pool mode on CIS is auto, but my service is of type NodePort, so CIS is creating a pool with NodePorts. Here are my sample YAML files:

apiVersion: "cis.f5.com/v1"
kind: TransportServer
metadata:
  labels:
    f5cr: "true"
  name: f5-hello-world-ts
  namespace: achundu
spec:
  mode: standard
  virtualServerAddress: "IPaddress1"
  virtualServerPort: 8080
  virtualServerName: f5-hello-world-ts
  pool:
    service: f5-hello-world-ts-sc
    servicePort: 8080
    loadBalancingMethod: fastest-node
    monitor:
      type: tcp
      interval: 10
      timeout: 10
---
apiVersion: "cis.f5.com/v1"
kind: TransportServer
metadata:
  labels:
    f5cr: "true"
  name: f5-hello-world-ts-1
  namespace: achundu
spec:
  mode: standard
  virtualServerAddress: "IPaddress2"
  virtualServerPort: 8080
  virtualServerName: f5-hello-world-ts-1
  pool:
    service: f5-hello-world-ts-sc
    servicePort: 8080
    loadBalancingMethod: fastest-node
    monitor:
      type: tcp
      interval: 10
      timeout: 10
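
For reference, both TransportServer resources above point at the same backend Service. A minimal sketch of what that Service might look like, assuming the NodePort type described in this comment; the selector label, port name, and `targetPort` are hypothetical and are not taken from the reporter's cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: f5-hello-world-ts-sc   # service name referenced by both TransportServers
  namespace: achundu
spec:
  type: NodePort               # the reporter states the service is of type NodePort
  selector:
    app: f5-hello-world-ts     # hypothetical label; must match the deployment's pod labels
  ports:
    - name: tcp-8080           # hypothetical port name
      protocol: TCP
      port: 8080               # matches servicePort in both TransportServers
      targetPort: 8080         # assumption: the container listens on 8080
```

With a Service like this, each TransportServer gets its own virtual server address while both reference the same Service, which is the combination the reporter describes as failing.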

arzzon commented 2 months ago

Thanks @avinashchundu9 for sharing the details. However, it appears that both of the TransportServer CR YAMLs you shared are identical. Could you please update them?

avinashchundu9 commented 2 months ago

Thank you for pointing that out. I have updated the YAMLs.

nansenat16 commented 1 month ago

I get the same issue when I create two virtual servers on the same service.

CIS Version: 2.18.0
Build: f5networks/k8s-bigip-ctlr:2.18.0
BIGIP Version: Big IP 17.1.1.3-0.0.5
AS3 Version: 3.52.0-5
Agent Mode: AS3
Orchestration: K8S
Orchestration Version: k3s-1.30.4
Pool Mode: Cluster
Additional Setup details: vxlan flannel

Controller pod error log:

2024/09/26 07:22:39 [DEBUG] [2024-09-26 07:22:39,981 icontrol.session DEBUG] RESPONSE::STATUS: 503 Content-Type: application/json;charset=utf-8 Content-Encoding: None Text: '{"code":503,"message":"There is an active asynchronous task executing.","errorStack":[],"apiError":32964609}'
2024/09/26 07:22:39 [ERROR] [2024-09-26 07:22:39,981 __main__ ERROR] Unexpected error: 503 Unexpected Error: Service Unavailable for uri: https://10.9.55.32:443/mgmt/tm/net/arp/?$filter=partition+eq+Common Text: '{"code":503,"message":"There is an active asynchronous task executing.","errorStack":[],"apiError":32964609}'
2024/09/26 07:22:39 [DEBUG] Traceback (most recent call last):
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-ctlr-agent/f5_ctlr_agent/bigipconfigdriver.py", line 371, in _do_reset
2024/09/26 07:22:39 [DEBUG]     incomplete = self._update_cccl(config)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-ctlr-agent/f5_ctlr_agent/bigipconfigdriver.py", line 500, in _update_cccl
2024/09/26 07:22:39 [DEBUG]     incomplete += mgr._apply_net_config(cfg_net)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-ctlr-agent/f5_ctlr_agent/bigipconfigdriver.py", line 136, in _apply_net_config
2024/09/26 07:22:39 [DEBUG]     return self._cccl.apply_net_config(config)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-cccl/f5_cccl/api.py", line 102, in apply_net_config
2024/09/26 07:22:39 [DEBUG]     return self._service_manager.apply_net_config(services)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-cccl/f5_cccl/service/manager.py", line 721, in apply_net_config
2024/09/26 07:22:39 [DEBUG]     retval = self._service_deployer.deploy_net(desired_config)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-cccl/f5_cccl/service/manager.py", line 478, in deploy_net
2024/09/26 07:22:39 [DEBUG]     self._bigip.refresh_net()
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-cccl/f5_cccl/bigip.py", line 148, in refresh_net
2024/09/26 07:22:39 [DEBUG]     self._refresh_net()
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-cccl/f5_cccl/bigip.py", line 397, in _refresh_net
2024/09/26 07:22:39 [DEBUG]     arps = self._bigip.tm.net.arps.get_collection(
2024/09/26 07:22:39 [DEBUG]   File "/usr/local/lib/python3.9/site-packages/f5/bigip/resource.py", line 800, in get_collection
2024/09/26 07:22:39 [DEBUG]     self.refresh(**kwargs)
2024/09/26 07:22:39 [DEBUG]   File "/usr/local/lib/python3.9/site-packages/f5/bigip/resource.py", line 651, in refresh
2024/09/26 07:22:39 [DEBUG]     self._refresh(**kwargs)
2024/09/26 07:22:39 [DEBUG]   File "/usr/local/lib/python3.9/site-packages/f5/bigip/resource.py", line 634, in _refresh
2024/09/26 07:22:39 [DEBUG]     response = refresh_session.get(uri, **requests_params)
2024/09/26 07:22:39 [DEBUG]   File "/app/src/f5-icontrol-rest/icontrol/session.py", line 295, in wrapper
2024/09/26 07:22:39 [DEBUG]     raise iControlUnexpectedHTTPError(error_message, response=response)
2024/09/26 07:22:39 [DEBUG] icontrol.exceptions.iControlUnexpectedHTTPError: 503 Unexpected Error: Service Unavailable for uri: https://10.9.55.32:443/mgmt/tm/net/arp/?$filter=partition+eq+Common Text: '{"code":503,"message":"There is an active asynchronous task executing.","errorStack":[],"apiError":32964609}'
2024/09/26 07:22:39 [DEBUG] [2024-09-26 07:22:39,981 __main__ DEBUG] loaded configuration file successfully
2024/09/26 07:22:39 [ERROR] [2024-09-26 07:22:39,981 __main__ ERROR] Error applying config, will try again in 1 seconds
2024/09/26 07:22:39 [DEBUG] [2024-09-26 07:22:39,981 __main__ DEBUG] updating tasks finished, took 7.44950532913208 seconds
2024/09/26 07:22:40 [INFO] [Retry][AS3] post resulted in FAILURE
2024/09/26 07:22:40 [ERROR] [Retry][AS3] Response from BIG-IP: code: 422 --- tenant:k3s --- message: declaration failed
2024/09/26 07:22:40 [DEBUG] [AS3] Posting failed tenants configuration in 30s seconds

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: nginx-service-virtual-server
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "10.9.55.34"
  virtualServerName: nginx-service-virtual-server
  waf: /Common/nginx_test
  defaultPool:
    reference: service
    service: nginx-service
    serviceNamespace: default
    servicePort: 80
  pools:
  - path: /
    service: nginx-service
    servicePort: 80
    monitors:
    - type: http
      send: /
      interval: 5
      timeout: 10
  serviceAddress:
  - icmpEcho: "enable"
    arpEnabled: true
    routeAdvertisement: "all"
---
apiVersion: cis.f5.com/v1
kind: TLSProfile
metadata:
  name: nginx-tls
  labels:
    f5cr: "true"
spec:
  tls:
    termination: edge
    clientSSL: /Common/nginx-dev
    reference: bigip
---
apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: nginx-service-virtual-server-https
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "10.9.55.34"
  tlsProfileName: nginx-tls
  virtualServerName: nginx-service-virtual-server-https
  waf: /Common/nginx_test
  defaultPool:
    reference: service
    service: nginx-service
    serviceNamespace: default
    servicePort: 80
  pools:
  - path: /
    service: nginx-service
    servicePort: 80
    monitors:
    - name: /Common/http
      reference: bigip
      type: http
      send: /
      interval: 5
      timeout: 10
  serviceAddress:
  - icmpEcho: "enable"
    arpEnabled: true
    routeAdvertisement: "all"

vklohiya commented 1 month ago

@nansenat16, please update your health monitor as follows and try again:

    monitors:
        - name: /Common/http
          reference: bigip

vklohiya commented 1 month ago

@nansenat16, I see you have created two virtual servers with the same service and the same path; the only difference is that one is a secure virtual server and the other is insecure. I believe you are trying to configure TLS termination with httpTraffic set to allow. You can achieve this with a single VirtualServer CR, as follows:

apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: nginx-service-virtual-server-https
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "10.9.55.34"
  tlsProfileName: nginx-tls
  httpTraffic: allow
  virtualServerName: nginx-service-virtual-server-https
  waf: /Common/nginx_test
  defaultPool:
    reference: service
    service: nginx-service
    serviceNamespace: default
    servicePort: 80
  pools:
  - path: /
    service: nginx-service
    servicePort: 80
    monitors:
    - name: /Common/http
      reference: bigip
  serviceAddress:
  - icmpEcho: "enable"
    arpEnabled: true
    routeAdvertisement: "all"