F5Networks / k8s-bigip-ctlr

Repository for F5 Container Ingress Services for Kubernetes & OpenShift.
Apache License 2.0

CIS is deleting referenced SSL Profile it did not create #3414

Closed: pmilot closed this issue 4 months ago

pmilot commented 5 months ago

Setup Details

CIS Version: 2.16

Description

CIS is unnecessarily posting to the /Common tenant, trying to delete a referenced SSL profile. I don't know what it is trying to do, but it should not try to delete a referenced profile that it did not create.

Also, when the VS is deleted, CIS deletes the VIP as well as the SSL profile in the /Common partition that was referenced in the TLSProfile.

This is very similar to a bug I reported last year.
https://github.com/F5Networks/k8s-bigip-ctlr/issues/2797

Steps To Reproduce

1) Create a TLSProfile that references a BIG-IP profile, e.g. /Common/Shared/my_ssl_profile.
2) Create a VS that uses the SSL profile.
3) CIS goes into an endless loop trying to post to the /Common tenant:

CIS log:
declaration failed response:01070265:3: The ClientSSL Profile (/Common/Shared/my_clientssl_profile) cannot be deleted because it is in use by a virtual server profile (/atlas/Shared/crd_10_10_10_177_443 /Common/Shared/my_clientssl_profile)). runTime:3821 tenant:Common]]]

BIG-IP log:
May 8 19:00:19 QA-K8S-BIGIP-01.mydomain.local err mcpd[7723]: 01070265:3: The ClientSSL Profile (/Common/Shared/my_clientssl_profile) cannot be deleted because it is in use by a virtual server profile (/atlas/Shared/crd_10_10_10_177_443 /Common/Shared/my_clientssl_profile).

4) Delete the VS: CIS deletes the VIP and also the referenced SSL profile.

CIS log:
2024/05/08 19:04:15 [DEBUG] [AS3] Response from BIG-IP: code: 200 --- tenant:Common --- message: success
2024/05/08 19:04:15 [DEBUG] [AS3] Response from BIG-IP: code: 200 --- tenant:Common --- message: success

Expected Result

CIS should not delete or modify referenced TLS profiles.

Actual Result

Referenced TLS profiles are deleted by CIS even though they were never created by CIS.
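
The expected result can be stated as an ownership rule. The following is a minimal sketch, assuming CIS owns only objects in the partition it manages (here `atlas`, taken from the VS path in the log above) and treats every other path, such as `/Common/Shared/my_ssl_profile`, as a reference it must never delete. `partition_of` and `objects_to_delete` are hypothetical helpers, not CIS internals.

```python
def partition_of(path: str) -> str:
    """Return the partition (first segment) of a BIG-IP path like /Common/Shared/x."""
    parts = path.strip("/").split("/")
    return parts[0] if parts and parts[0] else ""


def objects_to_delete(previous: set[str], current: set[str], managed_partition: str) -> set[str]:
    """Objects that left the desired state AND live in the partition CIS manages.

    Hypothetical rule: anything outside managed_partition (e.g. /Common) is only
    referenced, so removing a VirtualServer can never delete a shared profile.
    """
    removed = previous - current
    return {path for path in removed if partition_of(path) == managed_partition}
```

Deleting the VS then removes only `/atlas/Shared/crd_10_10_10_177_443`, while the referenced `/Common/Shared/my_ssl_profile` is left untouched.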

pmilot commented 5 months ago

I should have included the TLSProfile YAML:

apiVersion: cis.f5.com/v1
kind: TLSProfile
metadata:
  name: reencrypt-tls
  namespace: mynamespace
spec:
  hosts:
  - myapp.example.com
  tls:
    clientSSL: /Common/Shared/my_ssl_profile
    reference: bigip
    serverSSL: /Common/Shared/my_serverssl_profile
    termination: reencrypt

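
With `reference: bigip`, the `clientSSL`/`serverSSL` values name profiles that already exist on the BIG-IP, as opposed to `reference: secret`, where CIS builds the profiles from Kubernetes Secrets. A minimal sketch of how a controller could single out the external profiles it must not own, assuming the TLSProfile has already been parsed into a Python dict; `referenced_bigip_profiles` is a hypothetical helper.

```python
def referenced_bigip_profiles(tls_profile: dict) -> list[str]:
    """Profile paths a TLSProfile points at that CIS must treat as external.

    Assumption: reference == "bigip" means clientSSL/serverSSL name existing
    BIG-IP profiles; any other reference mode yields nothing external here.
    """
    tls = tls_profile.get("spec", {}).get("tls", {})
    if tls.get("reference") != "bigip":
        return []
    return [tls[key] for key in ("clientSSL", "serverSSL") if tls.get(key)]
```

For the profile above this returns `/Common/Shared/my_ssl_profile` and `/Common/Shared/my_serverssl_profile`, i.e. exactly the objects that should be excluded from any delete.
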
pmilot commented 4 months ago

This problem appears to be fixed in 2.16.1. Is it possible to confirm?

trinaths commented 4 months ago

Created [CONTCNTR-4732] for internal tracking.

pmilot commented 4 months ago

@trinaths While trying to work around this bug today, we discovered that it affects more than the SSL profiles.

CIS is deleting SSL profiles, iRules, and logging profiles that are referenced from /Common/Shared.
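
Since the problem spans SSL profiles, iRules and logging profiles, one way to audit a CR for objects that must be left alone is to walk its spec and collect every /Common path. A minimal sketch, assuming the CR spec has already been loaded into plain Python dicts and lists (e.g. with a YAML loader); `common_references` is a hypothetical helper, not part of CIS.

```python
from typing import Any, Iterator


def common_references(node: Any, prefix: str = "/Common/") -> Iterator[str]:
    """Yield every string value under `node` that names an object in /Common."""
    if isinstance(node, str):
        if node.startswith(prefix):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from common_references(value, prefix)
    elif isinstance(node, list):
        for item in node:
            yield from common_references(item, prefix)
```

Applied to a VS spec containing `iRules: [/Common/Shared/k8s_ingress_sni_irule]`, it yields that iRule path and ignores ordinary values such as the pool path `/`.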

trinaths commented 4 months ago

@pmilot Please send the CIS configuration and logs to automation_toolchain_pm (automation_toolchain_pm@f5.com).

arzzon commented 4 months ago

Closing the issue as it's fixed with #3427

pmilot commented 4 months ago

@arzzon Any chance I can get a dev build with this fix?

Thanks

arzzon commented 4 months ago

@pmilot Please use the following CIS build generated from the build pipeline: quay.io/f5networks/k8s-bigip-ctlr-devel:6c771a457631bc7d16aa844b49f095c47ffeee5c

pmilot commented 4 months ago

@arzzon We will test this today and report back. TY

arzzon commented 4 months ago

@pmilot We found that the issue is caused by a VirtualServer CR misconfiguration (VirtualServer CRs created with the partition set to Common), so we have improved the CR validation. If any VS CR has its partition set to Common, CIS logs an error like: "[ERROR] VirtualServer xyz cannot be created in Common partition". Such VirtualServer CRs need to be removed.
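
The validation described here can be sketched as follows. This is a hypothetical illustration, not the CIS code: it assumes a VirtualServer may carry an optional `partition` key in its spec and otherwise falls back to the `--bigip-partition` default (that precedence is an assumption), and it logs the error text quoted in the comment above.

```python
import logging

log = logging.getLogger("cis-sketch")  # hypothetical logger name


def effective_partition(vs_spec: dict, default_partition: str) -> str:
    """Assumed precedence: spec["partition"] if set, else the --bigip-partition default."""
    return vs_spec.get("partition") or default_partition


def validate_virtual_server(name: str, vs_spec: dict, default_partition: str) -> bool:
    """Reject VirtualServers that would land in /Common; return True if acceptable."""
    if effective_partition(vs_spec, default_partition) == "Common":
        log.error("VirtualServer %s cannot be created in Common partition", name)
        return False
    return True
```

Under this assumption, a VS without a `partition` key and CIS started with `--bigip-partition=mycluster-1` resolves to `mycluster-1` and passes the check.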

pmilot commented 4 months ago

@arzzon That is odd, as we are not creating any VS in /Common. We are creating them in a dedicated partition for the cluster:

apiVersion: cis.f5.com/v1
kind: VirtualServer
metadata:
  labels:
    f5cr: "true"
  name: dummy-user-portal
  namespace: dummy-user-portal
spec:
  virtualServerAddress: 10.1.yyy.xxx
  tlsProfileName: reencrypt-tls
  host: myapp.example.com
  iRules:
  - /Common/Shared/k8s_ingress_sni_irule
  policyName: dummy-user-portal-policy
  persistenceProfile: none
  pools:
    - path: /
      service: k8s-ingress
      servicePort: 443
      serviceNamespace: istio-system
      extendedServiceReferences:
        - clusterName: mycluster-1
          namespace: istio-system
          servicePort: 443
          service: k8s-ingress
      monitor:
        type: tcp
        interval: 10
        timeout: 31

CIS container command and arguments:

    Command:
      /app/bin/k8s-bigip-ctlr
    Args:
      --credentials-directory
      /tmp/creds
      --bigip-partition=mycluster-1
      --bigip-url=https://10.1.yyy.xxx
      --custom-resource-mode=true
      --insecure=true
      --ipam=false
      --log-as3-response=true
      --log-level=DEBUG
      --namespace=dummy-user-portal
      --namespace=istio-system
      --pool-member-type=nodeport
arzzon commented 4 months ago

@pmilot Thanks for the confirmation. In that case you won't see the error message mentioned above. The shared CIS build has fixes to avoid posting declarations to the Common partition.
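
A guard of the kind described ("avoid posting declarations to Common partition") can be sketched on a plain-dict AS3 ADC declaration, where each tenant is a top-level key whose value has `"class": "Tenant"`. This is a hypothetical illustration of the idea, not the code in the shared build; `strip_protected_tenants` and `PROTECTED_TENANTS` are made-up names.

```python
import copy

PROTECTED_TENANTS = frozenset({"Common"})  # hypothetical: tenants CIS must never post


def strip_protected_tenants(adc: dict, protected: frozenset = PROTECTED_TENANTS) -> dict:
    """Return a copy of an AS3 ADC declaration without the protected tenants.

    Non-tenant keys (class, schemaVersion, id, ...) are kept unchanged, and the
    input declaration is not modified.
    """
    cleaned = copy.deepcopy(adc)
    for key in list(cleaned):
        value = cleaned[key]
        if key in protected and isinstance(value, dict) and value.get("class") == "Tenant":
            del cleaned[key]
    return cleaned
```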

pmilot commented 4 months ago

@arzzon We've been running this build for a week now, and I can confirm we have not lost any /Common objects since.