F5Networks / k8s-bigip-ctlr

Repository for F5 Container Ingress Services for Kubernetes & OpenShift.
Apache License 2.0

Openshift SDN integration fails in VxLan manager #2375

Closed bukovjanmic closed 1 year ago

bukovjanmic commented 2 years ago

Setup Details

CIS Version: 2.8.1
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: 15.1.2
AS3 Version: Big IP 3.32.0-4
Agent Mode: AS3
Orchestration: OSCP
Orchestration Version: 4.9.24
Pool Mode: Cluster
Additional Setup details: <Platform/CNI Plugins/ cluster nodes/ etc>

Description

When creating VirtualServers or TransportServers, VxLan integration does not seem to work.

In operator logs, we see error messages:

2022/04/25 15:30:30 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.130.5.63's node.

Looking at the source code, it seems the VxLAN manager looks for the flannel.alpha.coreos.com/public-ip annotation on the node where the pod is running.

However, no such annotation is present on our nodes, as we run OpenShift SDN in networkpolicy mode.
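To make the suspected lookup concrete, here is a minimal sketch of a check for flannel annotations on a node. It is not the CIS source code, only an illustration of the behaviour described above. The function name and the sample nodes are hypothetical, and the use of the flannel.alpha.coreos.com/backend-data annotation (where flannel normally stores the VtepMAC) is an assumption that does not come from the logs in this issue.

```python
import json

# Annotation keys that flannel writes onto the nodes it manages.
FLANNEL_PUBLIC_IP = "flannel.alpha.coreos.com/public-ip"
# Assumption: the VTEP MAC is read from this annotation as JSON.
FLANNEL_BACKEND_DATA = "flannel.alpha.coreos.com/backend-data"


def flannel_vtep_mac(annotations):
    """Return the VTEP MAC from a node's annotations, or None if flannel data is absent."""
    if FLANNEL_PUBLIC_IP not in annotations:
        return None
    raw = annotations.get(FLANNEL_BACKEND_DATA)
    if raw is None:
        return None
    try:
        return json.loads(raw).get("VtepMAC")
    except (ValueError, AttributeError):
        # Malformed JSON, or JSON that is not an object.
        return None


# Hypothetical node annotations for illustration only.
flannel_node = {
    FLANNEL_PUBLIC_IP: "192.0.2.10",
    FLANNEL_BACKEND_DATA: '{"VtepMAC": "aa:bb:cc:dd:ee:ff"}',
}
openshift_sdn_node = {"machine.openshift.io/machine": "example/worker-0"}

print(flannel_vtep_mac(flannel_node))
print(flannel_vtep_mac(openshift_sdn_node))
```

The annotations dictionary can be taken from the .metadata.annotations field of `kubectl get node <name> -o json`. On an OpenShift SDN node the lookup finds nothing, which would be consistent with the "could not get VtepMac" message if that code path is reached.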

Our setup:

apiVersion: cis.f5.com/v1
kind: F5BigIpCtlr
metadata:
  annotations:
    operator-sdk/primary-resource: kube-system/f5-server-f5-bigip-ctlr
    operator-sdk/primary-resource-type: Deployment.apps
  name: f5-server
  namespace: openshift-operators
  finalizers:
    - helm.sdk.operatorframework.io/uninstall-release
spec:
  args:
    manage_routes: false
    agent: as3
    custom-resource-mode: true
    log_level: info
    route-vserver-addr: 172.31.8.220
    openshift-sdn-name: /occ01/occ01-tunnel
    bigip_partition: occ01
    ipam: true
    default-route-domain: 14
    disable-teems: true
    bigip_url: 172.31.8.4
    log_as3_response: true
    insecure: true
    pool-member-type: cluster
  bigip_login_secret: bigip
  image:
    pullPolicy: Always
    repo: k8s-bigip-ctlr
    user: f5networks
  namespace: kube-system
  rbac:
    create: true
  resources: {}
  serviceAccount:
    create: true
  version: latest

In NodePort mode, the integration seems to work.

Documentation does not specify that Flannel is a requirement.

Could you advise us what the problem is?

Thanks,

Michal Bukovjan

vincentmli commented 2 years ago

@bukovjanmic If I recall correctly, the controller does not require a static pod MAC in an OpenShift environment. The error log you pointed out only applies to a k8s/flannel environment; in OpenShift, the pod MAC is resolved through dynamic ARP request/response resolution.

mdditt2000 commented 2 years ago

Thanks @vincentmli, you are correct.

@bukovjanmic can you check a few things? First, check the VXLAN policy within BIG-IP. Make sure you see FDB entries.

If not, please review the CIS VXLAN troubleshooting article: https://support.f5.com/csp/article/K43473164

Here are some good user guides and demos:

OpenShift 4.7 and F5 Container Ingress Services (CIS) User-Guide for Standalone BIG-IP https://github.com/mdditt2000/k8s-bigip-ctlr/blob/main/user_guides/openshift-4-7/standalone/README.md

OpenShift 4.7 and F5 Container Ingress Services (CIS) User-Guide for BIG-IP Cluster https://github.com/mdditt2000/k8s-bigip-ctlr/blob/main/user_guides/openshift-4-7/cluster/README.md

Let me know if you have any questions.

bukovjanmic commented 2 years ago

I agree, this is what we find strange. We triple-checked the guides above (we use the BIG-IP Cluster version), and everything seems to be correct. We see the VxLAN tunnel entries on the BIG-IP:

(cfg-sync In Sync)(Active)(/occ01)(tmos)# show net fdb tunnel occ01-tunnel

------------------------------------------------------------------
Net::FDB
Tunnel        Mac Address        Member                    Dynamic
------------------------------------------------------------------
occ01-tunnel  0a:0a:ac:1f:08:72  endpoint:172.31.8.114%14  no
occ01-tunnel  0a:0a:ac:1f:08:76  endpoint:172.31.8.118%14  no
occ01-tunnel  0a:0a:ac:1f:08:14  endpoint:172.31.8.20%14   no
occ01-tunnel  0a:0a:ac:1f:08:15  endpoint:172.31.8.21%14   no
occ01-tunnel  0a:0a:ac:1f:08:16  endpoint:172.31.8.22%14   no
occ01-tunnel  0a:0a:ac:1f:08:17  endpoint:172.31.8.23%14   no
occ01-tunnel  0a:0a:ac:1f:08:18  endpoint:172.31.8.24%14   no
occ01-tunnel  0a:0a:ac:1f:08:19  endpoint:172.31.8.25%14   no
occ01-tunnel  0a:0a:ac:1f:08:1a  endpoint:172.31.8.26%14   no
occ01-tunnel  0a:0a:ac:1f:08:1b  endpoint:172.31.8.27%14   no
occ01-tunnel  0a:0a:ac:1f:08:1c  endpoint:172.31.8.28%14   no

However, we still get the above error in the f5-server-f5-bigip-ctlr pod on the OpenShift side, and TransportServers/VirtualServers are not set up in cluster mode.
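In the listing above, every static MAC appears to be 0a:0a followed by the four octets of the endpoint IP in hex (for example, 172.31.8.114 gives ac:1f:08:72). The sketch below checks that pattern against pasted tmsh output. The rule is inferred from the listing only; whether CIS derives its static FDB entries this way is an assumption, and the function names are hypothetical.

```python
import ipaddress


def expected_static_mac(ip):
    """Build the MAC that the listing suggests: 0a:0a + IPv4 octets in hex."""
    octets = ipaddress.IPv4Address(ip).packed
    return "0a:0a:" + ":".join(f"{b:02x}" for b in octets)


def parse_fdb(text):
    """Extract (mac, endpoint_ip) pairs from 'show net fdb tunnel' output."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four columns: tunnel, MAC, member, dynamic flag.
        if len(parts) == 4 and parts[2].startswith("endpoint:"):
            mac = parts[1]
            # Drop the 'endpoint:' prefix and the '%14' route domain suffix.
            ip = parts[2][len("endpoint:"):].split("%")[0]
            entries.append((mac, ip))
    return entries


def mismatches(text):
    """Return the rows whose MAC does not follow the inferred pattern."""
    return [(mac, ip) for mac, ip in parse_fdb(text)
            if mac.lower() != expected_static_mac(ip)]


# Three rows copied from the output above.
SAMPLE = """\
Tunnel        Mac Address        Member                    Dynamic
------------------------------------------------------------------
occ01-tunnel  0a:0a:ac:1f:08:72  endpoint:172.31.8.114%14  no
occ01-tunnel  0a:0a:ac:1f:08:76  endpoint:172.31.8.118%14  no
occ01-tunnel  0a:0a:ac:1f:08:14  endpoint:172.31.8.20%14   no
"""

print(mismatches(SAMPLE))
```

An empty result means every node entry follows the pattern. It says nothing about pod-level MAC resolution, which is the part the error message concerns.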

What we will try:

  • re-setup this with the occ01 profile defined in Common partition
  • upgrade Big-IP to version 16

It may also be possible that route domain 14 is a problem?

vincentmli commented 2 years ago

What we will try:

  • re-setup this with the occ01 profile defined in Common partition
  • upgrade Big-IP to version 16

It may also be possible that route domain 14 is a problem?

@bukovjanmic yes, it is worth trying. As a side note, I have been working on Cilium CNI VXLAN integration with BIG-IP for the past year; it should work much better and with much less configuration overhead. The Cilium release with the VXLAN integration feature, v1.12.0, is expected around June this year. OpenShift supports the Cilium CNI too.

vincentmli commented 2 years ago

@bukovjanmic @mdditt2000 I actually got the same error while testing Kubernetes Cilium CNI VXLAN integration with BIG-IP and CIS 2.8.1. In Cilium, a static pod ARP entry is not required. CIS logs the error, but it does not appear to affect network connectivity in my case. Does the error affect your network connectivity?

vincentmli commented 2 years ago

After I changed my lab CIS argument from --flannel_vxlan to --openshift-sdn-name, the error disappeared. @bukovjanmic could you check whether you are using the --openshift-sdn-name argument?
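As a rough way to run that check, the sketch below scans a controller argument list for the two flags named in this comment. The flag spellings are taken verbatim from the comment; the function name and the sample argument lists are hypothetical, and the real argument list would come from the container spec of the deployed CIS pod.

```python
def vxlan_flags(argv):
    """Report which of the two VXLAN-related flags appear in an argument list."""
    names = {
        arg.lstrip("-").split("=", 1)[0]
        for arg in argv
        if arg.startswith("--")
    }
    return {
        "flannel": "flannel_vxlan" in names,
        "openshift_sdn": "openshift-sdn-name" in names,
    }


# Hypothetical argument lists for illustration only.
lab_before = ["--flannel_vxlan=/Common/fl-tunnel", "--pool-member-type=cluster"]
lab_after = ["--openshift-sdn-name=/occ01/occ01-tunnel", "--pool-member-type=cluster"]

print(vxlan_flags(lab_before))
print(vxlan_flags(lab_after))
```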

bukovjanmic commented 2 years ago

Yes, I am using the openshift-sdn-name arg in the operator CR, and no flannel at all.

vincentmli commented 2 years ago

@bukovjanmic can you provide the CIS log?

trinaths commented 1 year ago

No response from the user. We recommend using OCP with the OVNK8S CNI with CIS.