Nordix / Meridio

Facilitator of attraction and distribution of external traffic within Kubernetes via secondary networks
https://meridio.nordix.org
Apache License 2.0

Target Registration Standardization (service-fication) #317


LionelJouin commented 1 year ago

Is your feature request related to a problem? Please describe.

Requirements/Objectives

Describe the solution you'd like

Kubernetes Service Object exposed via namespace-wide service

The user will create a Kubernetes Service with a selector matching the targets that have to be connected to a stream. This way, via some mechanism, the TAPA will automatically detect that it has to be connected to a stream, without any interaction with the TAP API by the user application.

Benefits and drawbacks

+ Kubernetes Service exists in every cluster (since it is part of the core API)
+ Readiness of the target handled by Kubernetes
- Possible confusion between Kubernetes Service and Meridio Service
- A port is required to define the Service (the port is unused and will be ignored by Meridio)

Service definition example

---
apiVersion: v1
kind: Service
metadata:
  name: target-pool-service
  labels:
    service.kubernetes.io/service-proxy-name: meridio-target-pool-service # Delegates service control to custom proxy
    meridio.nordix.org/trench: trench-a
    meridio.nordix.org/conduit: conduit-a
    meridio.nordix.org/stream: stream-a
spec:
  clusterIP: None # Headless service, no service IP allocated
  selector:
    app: test-deployment # Pods with the TAPA containing this label will be connected to the stream
  ports:
    - port: 80
Label: service.kubernetes.io/service-proxy-name

https://kubernetes.io/docs/reference/labels-annotations-taints/#servicekubernetesioservice-proxy-name

Headless service: clusterIP: None

No IP will be allocated for the service. With 4 endpoints in the Kubernetes Service, the DNS entries will look like this:

root@test-deployment-6675d5fd84-5xdlz:/# nslookup target-pool-service 
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   target-pool-service.red.svc.cluster.local
Address: 10.244.2.2
Name:   target-pool-service.red.svc.cluster.local
Address: 10.244.1.2
Name:   target-pool-service.red.svc.cluster.local
Address: 10.244.2.3
Name:   target-pool-service.red.svc.cluster.local
Address: 10.244.1.3

https://kubernetes.io/docs/concepts/services-networking/service/#headless-services

Port

At least 1 port must be included with a valid value (1 to 65535) https://github.com/kubernetes/api/blob/master/core/v1/types.go#L4600

 [spec.ports[0].port: Invalid value: 0: must be between 1 and 65535, inclusive, spec.ports[0].targetPort: Invalid value: 0: must be between 1 and 65535, inclusive]

Implementation

When a user creates a new Service, the operator's 'endpoints watcher' will detect it, collect all Kubernetes endpoints related to it, and propagate the information (the list of streams to be opened) to the corresponding watching pods.
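
As an illustration only, a minimal sketch of such a watcher using client-go informers (not the actual Meridio implementation; for brevity it only watches the EndpointSlices of the example 'target-pool-service' Service via the kubernetes.io/service-name label, and propagate is a placeholder):

package main

import (
	"context"
	"time"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch only the EndpointSlices belonging to the example Service.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 30*time.Second,
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			o.LabelSelector = discoveryv1.LabelServiceName + "=target-pool-service"
		}),
	)

	informer := factory.Discovery().V1().EndpointSlices().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { propagate(obj.(*discoveryv1.EndpointSlice)) },
		UpdateFunc: func(_, obj interface{}) { propagate(obj.(*discoveryv1.EndpointSlice)) },
		DeleteFunc: func(obj interface{}) { /* remove the targets of this slice */ },
	})

	ctx := context.Background()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())
	<-ctx.Done()
}

// propagate would forward the ready endpoint addresses (together with the
// trench/conduit/stream labels of the parent Service) to the watching pods;
// here it is only a placeholder.
func propagate(slice *discoveryv1.EndpointSlice) {
	for _, ep := range slice.Endpoints {
		if ep.Conditions.Ready != nil && *ep.Conditions.Ready {
			_ = ep.Addresses
		}
	}
}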

The TAPA contains a new thread which watches, via the 'TargetPool' service, which streams have to be opened (streams that are open but not in the list will be closed). To identify itself, the TAPA should provide an identifier contained in the Endpoint object. In this case the IP is used, but the pod reference could also work (see: https://github.com/kubernetes/api/blob/release-1.25/discovery/v1/types.go#L71)
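
A minimal sketch of that reconciliation, with hypothetical types (StreamID and StreamRegistry are stand-ins, not the actual TAPA internals):

package tapa

import "context"

// StreamID identifies a stream by trench, conduit and stream name.
type StreamID struct {
	Trench, Conduit, Stream string
}

// StreamRegistry is a stand-in for the TAPA component that opens and closes
// streams and keeps track of the ones currently open.
type StreamRegistry interface {
	Open(ctx context.Context, id StreamID) error
	Close(ctx context.Context, id StreamID) error
	List() []StreamID
}

// Reconcile brings the set of open streams in line with the desired list
// received from the 'TargetPool' service.
func Reconcile(ctx context.Context, reg StreamRegistry, desired []StreamID) error {
	want := map[StreamID]bool{}
	for _, id := range desired {
		want[id] = true
	}
	// Close streams that are open but no longer desired.
	for _, id := range reg.List() {
		if !want[id] {
			if err := reg.Close(ctx, id); err != nil {
				return err
			}
		} else {
			delete(want, id) // already open, nothing to do
		}
	}
	// Open the remaining desired streams.
	for id := range want {
		if err := reg.Open(ctx, id); err != nil {
			return err
		}
	}
	return nil
}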

Diagrams drawio

Questions
  1. How is it supposed to work when targets are in another namespace than the operator? (TAPA env: "MERIDIO_NAMESPACE")
  2. Should we differentiate between the streams opened via the TAP API and the 'standardized way'?
    • If not, the 'TargetPool watcher' could close all streams that are not returned by the TargetPool Service.
  3. What name should be given to the 'TargetPool' Service?

Describe alternatives you've considered

Kubernetes Service Object exposed via NSP

Very similar to the 'Kubernetes Service Object exposed via namespace-wide service' option above, but the service is exposed by the NSP.

  1. How can the TAPA watch the streams it has to connect to?
    • The NSP is trench-wide, not namespace-wide, so the TAPA connects to an NSP service only when a stream has to be opened. The TAPA therefore cannot use the NSP to check whether it has to be connected to a stream via this new standardized way.

Solutions for problem 1:

  1. Move the NSP and Service to be namespace-wide
    • This will be an NBC (non-backward-compatible change) since the old TAPA will no longer have the correct NSP Service name
  2. Move the NSP to be namespace-wide, keep the service trench-wide and create a new namespace-wide service

Stream selector property

(alternative to replace the usage of Kubernetes Service)

---
apiVersion: meridio.nordix.org/v1alpha1
kind: Stream
metadata:
  name: stream-a
  labels:
    trench: trench-a
spec:
  conduit: lb-fe
  selector:
    app: test-deployment

New Object: TargetPool

(alternative to replace the usage of Kubernetes Service)

---
apiVersion: meridio.nordix.org/v1alpha1
kind: TargetPool
metadata:
  name: target-pool-a
spec:
  stream: stream-a
  selector:
    app: test-deployment

Additional context

/

tedlean commented 1 year ago

Some comments and ideas:

  1. What is the difference between "service.kubernetes.io/service-proxy-name" and the "loadBalancerClass"? Why is the first option preferred?

  2. Regarding the superfluous port configuration: Users are used to providing this, and it does not hurt to have it. It can be seen as informative for now. It could even come in handy over time, as it could be used to implement cross-checking on the flows. It might actually be possible to shift between primary and secondary just by deleting the service.kubernetes.io/service-proxy-name label, if the service contains the information needed for both. (Btw, is the port still mandatory if loadBalancerClass is used?)

  3. The endpoint watcher belongs in the Meridio Operator as shown. Still, there is an issue with namespaces, as the Service and EndpointSlices exist in the application namespace, which might be different from the Meridio namespace where the operator is running. Therefore, the operator will need to watch for Services and EndpointSlices in more namespaces. To control this, a proposal would be to add a list of possible target-pod namespaces to the Trench CR to indicate which namespaces can use this trench in a service definition and thus have to be watched by the operator (see the sketch after this list).

  4. The endpoint watcher in the operator should push the actual service and endpoint status into a "target pool server" as shown (... and yes, a better name is needed). This service seems to belong together with the NSP functionality, as something to be exposed to all Meridio entities. To solve the current NSP-trench binding, solution "2. Move the NSP to be namespace-wide, keep the service trench-wide and create a new namespace-wide service" seems preferable. One reason for keeping the information in the NSP is that it is closely related to the availability handling of the target pod. Actually, it would be possible to update the LBs directly with target pods being "ready or not" very quickly, without having to pass information up to the TAPA and back again.

  5. The TAPA should, at startup, sign up with the target pool service to watch for changes in the target pools. There seems to be a risk of having a lot of information passing up to the TAPAs, and a potential bottleneck, if every TAPA has to download all information for all target pools each time a single change has happened. It could be nice to have some intelligence in the NSP target pool server, so only the affected TAPAs would be notified to read the target pool information.

  6. Yes, the current API approach should be preserved, and the streams should be marked somehow to keep them out of the automatic control. This is, for example, needed for the SCTP solution, which is based on streams that can hardly follow the declarative service concept.
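
Regarding point 3, a minimal sketch of how the operator could limit its watching to a given list of target namespaces (assumed to come from a hypothetical field on the Trench CR) instead of using a cluster-wide watch; the function name is illustrative only:

package operator

import (
	"context"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchTargetNamespaces starts Service and EndpointSlice informers limited to
// the given namespaces, e.g. the target-pod namespaces listed in the Trench CR.
func watchTargetNamespaces(ctx context.Context, client kubernetes.Interface, namespaces []string, handler cache.ResourceEventHandler) {
	for _, ns := range namespaces {
		// One informer factory per watched namespace instead of a cluster-wide watch.
		factory := informers.NewSharedInformerFactoryWithOptions(
			client, 30*time.Second, informers.WithNamespace(ns),
		)
		factory.Core().V1().Services().Informer().AddEventHandler(handler)
		factory.Discovery().V1().EndpointSlices().Informer().AddEventHandler(handler)
		factory.Start(ctx.Done())
		factory.WaitForCacheSync(ctx.Done())
	}
}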

LionelJouin commented 1 year ago

1

I am not 100% sure, but from my experiments, with loadBalancerClass, there is configuration made by Kubernetes. For instance, in kube-proxy, I can see load-balancing entries in IPVS for that service.

It is possible to combine loadBalancerClass + service.kubernetes.io/service-proxy-name; in that case, the only configuration I see is the allocation of a service IP.

With clusterIP: None (Headless service) + service.kubernetes.io/service-proxy-name, I cannot see any configuration.

I tried clusterIP: None + loadBalancerClass but this is not possible: "Invalid value: "None": may not be set to 'None' for LoadBalancer services"

https://github.com/kubernetes/api/blob/master/core/v1/types.go#L4596

2

Yes, the port is still mandatory if loadBalancerClass is used.

3

Right, this is a problem. If we need to do this, then a ClusterRole will be needed to watch services in all namespaces.

4

I am not sure if I understood: if a target is not available, then it will not be in the endpoint list, so the TAPA will not see it while watching the 'Target Pool' Service, and will then not open any stream.

5

No, at startup the TAPA will use the watch function with its own ID as parameter; the 'Target Pool' server will then send the list of streams only for that ID and will send updates only when that list (for that particular ID) changes.
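
A minimal sketch of that watch pattern, with hypothetical types (WatchClient and StreamList are stand-ins, not the actual 'Target Pool' API): the TAPA registers once with its own ID, blocks on Recv(), and the server only sends a message when the list for that ID changes:

package tapa

import "context"

// StreamList is the set of streams the identified target should have open.
type StreamList struct {
	Streams []string // e.g. "trench-a/conduit-a/stream-a"
}

// WatchStream is a stand-in for a gRPC-style server-streaming handle.
type WatchStream interface {
	Recv() (*StreamList, error)
}

// WatchClient is a stand-in for the 'Target Pool' client.
type WatchClient interface {
	Watch(ctx context.Context, targetID string) (WatchStream, error)
}

// watchTargetPool registers once with its own ID and applies every update
// pushed for that ID (e.g. by opening/closing streams accordingly).
func watchTargetPool(ctx context.Context, c WatchClient, targetID string, apply func(*StreamList) error) error {
	stream, err := c.Watch(ctx, targetID)
	if err != nil {
		return err
	}
	for {
		list, err := stream.Recv() // blocks until the list for targetID changes
		if err != nil {
			return err
		}
		if err := apply(list); err != nil {
			return err
		}
	}
}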

tedlean commented 1 year ago

1: Okay, it seems that clusterIP: None (Headless service) + service.kubernetes.io/service-proxy-name is the way to go then.

2: -

3: Yes, I guess the ClusterRole is needed to cross namespace boundaries. Still, I think we have to pinpoint the namespaces that should be watched by each operator in case of having multiple nVIP installations on the cluster. That is why I suggest having a list of possible target namespaces in the Trench CR. This is also to enforce some kind of security on who is allowed to connect to what. Maybe it should even be decided by the system integrator whether he in general will allow the namespace crossing using a ClusterRole. Btw, how does it actually work today? The TAPA can talk to an NSP in another namespace without having a ClusterRole, right?

4: I was mostly thinking about the pod state going from ready to not-ready. I guess this has to be signaled to the LB as fast as possible to minimize traffic blackholing. But for a start we can leave it up to the target to control this based on the Target Pool watching. My proposal is just an optimization.

5: Great! Then the scheme of only notifying the impacted pods will be in place from day one.

6: -

LionelJouin commented 1 year ago
  1. Right, the TAPA can talk to an NSP in any namespace without having any role. In Kubernetes, you can add .<namespace> as a suffix to the service name, and the DNS will then return the IP of the service in that namespace. I have never seen anything that restricts that.
LionelJouin commented 1 year ago
  1. We should add the namespace of the trench/conduit/stream as part of the labels/annotations of the service. Otherwise, multiple trenches watching the same namespace could match the same Service with the same targets.
tedlean commented 1 year ago

Yes, you are right. To avoid ambiguity, the nVIP namespace where the trench/conduit/stream is deployed needs to be referenced as well. I guess leaving it empty would mean "own namespace". The fully qualified name for a stream would be: [namespace]/trench/conduit/stream