antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

Should we need to add QoS policy in Egress feature? #2766

Closed Jexf closed 1 year ago

Jexf commented 3 years ago

Describe what you are trying to solve

The Egress feature is a CRD API that manages external access from Pods in a cluster by centrally redirecting SNAT packets to Egress Nodes. Some Pods may occupy too much Egress SNAT bandwidth (for example, third-party data copies), which can cause network congestion on Egress Nodes. Kubernetes already has a Pod bandwidth limitation function, but it only sets ingress and egress limits for the Pod's traffic as a whole; Egress SNAT traffic cannot be given QoS settings separately.

Maybe we can add a QoS function to the Egress feature, adding bandwidth or pps (packets per second) limits for it.
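For reference, the existing Kubernetes-level limits mentioned above are set through Pod annotations handled by the CNI bandwidth plugin; they cap the Pod's total traffic in each direction and cannot single out SNAT'd egress traffic. A typical example:

```yaml
# Existing k8s mechanism: per-Pod caps via the CNI bandwidth plugin.
# These apply to all of the Pod's traffic, not just Egress SNAT traffic.
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
  annotations:
    kubernetes.io/ingress-bandwidth: 10M
    kubernetes.io/egress-bandwidth: 10M
spec:
  containers:
    - name: app
      image: nginx
```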

Describe the solution you have in mind

1. Maybe we can use OVS meters to implement a general k8s QoS feature, such as Pod ingress/egress bandwidth/pps limits and Pod-to-Service bandwidth/pps limits. However, I have compared OVS meters with tc, and OVS meters are not as good as tc: the jitter with OVS meters is bigger than with tc.

2. Use OVS meters to implement Egress QoS only, adding bandwidth or pps limits just for Egress SNAT packets.
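As a sketch of how an OVS drop-band meter of this kind is configured (hypothetical bridge name, flow match, and numbers, shown with ovs-ofctl for illustration; Antrea would program the equivalent via OpenFlow):

```shell
# Sketch only: create a drop-band meter (10 Mbps rate, 2 Mb burst) on a
# hypothetical bridge, then point a flow at it. Requires OpenFlow 1.3+.
ovs-ofctl -O OpenFlow13 add-meter br-int \
    'meter=1 kbps burst stats bands=type=drop rate=10000 burst_size=2000'
# Rate-limit matching traffic by sending it through meter 1.
ovs-ofctl -O OpenFlow13 add-flow br-int \
    'table=0,priority=100,ip,nw_src=10.10.0.0/16 actions=meter:1,normal'
# A pps meter instead uses the pktps flag:
#   'meter=2 pktps bands=type=drop rate=1000'
```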

antoninbas commented 3 years ago

@Jexf can you clarify the difference between solutions 1. and 2.? They both mention using OVS meters for the implementation.

@tnqn / @wenqiq any thoughts on this? That sounds like a good differentiating feature to me (and maybe the Egress Status could even report some information about bandwidth usage if this is technically possible based on the implementation method?).

Jexf commented 3 years ago

@antoninbas Thanks for the reply. Solution 1: The k8s community currently uses annotations to add QoS configuration. This not only requires creating an independent ifb device for each Pod, but also requires restarting the Pod for a modified QoS configuration to take effect.

Maybe we can use a CRD to store the QoS configuration and use meters to implement the QoS function:

  1. Use a CRD to store the QoS configuration, with flexible namespace and Pod label selectors for the Pods that need a QoS policy; maybe we can even implement a namespace-level QoS function. A CRD example:

    apiVersion: crd.antrea.io/v1alpha1
    kind: Qos
    metadata:
      name: qos-sample-1
    spec:
      appliedTo:
        - podSelector:
            matchLabels:
              app: qospolicy
          namespaceSelector:
            matchLabels:
              namespace: qospolicy
      ingress:
        rate: 10000
        burst: 2000
        type: mbps
      egress:
        rate: 10000
        burst: 2000
        type: mpps
  2. Use meters to implement the QoS function: no independent ifb device is needed per Pod (redirecting traffic to the ifb device may cause performance loss), and meters can also implement pps limits.

  3. We can also reuse the meter QoS pipeline to implement the Egress SNAT QoS function.

Solution 2: Only use meters to implement the Egress SNAT QoS function, i.e. only add bandwidth or pps limits for Egress SNAT packets.

wenqiq commented 3 years ago

Good idea. About solution 1, I think a more appropriate name for the CRD kind is Bandwidth instead of kind: Qos. IMO, I prefer solution 1; however, the more conventional way to limit bandwidth on Pods in the k8s community is through annotations. If we want to add a QoS function for the Egress feature, I think solution 2 is more appropriate for now.

tnqn commented 3 years ago

The idea sounds good to me.

  1. Will the QoS apply to the off-cluster egress traffic of all Pods selected by the Egress (or by the QoS, if it's another CRD), or is it per Pod?
  2. Given that Kubernetes already provides a way to specify the total ingress and egress bandwidth of a Pod, I feel we could just add a QoS applying to off-cluster Egress traffic only. It could be an optional field of the Egress CRD.
  3. For @antoninbas's idea about reporting bandwidth usage in the Egress Status, I think the answer to 1 may matter. If it's per-Pod QoS, it may be more difficult to report. But I feel that, generally, etcd may not be suitable for storing frequently updated data, which is why metrics-server and Antrea's networkpolicystats use an aggregated apiserver to serve metric data.
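A sketch of what such an optional field on the Egress CRD might look like (the bandwidth field and its sub-fields are hypothetical here, for illustration only):

```yaml
# Hypothetical sketch: rate-limit all SNAT traffic leaving via this Egress.
apiVersion: crd.antrea.io/v1alpha2
kind: Egress
metadata:
  name: egress-sample
spec:
  appliedTo:
    podSelector:
      matchLabels:
        app: data-copy
  egressIP: 10.10.0.8
  bandwidth:        # hypothetical optional field
    rate: 100M      # bits per second
    burst: 200M
```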
Jexf commented 3 years ago

Good idea. About solution 1, I think a more appropriate name for the CRD kind is Bandwidth instead of kind: Qos. IMO, I prefer solution 1; however, the more conventional way to limit bandwidth on Pods in the k8s community is through annotations. If we want to add a QoS function for the Egress feature, I think solution 2 is more appropriate for now.

Thanks for your reply. The current Kubernetes community QoS function is quite simple; if we use a new CRD to define and store bandwidth limitation info, we can implement a series of functions:

  1. Namespace-level bandwidth limitation: we can add a scope field to the Bandwidth CRD, where the shared value means the applied Pods share the same meter QoS rule, and the standalone value means each Pod uses its own meter QoS rule. So we can use scope: shared for namespace-level bandwidth limitation. A CRD example:
apiVersion: crd.antrea.io/v1alpha1
kind: Bandwidth
metadata:
  name: bandwidth-sample-for-namespace
spec:
  appliedTo:
    - podSelector:
        matchLabels:
          app: qospolicy
      namespaceSelector:
        matchLabels:
          namespace: qospolicy
  ingress:
    rate: 100Mbps
    burst: 100Mbps
    scope: shared
  egress:
    rate: 100Mbps
    burst: 100Mbps
    scope: shared
  2. pps limitation. A CRD example:

    apiVersion: crd.antrea.io/v1alpha1
    kind: Bandwidth
    metadata:
      name: bandwidth-sample-for-namespace
    spec:
      appliedTo:
        - podSelector:
            matchLabels:
              app: qospolicy
          namespaceSelector:
            matchLabels:
              namespace: qospolicy
      ingress:
        rate: 100Mpps
        burst: 100Mpps
        scope: standalone
      egress:
        rate: 100Mpps
        burst: 100Mpps
        scope: standalone
  3. Solve a series of problems with the annotation approach and with redirecting traffic to the ifb device.
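The shared vs. standalone distinction boils down to whether the selected Pods feed one token bucket or one bucket each. A minimal, self-contained simulation of an OVS-style drop-band meter (illustrative only, not Antrea code; names and numbers are invented):

```python
class DropMeter:
    """Token bucket modelling a single OpenFlow-style 'drop' band."""

    def __init__(self, rate_kbps: float, burst_kbits: float):
        self.rate = rate_kbps        # refill rate, kilobits per second
        self.capacity = burst_kbits  # bucket size, kilobits
        self.tokens = burst_kbits    # bucket starts full
        self.clock = 0.0

    def offer(self, now: float, size_kbits: float) -> bool:
        """True if the packet passes; False if the drop band would fire."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.clock) * self.rate)
        self.clock = now
        if size_kbits <= self.tokens:
            self.tokens -= size_kbits
            return True
        return False


# 'shared' scope: every selected Pod offers packets to the same bucket.
shared = DropMeter(rate_kbps=100.0, burst_kbits=50.0)
# 200 kbps offered load (20 kbits every 0.1 s) against a 100 kbps meter:
results = [shared.offer(t * 0.1, 20.0) for t in range(10)]
print(results)  # once the burst is spent, roughly every other packet drops
```

With 'standalone' scope, each Pod would get its own DropMeter instance instead of sharing one.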

Jexf commented 3 years ago

@tnqn Thanks for your reply. Yes, we can just add an optional field to the Egress CRD for Egress bandwidth limitation, and it does not conflict with the Bandwidth limitation CRD. Maybe we don't need to add a meter QoS rule for each Pod; sharing one meter QoS rule among the applied Pods may be simpler.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

GraysonWu commented 1 year ago

Hi @Jexf,

Sorry to see that this excellent feature you proposed was blocked by some technical issues before. The existing implementation allowed different Egresses to share an EgressIP, making it challenging to identify on the EgressIP Node which exact Egress the packets belong to. However, we recently had another round of review and discussion on it, and we basically reached agreement that, typically, users want different EgressIPs to differentiate Pods when creating different Egress objects. We could assume/ensure that an Egress with QoS has its own dedicated EgressIP, making things much more manageable.

I am reaching out to you to inquire about any updates you may have on this item. Specifically, what is your opinion about the new agreement? Are you still interested in or have the bandwidth to work on this item? Or have you already implemented a similar feature on your own fork/project? Please let me know if you have any questions or ideas to share. Thank you so much for your time and consideration.

BTW, I also sent an email to you via zhengdong.wu@transwarp.io which is reported as undeliverable lol.

ColonelBundy commented 1 year ago

I like this example from the earlier comments. However, wouldn't it be better to follow the same standard as ACNP and ANP, namely a separate CRD for namespaced and cluster-scoped resources?

Also, would this limit bandwidth within the namespace, or just traffic originating from outside the namespace? Personally, I would love to be able to specify both: there are scenarios where you may or may not have a fast private link between your nodes, and limiting each namespace to something sensible would be great. In cases where that's not a problem, it would be great to limit only public traffic.

To expand even further, imagine a scenario where you host services outside of the current cluster and may or may not wish to impose limits on traffic to/from those services; then a CIDR selector to exclude/include IPs from the limiter would also fit nicely.
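A hypothetical sketch of such a CIDR selector on the limit spec (all field names invented here for illustration):

```yaml
# Hypothetical only: scope the limiter by destination CIDR, per the
# suggestion above. Traffic to the excluded range is never metered.
egress:
  rate: 100Mbps
  burst: 100Mbps
  scope: shared
  destinations:
    include:
      - 0.0.0.0/0
    exclude:
      - 10.0.0.0/8   # fast private link between nodes, leave unlimited
```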

luolanzone commented 1 year ago

The PR https://github.com/antrea-io/antrea/pull/5425 addresses this issue.