Closed: @Jexf closed this issue 1 year ago.
@Jexf can you clarify the difference between solutions 1. and 2.? They both mention using OVS meters for the implementation.
@tnqn / @wenqiq any thoughts on this? That sounds like a good differentiating feature to me (and maybe the Egress Status could even report some information about bandwidth usage if this is technically possible based on the implementation method?).
@antoninbas Thanks for the reply. Solution 1: Currently the k8s community uses an annotation to add QoS configuration. Not only does it need to create an independent ifb device for each Pod, it also needs to restart the Pod for the configuration to take effect after the QoS configuration is modified.
Maybe we can use a CRD to store the QoS configuration and use OVS meters to implement the QoS function:
Use a CRD to store the QoS configuration, with flexible namespace and Pod label selectors for the Pods that need a QoS policy; maybe we can even implement a namespace-level QoS function. The CRD example:
```yaml
apiVersion: crd.antrea.io/v1alpha1
kind: Qos
metadata:
  name: qos-sample-1
spec:
  appliedTo:
    - podSelector:
        matchLabels:
          app: qospolicy
      namespaceSelector:
        matchLabels:
          namespace: qospolicy
  ingress:
    rate: 10000
    burst: 2000
    type: mbps
  egress:
    rate: 10000
    burst: 2000
    type: mpps
```
Use OVS meters to implement the QoS function, with no need for an independent ifb device for each Pod (redirecting traffic to the ifb device may cause performance loss); meters can also implement a pps limit function.
We can also reuse the meter QoS pipeline to implement the Egress SNAT QoS function.
Solution 2: Only use OVS meters to implement the Egress SNAT QoS function, i.e. only add bandwidth or pps limitation for Egress SNAT packets.
Good idea. About solution 1, I think a more appropriate name for the CRD kind is `Bandwidth` instead of `kind: Qos`?
IMO, I prefer solution 1; however, the more conventional way to limit bandwidth on Pods in the k8s community is through an annotation. If we want to add a QoS function for the Egress feature, I think solution 2 is more appropriate for now.
The idea sounds good to me.
> Good idea. About solution 1, I think a more appropriate name for the CRD kind is `Bandwidth` instead of `kind: Qos`? IMO, I prefer solution 1; however, the more conventional way to limit bandwidth on Pods in the k8s community is through an annotation. If we want to add a QoS function for the Egress feature, I think solution 2 is more appropriate for now.
Thanks for your reply. The Kubernetes community QoS function is currently quite simple; if we use a new CRD to define and store bandwidth limitation info, we can implement a series of functions:

- Namespace-level bandwidth limitation: we can add a `scope` field to the `Bandwidth` CRD, where the `shared` value means the applied Pods share the same meter QoS rule, and the `standalone` value means each Pod uses its own meter QoS rule. So we can use `scope: shared` for namespace-level bandwidth limitation. The CRD example:

```yaml
apiVersion: crd.antrea.io/v1alpha1
kind: Bandwidth
metadata:
  name: bandwidth-sample-for-namespace
spec:
  appliedTo:
    - podSelector:
        matchLabels:
          app: qospolicy
      namespaceSelector:
        matchLabels:
          namespace: qospolicy
  ingress:
    rate: 100Mbps
    burst: 100Mbps
    scope: shared
  egress:
    rate: 100Mbps
    burst: 100Mbps
    scope: shared
```
- pps limitation. The CRD example:

```yaml
apiVersion: crd.antrea.io/v1alpha1
kind: Bandwidth
metadata:
  name: bandwidth-sample-for-pps
spec:
  appliedTo:
    - podSelector:
        matchLabels:
          app: qospolicy
      namespaceSelector:
        matchLabels:
          namespace: qospolicy
  ingress:
    rate: 100Mpps
    burst: 100Mpps
    scope: standalone
  egress:
    rate: 100Mpps
    burst: 100Mpps
    scope: standalone
```
- Solves a series of problems with the `annotation` approach and with redirecting traffic to the ifb device.
@tnqn Thanks for your reply. Yes, we can just add an optional field in the Egress CRD for Egress bandwidth limitation, and it does not conflict with the `Bandwidth` limitation CRD. Maybe we don't need to add a meter QoS rule for each Pod; just sharing one meter QoS rule among the applied Pods may be simpler.
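To make this concrete, here is a rough sketch of what such an optional field on the Egress CRD could look like (the `bandwidth` field name, API version, and values are illustrative, not a final API):

```yaml
apiVersion: crd.antrea.io/v1alpha2
kind: Egress
metadata:
  name: egress-with-bandwidth
spec:
  appliedTo:
    podSelector:
      matchLabels:
        app: qospolicy
  egressIP: 10.10.0.8
  # Hypothetical optional field: all Pods selected by this Egress would
  # share a single meter-based rate limit on the Egress Node.
  bandwidth:
    rate: 100Mbps
    burst: 200Mbps
```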
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days
Hi @Jexf,
Sorry to see that this excellent feature you proposed was blocked by some technical issues before. The existing implementation allows different Egresses to share the same EgressIP, making it challenging to identify on the EgressIP Node which exact Egress the packets belong to. However, we recently had another round of review and discussion on it, and we basically reached an agreement that, typically, users want different EgressIPs to differentiate Pods when creating different Egress objects. We could assume/ensure that an Egress with QoS has its own dedicated EgressIP, making things much more manageable.
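As a sketch of what that agreement could look like in practice (reusing the hypothetical `bandwidth` field from the discussion above; all names and values are illustrative), each QoS-enabled Egress would own a dedicated EgressIP, so its packets can be identified and rate-limited on the Egress Node:

```yaml
apiVersion: crd.antrea.io/v1alpha2
kind: Egress
metadata:
  name: egress-team-a
spec:
  appliedTo:
    podSelector:
      matchLabels:
        team: a
  egressIP: 10.10.0.8   # dedicated to this Egress, not shared
  bandwidth:
    rate: 100Mbps
    burst: 200Mbps
---
apiVersion: crd.antrea.io/v1alpha2
kind: Egress
metadata:
  name: egress-team-b
spec:
  appliedTo:
    podSelector:
      matchLabels:
        team: b
  egressIP: 10.10.0.9   # a different dedicated IP, so traffic stays distinguishable
  bandwidth:
    rate: 50Mbps
    burst: 100Mbps
```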
I am reaching out to you to inquire about any updates you may have on this item. Specifically, what is your opinion about the new agreement? Are you still interested in or have the bandwidth to work on this item? Or have you already implemented a similar feature on your own fork/project? Please let me know if you have any questions or ideas to share. Thank you so much for your time and consideration.
BTW, I also sent an email to you via zhengdong.wu@transwarp.io, which was reported as undeliverable lol.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days
I like this example. However, wouldn't it be better to follow the same standard as ACNP and ANP? Namely, a separate CRD for namespaced and cluster-scoped resources.
Also, would this limit bandwidth within the namespace or just for traffic originating from outside the namespace? Personally I would love to be able to specify both; there are scenarios where you may or may not have a fast private link between your nodes, and limiting each namespace to something sensible would be great. In cases where that's not a problem, it would be great to only limit public traffic.
To expand even further, imagine the scenario where you host services outside of the current cluster but may or may not wish to impose limits on traffic to/from those services; then a CIDR selector to exclude/include IPs from the limiter would also fit nicely.
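Sketching both suggestions together (everything below is hypothetical naming, simply mirroring the ACNP/ANP convention of a cluster-scoped counterpart): a `ClusterBandwidth` kind with CIDR-based exceptions for the limiter:

```yaml
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterBandwidth   # hypothetical cluster-scoped counterpart of Bandwidth
metadata:
  name: limit-public-traffic
spec:
  appliedTo:
    - namespaceSelector:
        matchLabels:
          tier: tenant
  egress:
    rate: 100Mbps
    burst: 100Mbps
    scope: shared
    # Hypothetical CIDR selectors: traffic to these destinations (e.g. a fast
    # private link between sites) is excluded from the rate limit, so only
    # public traffic is limited.
    except:
      - 10.0.0.0/8
      - 192.168.0.0/16
```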
The PR https://github.com/antrea-io/antrea/pull/5425 is for this issue.
**Describe what you are trying to solve**
The Egress feature is a CRD API that manages external access from the Pods in a cluster and centrally redirects the SNAT packets to Egress Nodes. It may happen that some Pods occupy too much Egress SNAT bandwidth, such as third-party data copies, which may cause network congestion on Egress Nodes. Although the current k8s has a Pod bandwidth limitation function, it only sets ingress and egress traffic uniformly; the Egress SNAT traffic cannot have QoS set separately.
Maybe we can add a QoS function to the Egress feature, adding bandwidth or pps limitation for it.
**Describe the solution you have in mind**
1. Maybe we can use OVS meters to implement the k8s QoS feature, such as Pod ingress/egress bandwidth/pps limitation and Pod-to-Service bandwidth/pps limitation. But I have compared OVS meters with tc, and OVS meters are not as good: the jitter with OVS meters is bigger than with tc.
2. Use OVS meters to implement Egress QoS, only adding bandwidth or pps limitation for Egress SNAT packets.