antrea-io / antrea

Kubernetes networking based on Open vSwitch
https://antrea.io
Apache License 2.0

Rate limiting of PacketIn messages should be enforced in the dataplane #2069

Status: closed (antoninbas closed this 3 years ago)

antoninbas commented 3 years ago

Describe the bug

Antrea uses the PacketIn feature of OVS to redirect some packets from the datapath to the local control plane (the Agent). This is required to implement the following Antrea features:

In theory, it is possible for someone to generate a large number of packets in the Pod network that will be sent as PacketIn messages to the Antrea Agent. For example, if a NetworkPolicy rule has been defined to reject UDP traffic from Pod A to Pod B, and Pod A is aware of this policy and is "malicious", Pod A can keep crafting new UDP packets that match the NetworkPolicy rule and are sent to the Antrea Agent.

This is an issue because it can greatly increase the CPU usage of the Agent and cause other, unrelated PacketIn messages (e.g. Traceflow messages) to be dropped. It may even interfere with the OpenFlow Bundle messages used by the Agent to program the OVS datapath (not confirmed).

There is some irony here: one defines a NetworkPolicy to protect Pod network communications, but in doing so (if the NetworkPolicy uses Logging or the Reject action) one can potentially open up the network to DDoS attacks from the very Pods to which the NetworkPolicy applies.

In Antrea v1.0.0, some steps were taken to mitigate this: https://github.com/vmware-tanzu/antrea/pull/2015. The idea was to limit the maximum CPU usage that PacketIn messages can incur in the Agent, and to use different queues in the Agent for Traceflow messages and NetworkPolicy messages. However, this is a very limited solution. PacketIn messages must be filtered in the OVS datapath if we want a foolproof solution. The current plan is to leverage OpenFlow meters to rate limit PacketIn messages in the datapath, with different meter objects for different types of packets (this will ensure that Traceflow requests are not impacted when a large number of packets match NetworkPolicy logging / reject rules). We want to implement this by Antrea v1.1.
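To make the "one meter per packet type" idea concrete, here is a small Go simulation. It is a sketch under stated assumptions, not Antrea code: each OpenFlow meter is modeled as a single DROP band in packets-per-second mode, enforced with a token bucket (the OpenFlow 1.3 spec defines a band by its rate and burst size, not by an algorithm, so the token bucket is an assumption). The `meter` type, `simulate` function, rates, burst sizes and traffic pattern are all hypothetical.

```go
package main

import "fmt"

// meter models an OpenFlow meter with one DROP band in packets-per-second mode.
// Enforcing the band with a token bucket is an assumption made for this sketch.
type meter struct {
	ratePPS float64 // band rate in packets per second
	burst   float64 // bucket capacity in packets
	tokens  float64 // tokens currently available
	lastNS  int64   // arrival time of the previous packet, in nanoseconds
}

func newMeter(ratePPS, burst float64) *meter {
	return &meter{ratePPS: ratePPS, burst: burst, tokens: burst}
}

// allow refills the bucket for the time elapsed since the previous packet and
// reports whether a packet arriving at nowNS is sent to the controller (true)
// or dropped by the band (false).
func (m *meter) allow(nowNS int64) bool {
	m.tokens += float64(nowNS-m.lastNS) / 1e9 * m.ratePPS
	m.lastNS = nowNS
	if m.tokens > m.burst {
		m.tokens = m.burst
	}
	if m.tokens >= 1 {
		m.tokens--
		return true
	}
	return false
}

// simulate replays one second of hypothetical traffic: 10,000 packets matching
// a NetworkPolicy reject/logging rule (one every 100µs) plus 10 Traceflow
// packets (one every 100ms). With shared=true both categories use one meter.
func simulate(shared bool) (npAllowed, tfAllowed int) {
	npMeter := newMeter(100, 100) // hypothetical limit: 100 pps, burst of 100
	tfMeter := newMeter(100, 100)
	if shared {
		tfMeter = npMeter
	}
	for i := 0; i < 10000; i++ {
		now := int64(i) * 100_000 // 100µs between flood packets
		if npMeter.allow(now) {
			npAllowed++
		}
		if i%1000 == 0 && tfMeter.allow(now) { // one Traceflow packet every 100ms
			tfAllowed++
		}
	}
	return npAllowed, tfAllowed
}

func main() {
	np, tf := simulate(false)
	fmt.Printf("separate meters: NetworkPolicy %d/10000, Traceflow %d/10\n", np, tf)
	np, tf = simulate(true)
	fmt.Printf("shared meter:    NetworkPolicy %d/10000, Traceflow %d/10\n", np, tf)
}
```

With separate meters, the flood is capped at roughly burst + rate × time (about 200 of the 10,000 packets reach the Agent) while all 10 Traceflow packets get through. With a single shared meter, the flood keeps the bucket empty and every Traceflow packet after the first is dropped, which is the starvation the per-type meters are meant to prevent.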

To Reproduce

See https://github.com/vmware-tanzu/antrea/pull/2015 for some benchmarks.

The following steps can generally be followed:

Expected behavior

Actual behavior

Versions: Antrea v1.0.0

Additional context

For more information about OpenFlow meters, refer to the OpenFlow 1.3 specification.

GraysonWu commented 3 years ago

Thank you @antoninbas for describing this in such detail. Working on it.