cncf / cnf-testbed

ARCHIVED: 🧪🛏️Cloud-native Network Function (CNF) Testbed --> See LFN Cloud Native Telecom Initiative https://wiki.lfnetworking.org/pages/viewpage.action?pageId=113213592

Summary of networking issues found with NF testing #44

Closed: lixuna closed this issue 6 years ago

lixuna commented 6 years ago

Write-up on networking issues with NF testing, including but not limited to:

taylor commented 6 years ago

Google doc with Summary of networking issues found with NF testing

taylor commented 6 years ago

Network configuration requirements for Packet

For the initial box-by-box benchmark and comparison we are only interested in the performance of individual VNFs and CNFs, with a focus on data plane performance (throughput) and memory usage (resident set size, RSS).
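For reference, the memory side of the measurement can be as simple as sampling the NF process's resident set size from /proc. Below is a minimal sketch, assuming a Linux host and that the PID of the VNF/CNF process is already known; nothing here is specific to the testbed tooling.

```python
# Minimal RSS sampling sketch (assumption: Linux host, PID of the NF process
# under test is passed on the command line).
from pathlib import Path

def rss_kib(pid: int) -> int:
    """Return the resident set size of a process in KiB, read from /proc."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")

if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1])  # e.g. the PID of the VNF/CNF being benchmarked
    print(f"RSS of pid {pid}: {rss_kib(pid)} KiB")
```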

For these tests the data plane network should be as simple as possible, which can be realized by attaching VFs (Virtual Functions) directly to the VNF or CNF being tested. The traffic generator (Pktgen) runs on a separate instance and can be connected via either PFs (Physical Functions) or VFs, depending on the network configuration provided by Packet. Given that the current configuration runs both the data plane and the management / external networks through the same NIC, the connections will likely be based on VFs created from a single port / PF, as the other port will be handling the management and external networks.
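As a concrete illustration of creating VFs from a single port / PF, here is a small sketch that enables them through sysfs. It assumes root access, a BIOS with SR-IOV enabled, and uses a placeholder interface name and VF count rather than the testbed's actual values.

```python
# Sketch: create SR-IOV VFs on a single PF via sysfs.
# Assumptions: run as root, PF name "enp1s0f0" and VF count 4 are placeholders.
from pathlib import Path

def create_vfs(pf: str, num_vfs: int) -> None:
    dev = Path(f"/sys/class/net/{pf}/device")
    total = int((dev / "sriov_totalvfs").read_text())
    if num_vfs > total:
        raise ValueError(f"{pf} only supports {total} VFs")
    # Reset to 0 first; the kernel rejects changing a non-zero VF count directly.
    (dev / "sriov_numvfs").write_text("0")
    (dev / "sriov_numvfs").write_text(str(num_vfs))

if __name__ == "__main__":
    create_vfs("enp1s0f0", 4)
```

The resulting VFs can then be passed directly to the VNF (PCI passthrough) or to the CNF (e.g. by moving the VF netdev into the container's network namespace).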

Below is a small diagram showing how this implementation can be realized using two Packet instances. Note that the data plane network will need to be configured with a VLAN to act as an L2 connection between instances.

[Diagram: box-by-box benchmark using two Packet instances, traffic generator and System Under Test connected over a VLAN-backed data plane]
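On each instance, the VLAN-backed L2 connection mentioned above amounts to a tagged subinterface on the data plane port. The sketch below shells out to iproute2; the interface name, VLAN ID and addresses are placeholders, and it assumes the VLAN has already been provisioned on the Packet side.

```python
# Sketch: tag the data plane port with a VLAN so the two instances share an
# L2 segment. Interface "enp1s0f1", VLAN 1000 and the address are placeholders.
import subprocess

def run(cmd: str) -> None:
    print(f"+ {cmd}")
    subprocess.run(cmd.split(), check=True)

def add_vlan(iface: str, vlan_id: int, address: str) -> None:
    run(f"ip link add link {iface} name {iface}.{vlan_id} type vlan id {vlan_id}")
    run(f"ip addr add {address} dev {iface}.{vlan_id}")
    run(f"ip link set {iface}.{vlan_id} up")

if __name__ == "__main__":
    add_vlan("enp1s0f1", 1000, "192.168.100.1/24")
```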

The main requirement for this to work is that the necessary SR-IOV flags are set in the BIOS (it should be possible to configure this via the Packet.net customer portal).

There are other configurations for the “System Under Test” instance that involve the use of VPP between the NIC and the VNFs and CNFs, but this only changes the software requirements and should not change the requirements for Packet.

taylor commented 6 years ago

Network configuration requirements for fd.io

The requirements for fd.io are very similar to those for Packet. The biggest difference is in the connections between instances, as the fd.io CSIT testbeds have NICs dedicated to data plane traffic using point-to-point connections, which removes the need to configure the data plane network.

The diagram below shows the configuration that has been used for benchmarks.

[Diagram: fd.io CSIT testbed configuration used for the benchmarks, with dedicated point-to-point data plane NICs between instances]

By default the testbeds don’t fully support IOMMU. This can be fixed by enabling Intel VT for Directed I/O (VT-d) in the BIOS (listed under Chipset -> North Bridge -> IIO Configuration). Details about the testbeds and network connections are available through CSIT.
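A quick way to confirm that the VT-d change actually took effect is to check the kernel side after reboot. A minimal sketch, assuming an Intel platform where the kernel was booted with intel_iommu=on:

```python
# Sketch: sanity-check IOMMU support needed for VFIO/SR-IOV device passthrough.
# Assumption: Intel platform, so we look for intel_iommu=on on the kernel cmdline.
from pathlib import Path

def iommu_enabled() -> bool:
    cmdline = Path("/proc/cmdline").read_text()
    groups = list(Path("/sys/kernel/iommu_groups").glob("*"))
    return "intel_iommu=on" in cmdline and len(groups) > 0

if __name__ == "__main__":
    if iommu_enabled():
        print("IOMMU appears enabled (VT-d active, IOMMU groups present)")
    else:
        print("IOMMU not fully enabled: check BIOS VT-d and kernel cmdline")
```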

taylor commented 6 years ago

Networking requirements for VPP/DPDK NFs

Focusing on the “System Under Test” instance, there are several ways it can be configured to support multiple VNFs and CNFs. Examples of how this can be done are shown in the diagram below.

[Diagram: example configurations of the System Under Test instance supporting multiple VNFs and CNFs, with and without VPP between the NIC and the NFs]

Most of these connections have been partially tested on Packet hardware.
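For the variants that place VPP between the NIC and the NFs, the piece that ties back to the network requirements is handing the VFs' PCI addresses to VPP's DPDK plugin. The sketch below renders a minimal startup.conf fragment; the PCI addresses and core layout are placeholders, not the testbed's actual values.

```python
# Sketch: generate a minimal VPP startup.conf fragment that whitelists the
# SR-IOV VFs for DPDK. PCI addresses and core numbers below are placeholders.
VF_PCI_ADDRS = ["0000:03:02.0", "0000:03:02.1"]

def render_startup_conf(pci_addrs, main_core=1, workers="2-3"):
    devs = "\n".join(f"  dev {addr}" for addr in pci_addrs)
    return (
        "cpu {\n"
        f"  main-core {main_core}\n"
        f"  corelist-workers {workers}\n"
        "}\n"
        "dpdk {\n"
        f"{devs}\n"
        "}\n"
    )

if __name__ == "__main__":
    print(render_startup_conf(VF_PCI_ADDRS))
```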

taylor commented 6 years ago

Issues seen during deployment and testing

Initial deployments were done on a single “all-in-one” instance, meaning the traffic generator and the NF were running side by side. The data plane network was implemented using the default bridge implementations available in the frameworks used for virtualization: Vagrant (libvirt) for VMs and Docker for containers. Both of these work in a similar fashion, as can be seen in the diagram below.

[Diagram: all-in-one instance with traffic generator and NF attached to a host bridge (Vagrant/libvirt for VMs, Docker for containers)]
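For reference, the container side of this all-in-one setup amounts to little more than a user-defined bridge network. A sketch using the Docker SDK for Python; the image and network names are placeholders.

```python
# Sketch: the Docker-bridge variant of the all-in-one setup, using the Docker
# SDK for Python. Image and network names are placeholders.
import docker

client = docker.from_env()

# User-defined bridge acting as the data plane between the two containers.
dataplane = client.networks.create("nf-dataplane", driver="bridge")

# Traffic generator and NF side by side on the same host bridge.
trafficgen = client.containers.run(
    "example/pktgen:latest", detach=True, network="nf-dataplane", name="trafficgen")
nf = client.containers.run(
    "example/cnf:latest", detach=True, network="nf-dataplane", name="cnf")
```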

While both of these deployments did work, the amount of traffic these host bridges can handle is very limited, to the point where the VNFs/CNFs only utilized a few percent of their available resources. Variations on these configurations were also tested, e.g. using TCP tunnels between the traffic generator and the NF, but the results were similar to what was observed with the host bridges.

A different approach using an “all-in-one” instance was also tested, this one using VPP as the data plane network inside a single instance. The diagram below shows the configuration differences when testing either VNFs or CNFs.

[Diagram: all-in-one instance using VPP as the data plane, shown for both VNF and CNF testing]

Referenced Mellanox documentation:

  1. https://community.mellanox.com/docs/DOC-2386

  2. https://community.mellanox.com/docs/DOC-2729

The traffic generator is deployed as a VNF in both scenarios, as it currently only supports attaching to PCI devices, which is done through the vhost to “Virtio PCI” mapping that happens in the VM. This removes the bottleneck seen previously with the host bridges. However, this solution is also not ideal: the traffic generator only supports a single queue per “Virtio PCI” interface, which limits the number of usable CPU cores to one per interface, or two in total with the configuration used. While the throughput is several times higher than with the bridge configuration, it is still too low to fully utilize the resources available to the VNFs/CNFs.
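The single-queue limitation can be confirmed from inside the traffic generator VM by looking at the channel count of the virtio interface. A small sketch wrapping ethtool; the interface name is a placeholder.

```python
# Sketch: show how many combined queues a (virtio) interface exposes, which is
# what caps the traffic generator at one core per interface. "eth1" is a placeholder.
import subprocess

def channel_info(iface: str) -> str:
    # `ethtool -l` prints the pre-set maximum and current channel counts.
    out = subprocess.run(["ethtool", "-l", iface],
                         check=True, capture_output=True, text=True)
    return out.stdout

if __name__ == "__main__":
    print(channel_info("eth1"))
```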

List of issues with references