This repository contains the artifact for the paper "hXDP: Efficient Software Packet Processing on FPGA NICs".
We provide the necessary documentation and tools to replicate the paper results. In addition, we provide access to a machine equipped with a NetFPGA and configured to execute the experiments.
To replicate the results presented in the paper, it is possible to follow two different paths:
Requirements for testing hXDP on your own are provided below.
A NetFPGA-SUME board is required to evaluate the artifact.
python 3
networkx, transitions
llvm (ver >= 6)

If you want to synthesize the bitstream for hXDP on your own, you can download the Vivado project here.
You'll also need to install Xilinx Vivado Design Suite 2016.4 and obtain licenses, as explained here. Synthesis can take up to 3 hours!
git clone https://github.com/axbryd/hXDP-Artifacts.git
We will now describe how to compile and load an XDP program into the hXDP datapath. We assume that the environment to compile XDP programs is available and configured.
To compile the hXDP ROMs, the hXDP compiler needs the XDP program bytecode as input.
llvm-objdump -S <xdp_prog>.o > <xdp_prog>.bytecode
If you want to skip this step, the bytecodes of the XDP programs used in the paper evaluation are already provided here.
python3 ./parallelizer/parallelizer.py -i <xdp_prog>.o
If you want to skip this step, the ROMs used in the paper evaluation are already provided here.
./testbed_scripts/0_program_FPGA/program_fpga.sh ./testbed_scripts/0_program_FPGA/top_25_05_2020.bit
./testbed_scripts/2_datapath_programming/inject_sephirot_imem.py ./testbed_scripts/2_datapath_programming/SPH_roms/XDP_DROP.bin
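Putting these steps together, a minimal end-to-end sketch could look like the following. The <xdp_prog> naming and the location of the ROM produced by the parallelizer are assumptions; adjust the paths to match your setup.

#!/bin/bash
# End-to-end sketch: compile an XDP program for hXDP, program the FPGA, and
# inject the resulting ROM. The ${PROG}.bin output name is an assumption.
PROG=$1                                                    # e.g. xdp_drop (object file: ${PROG}.o)

llvm-objdump -S "${PROG}.o" > "${PROG}.bytecode"           # dump the eBPF bytecode
python3 ./parallelizer/parallelizer.py -i "${PROG}.o"      # generate the hXDP ROM

# program the FPGA and inject the ROM into Sephirot's instruction memory
./testbed_scripts/0_program_FPGA/program_fpga.sh ./testbed_scripts/0_program_FPGA/top_25_05_2020.bit
./testbed_scripts/2_datapath_programming/inject_sephirot_imem.py "${PROG}.bin"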
Access to maps from userspace is provided through a simple script (reads) and "manually" (writes). The integration of hXDP with standard Linux tools, such as bpftools, for map access is ongoing work and will be included in the next release.
./testbed_scripts/2_datapath_programming/dump_maps.py <n>
Where <n> is the number of the map to dump.
Maps are written through the AXI-Lite protocol. We commit 32-bit transactions, but since maps are 128 bits wide, we need to write to temporary 128-bit buffers and then commit the whole 128-bit line.
Write 32 bits of content (-w) to the desired address (-a):
./testbed_scripts/2_datapath_programming/rwaxi -a 0x80010000 -w 0x2
Assert the commit bit to finalize the write:
./testbed_scripts/2_datapath_programming/rwaxi -a 0x800100ff -w 0x1
Check the result:
./testbed_scripts/2_datapath_programming/dump_maps.py 1
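The write/commit/check sequence above can be wrapped in a small helper. This is only a sketch: it assumes the remaining 32-bit words of a 128-bit line sit at offsets 0x4, 0x8 and 0xc from the base address, and that the commit register is the one shown above.

#!/bin/bash
# Hypothetical helper: fill a 128-bit map line with four 32-bit AXI-Lite writes,
# then assert the commit bit. Word offsets (0x0, 0x4, 0x8, 0xc) are assumptions.
RWAXI=./testbed_scripts/2_datapath_programming/rwaxi
BASE=0x80010000
COMMIT=0x800100ff
WORDS=(0x2 0x0 0x0 0x0)                   # 128-bit line, 32 bits at a time (assumed ordering)

for i in 0 1 2 3; do
  ADDR=$(printf '0x%x' $((BASE + i * 4)))
  "$RWAXI" -a "$ADDR" -w "${WORDS[$i]}"
done

"$RWAXI" -a "$COMMIT" -w 0x1              # commit the buffered 128-bit line
./testbed_scripts/2_datapath_programming/dump_maps.py 1   # check the result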
In pane #0, execute ./program_fpga.sh top_25_05_2020.bit:
osdi20-aec@nino:~/0_program_FPGA$ ./program_fpga.sh top_25_05_2020.bit
****** Xilinx Microprocessor Debugger (XMD) Engine
****** XMD v2016.4 (64-bit)
**** SW Build 1756540 on Mon Jan 23 19:11:19 MST 2017
** Copyright 1986-2016 Xilinx, Inc. All Rights Reserved.
WARNING: XMD has been deprecated and will be removed in future.
XSDB replaces XMD and provides additional functionality.
We recommend you switch to XSDB for commandline debugging.
Please refer to SDK help for more details.
XMD%
XMD% Configuring Device 1 (xc7vx690t) with Bitstream -- top_25_05_2020.bit
................10...............20...............30................40...............50...............60...............70................80...............90...............Successfully downloaded bit file.
JTAG chain configuration
--------------------------------------------------
Device ID Code IR Length Part Name
1 33691093 6 xc7vx690t
2 16d7e093 8 xc2c512
0
XMD%
Completed rescan PCIe information !
In pane #1, launch:
osdi20-aec@nino:~/1_datapath_monitor$ ./hXDP_monitor.py
Navigate to pane #3 and launch ./0_DPDK_bind_ifaces.sh. The output should be:
Network devices using DPDK-compatible driver
============================================
0000:03:00.0 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
0000:03:00.1 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
0000:81:00.0 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
0000:81:00.1 'Ethernet 10G 2P X520 Adapter 154d' drv=igb_uio unused=ixgbe
Network devices using kernel driver
===================================
0000:01:00.0 'I350 Gigabit Network Connection 1521' if=eno1 drv=igb unused=igb_uio *Active*
0000:01:00.1 'I350 Gigabit Network Connection 1521' if=eno2 drv=igb unused=igb_uio
0000:01:00.2 'I350 Gigabit Network Connection 1521' if=eno3 drv=igb unused=igb_uio
0000:01:00.3 'I350 Gigabit Network Connection 1521' if=eno4 drv=igb unused=igb_uio
No 'Crypto' devices detected
============================
No 'Eventdev' devices detected
==============================
No 'Mempool' devices detected
=============================
No 'Compress' devices detected
==============================
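If you need to reproduce the binding outside the provided script, 0_DPDK_bind_ifaces.sh presumably wraps DPDK's standard dpdk-devbind.py tool. A sketch under that assumption (the igb_uio module path is also an assumption):

#!/bin/bash
# Sketch of a DPDK interface-binding step, assuming the stock dpdk-devbind.py
# tool and the igb_uio driver; PCI addresses are the ones shown in the output above.
modprobe uio
insmod ./igb_uio.ko                       # assumed location of the igb_uio module
./dpdk-devbind.py --bind=igb_uio 0000:03:00.0 0000:03:00.1 0000:81:00.0 0000:81:00.1
./dpdk-devbind.py --status                # should list the four X520 ports under igb_uio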
In this section, we describe how to recreate the microbenchmark results presented in the paper.
In pane #2, let's program Sephirot's memory with the relevant ROM file, containing the XDP Drop program:
osdi20-aec@nino:~/2_datapath_programming$ ./inject_sephirot_imem.py SPH_roms/XDP_DROP.bin
You can double-check the proper injection by dumping the content of Sephirot's instruction memory:
osdi20-aec@nino:~/2_datapath_programming$ ./dump_sephirot_imem.py 10
NetFPGA-SUME Detected
Reaing from 0 Sephirot Core
0x0 : 0000000000000000 | 0000000000000000 | 00000001000000b7 | 0000000000000095 |
0x1 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x2 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x3 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x4 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x5 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x6 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x7 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x8 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
0x9 : 0000000000000000 | 0000000000000000 | 0000000000000000 | 0000000000000000 |
Where 10 is the number of Very-Long Instruction Words to be fetched from the memory.
We can now move to pane #3 to generate traffic:
osdi20-aec@ercole:~/3_traffic_generation$ ./1_throughput_test_64.sh
Moving to pane #1, we should see the datapath responding:
RECEIVED PACKETS: 446495634 pkts
XDP_DROP OCCURENCIES: 180722170 pkts
ARRIVAL RATE: 56.584462 Mpps
DROP RATE: 22.41349 Mpps
TX RATE: 0.0 Mpps
For the next benchmarks the steps are the same, except for the ROM to be loaded and the test type, so for brevity we only list these details below; a sketch of a helper loop for the NetFPGA host follows the table.
ROM                       Test scripts
XDP_DROP_early_exit.bin   1_throughput_test_64.sh
XDP_TX_early_exit.bin     1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
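Assuming pane #2 on the NetFPGA host is used for ROM injection (as in the XDP_DROP example above), a minimal helper loop could look like this; the test scripts listed in the table still have to be launched from pane #3 on the traffic-generator host.

#!/bin/bash
# Hypothetical helper for pane #2: inject each microbenchmark ROM listed in the
# table above, pausing so the corresponding test(s) can be run from pane #3.
for ROM in XDP_DROP_early_exit.bin XDP_TX_early_exit.bin; do
  ./inject_sephirot_imem.py "SPH_roms/${ROM}"
  read -r -p "ROM ${ROM} loaded; run the listed test(s) in pane #3, then press Enter "
done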
Here we describe how to run the same examples depicted in the software section on hXDP. Before doing this, we describe how to compile and optimize the original eBPF bytecode to run on Sephirot. Tests are run as reported in the XDP Drop microbenchmark.
This is an optional step since all the ROMs are provided in the repo and inside the testbed.
You can find the parallelizer inside the relevant folder in this repo. To compile and optimize all the examples we've seen in the previous section, run ./parallelize_all.sh. You will find the generated output products inside the out sub-folder.
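For reference, a parallelize_all.sh-style wrapper might simply loop over the example object files. This is only a sketch: it assumes the .o files sit in the current directory and that the parallelizer writes its products next to its input.

#!/bin/bash
# Sketch of a batch run of the parallelizer; output naming/location is an assumption.
mkdir -p out
for OBJ in *.o; do
  python3 ./parallelizer/parallelizer.py -i "$OBJ"
  mv "${OBJ%.o}".bin* out/ 2>/dev/null || true   # collect generated ROMs, if any
done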
ROM                             Test scripts
xdp1.bin.out                    1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp1.bin.out                    1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp_adjust_tail.bin.out         1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp_router_ipv4.bin.out         1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp_rxq_info.bin.out            1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp_tx_iptunnel.bin.out         1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
xdp_redirect_map_kern.bin.out   1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
nec_udp_firewall.bin.out        1_throughput_test_64.sh, 2_latency_minimum_size.sh, 3_latency_maximum_size.sh
In this section we describe how to replicate the XDP Linux baseline tests. Most of the XDP programs used in the experiments are included in the Linux kernel tree (linux/samples/bpf); the additional programs are provided in this repository.
The experimental methodology and configuration follow the approach proposed in the XDP paper ("The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel"); our baseline results are comparable to those presented in the paper and in the paper's repo.
Results provided in the paper have been obtained using the following hardware/software configuration:
Unfortunately, we cannot provide remote access to this machine.
In the following paragraphs we will assume that the Linux kernel source code has been downloaded in <kernel_source>.
Download the linux-5.6.4 source code from here, then extract, compile, and run the kernel. If you are not familiar with kernel compilation, follow the instructions provided here.
Update the Linux XDP samples with the additional programs:
cp <hXDP_repo>/xdp_progs/* <kernel_source>/samples/bpf/
Patch the Makefile for in-tree compilation of the additional programs:
patch Makefile Makefile.patch
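After patching, the samples can be rebuilt in-tree. The exact make invocation varies across kernel versions, so treat the following as a sketch and check samples/bpf/README.rst in your tree if it fails.

# Assumed build sequence for the in-tree bpf samples on linux-5.6.x
cd <kernel_source>
make headers_install
make samples/bpf/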
execute prog: ./xdp1 <eth_ifname>
traffic type: any

execute prog: ./xdp1 <eth_ifname>
traffic type: UDP

execute prog: ./xdp_adjust_tail -i <eth_ifname>
traffic type: ipv4, UDP

execute prog: ./xdp_router_ipv4 <eth_0_ifname> ... <eth_n_ifname>
traffic type: ipv4

execute prog: ./xdp_rxq_info -d <eth_ifname> --action XDP_DROP -s 3

execute prog: ./xdp_tx_iptunnel -i <eth_0_ifname> -a 192.168.0.2 -p 80 -s 10.0.0.1 -d 10.0.0.2 -m 0c:fd:fe:bb:3d:b0

execute prog: ./xdp_redirect_map <eth_in_ifname> <eth_out_ifname>
traffic type: any

config: make
execute prog: ./xdp_fw
traffic type: ipv4, UDP