crossroadsfpga / pigasus

100Gbps Intrusion Detection and Prevention System

Implementing SurgeProtector for Flow Reassembly #16

Open NAtre opened 2 years ago

NAtre commented 2 years ago

Summary

This changelist implements SurgeProtector in the context of Pigasus's FPGA-based flow reassembler. SurgeProtector is an adversarial scheduling framework that protects vulnerable NFs against Algorithmic Complexity Attacks (ACAs), a potent class of Denial-of-Service (DoS) attacks. A brief overview of ACAs and the framework itself is provided in the README of the SurgeProtector source directory.

The most significant changes to the Pigasus datapath (and, more generally, the repo) are as follows:

  1. Reassembly scheduler: Previously, all out-of-order packets were handed directly to the reassembler by the flow table in the order that they arrived (i.e., packets were served FCFS). With this change, we interpose a _scheduler_ between the flow table and the reassembler, which decides the order in which packets are served. The high-level operation is as follows. First, the scheduler continually receives out-of-order packets from the flow table and buffers them in per-flow queues (implemented as singly-linked lists). In parallel, the scheduler chooses an active, out-of-order flow based on a user-specified scheduling policy (configured at compile time) and dequeues the packet at the head of the corresponding flow queue; this packet is then sent to flow reassembly to be served. When the reassembler signals completion, the scheduler propagates any flow-state updates to the flow table (more details on this below), and the process repeats.

    The scheduler currently supports two scheduling policies: FCFS (the policy implicitly used in Pigasus today) and WSJF (the policy underlying SurgeProtector). The policy can be configured by setting the SCHEDULER_REASSEMBLY_POLICY parameter in struct_s.sv (defaults to FCFS). A behavioral sketch of the policy selection and the WSJF priority comparison appears after this list.

  2. Decoupled in-order and out-of-order flow state. Previously, the flow table was responsible for managing state for both in-order and out-of-order (OOO) flows. This change offloads the responsibility of managing OOO flow state (i.e., the next expected PSN, the pointer to the head of the OOO linked list, etc.) to a separate data structure called the OOO flow context table (OOO FCT), implemented in the scheduler; a sketch of a hypothetical entry layout appears after this list. Every out-of-order flow is assigned an OOO flow ID (stored as part of the flow context in the primary flow table), which in turn serves as an index into the OOO FCT. The updated architecture has two advantages: (1) we can support fewer OOO flows than the total flow count, saving BRAM in the process because not every flow needs to be associated with an OOO flow context; and, more importantly, (2) it reduces contention for the primary flow table. Since OOO flow state is decoupled from the global flow context, the slow path needs to update the primary flow table far less frequently; in particular, reassembly updates are propagated to the primary flow table only under two conditions: the next expected PSN changes (e.g., the flow becomes in-order or some packets can be released), or the flow is dropped (more details on this below).

  3. Flow Garbage Collection (GC). Previously, Pigasus's reassembler would permanently stall once all of its linked-list entries were occupied, leaving the slow path in an irrecoverable state. This change implements flow garbage collection in both the scheduler and the reassembler, allowing execution to continue despite memory pressure that would otherwise overwhelm the slow path. GC is orchestrated by the scheduler, which initiates the process when the available memory in either the scheduler or the reassembler falls below user-specified watermark levels (set here); see the GC-trigger sketch after this list. To collect a flow, the scheduler fetches the lowest-priority element among the set of active, out-of-order flows (once again determined by the scheduling policy), then garbage-collects the corresponding flow's entries in (a) the scheduler's flow queue and (b) the reassembler's out-of-order linked list. Finally, it deallocates the flow context in the primary flow table (i.e., drops the flow).

  4. Test harness. This change also introduces a bash-based test harness that may be used to implement unit- or regression-test suites for various Pigasus components. Sources live in the testbench directory and are named after the modules they test. The testcases themselves are written in SystemVerilog (e.g., here); a hypothetical testcase skeleton appears after this list. For instance, here is the output produced by the run_test.sh script corresponding to SurgeProtector's pipelined_heap module:

    Running TEST_BASIC_ENQUE... PASS
    Running TEST_BASIC_DEQUE_MIN... PASS
    Running TEST_BASIC_DEQUE_MAX... PASS
    Running TEST_HEAP_PROPERTY... PASS
    Running TEST_CAPACITY_LIMITS... PASS
    Running TEST_PIPELINING_ENQUE... PASS
    Running TEST_PIPELINING_MIXED... PASS
    Running TEST_DEQUE_COLLISIONS... PASS
    Running TEST_RESET... PASS

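To make the policy machinery concrete, here is a minimal behavioral sketch of the compile-time policy selection and the WSJF priority comparison. It assumes WSJF ranks flows by the ratio of estimated job size to weight (packet size), per the SurgeProtector framework; the package, type, and function names below are hypothetical and do not correspond to identifiers in this changelist:

```systemverilog
// Hypothetical sketch; all names are illustrative, not the repo's actual RTL.
package scheduler_policy_pkg;
    typedef enum logic [0:0] {
        POLICY_FCFS = 1'b0,  // serve packets in arrival order (Pigasus default)
        POLICY_WSJF = 1'b1   // SurgeProtector: smallest (job size / weight) first
    } policy_t;

    // Compile-time selection, analogous to SCHEDULER_REASSEMBLY_POLICY
    // in struct_s.sv (defaults to FCFS).
    parameter policy_t SCHEDULER_REASSEMBLY_POLICY = POLICY_FCFS;

    // WSJF ranks an OOO flow by (estimated job size) / (weight). To keep the
    // comparison division-free, cross-multiply: a/b < c/d <=> a*d < c*b
    // (valid for positive weights). Widened to 48 bits to avoid overflow.
    function automatic logic wsjf_less_than(
        input logic [31:0] job_a, input logic [15:0] wt_a,
        input logic [31:0] job_b, input logic [15:0] wt_b
    );
        return (48'(job_a) * wt_b) < (48'(job_b) * wt_a);
    endfunction
endpackage
```
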
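Next, a sketch of what an OOO flow context table entry might look like. The module, field names, and widths are all illustrative assumptions, not the actual layout used by the scheduler:

```systemverilog
// Hypothetical OOO FCT sketch; parameters and fields are illustrative.
module ooo_fct_sketch #(
    parameter int LL_AWIDTH         = 13,   // reassembler linked-list address width
    parameter int FQ_AWIDTH         = 13,   // scheduler flow-queue address width
    parameter int MAX_NUM_OOO_FLOWS = 1024  // may be far smaller than the flow count
) ();
    typedef struct packed {
        logic                 valid;     // entry in use
        logic [31:0]          next_psn;  // next expected PSN for this flow
        logic [LL_AWIDTH-1:0] ll_head;   // head of the reassembler's OOO linked list
        logic [FQ_AWIDTH-1:0] fq_head;   // head of the scheduler's per-flow queue
        logic [FQ_AWIDTH-1:0] fq_tail;   // tail, for O(1) appends
    } ooo_flow_ctx_t;

    // The primary flow table stores only a narrow OOO flow ID per flow,
    // which indexes this (potentially much smaller) table, saving BRAM and
    // reducing contention on the primary table:
    ooo_flow_ctx_t ooo_fct [MAX_NUM_OOO_FLOWS];
endmodule
```
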
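The GC trigger can be summarized as follows, assuming the scheduler and reassembler export free-entry counters; the module name, signals, and default watermarks are hypothetical:

```systemverilog
// Hypothetical GC-trigger sketch; names and watermark defaults are illustrative.
module gc_trigger #(
    parameter int SCHED_GC_WATERMARK = 64,
    parameter int REASM_GC_WATERMARK = 64
) (
    input  logic        clk,
    input  logic        rst,
    input  logic [15:0] sched_free_count,  // free scheduler flow-queue entries
    input  logic [15:0] reasm_free_count,  // free reassembler linked-list entries
    output logic        gc_start           // initiate eviction of a flow
);
    // Initiate GC under memory pressure; the victim is the lowest-priority
    // active OOO flow per the configured scheduling policy (e.g., for WSJF,
    // the flow with the largest job-size-to-weight ratio).
    always_ff @(posedge clk) begin
        if (rst) gc_start <= 1'b0;
        else     gc_start <= (sched_free_count < SCHED_GC_WATERMARK) ||
                             (reasm_free_count < REASM_GC_WATERMARK);
    end
endmodule
```
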
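Finally, a hypothetical skeleton of a testcase in the harness's style; only the `Running <name>... PASS` reporting convention is taken from the output above, and all other names are illustrative:

```systemverilog
// Hypothetical testbench skeleton (module and task names are illustrative).
module pipelined_heap_tb;
    logic clk = 1'b0;
    always #5 clk = ~clk;  // free-running clock for the DUT (not shown)

    // Report a single testcase result in the harness's output format.
    task automatic run_test(input string name, input logic pass);
        $write("Running %s... ", name);
        if (pass) $display("PASS");
        else begin
            $display("FAIL");
            $fatal(1);
        end
    endtask

    initial begin
        // Each test would drive the DUT and check an invariant (e.g., that a
        // dequeue-min returns the smallest enqueued priority); here we sketch
        // only the reporting plumbing.
        run_test("TEST_BASIC_ENQUE", 1'b1);
        run_test("TEST_BASIC_DEQUE_MIN", 1'b1);
        $finish;
    end
endmodule
```
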
This change also fixes some corner-case bugs (e.g., fast/slow-path data races) that surface when stress-testing the reassembler.

Resources and Timing

The following table reports resource utilization before this change (Baseline Pigasus), after this change with SurgeProtector disabled (Scheduler+FCFS), and with SurgeProtector enabled (Scheduler+WSJF). DSP and eSRAM usage is zero in all cases, so it is omitted for brevity. Relative to the baseline, the worst-case overhead across the two new configurations is +2.3% ALMs (i.e., (223,176 − 218,147) / 218,147 ≈ 2.3%), +1.3% registers, and +1.9% RAM blocks.

| Resource | Baseline Pigasus | Scheduler+FCFS | Scheduler+WSJF |
| --- | --- | --- | --- |
| Logic utilization (in ALMs) | 218,147 / 703K (31%) | 222,909 / 703K (32%) | 223,176 / 703K (32%) |
| Total dedicated logic registers | 456,937 | 462,695 | 458,297 |
| Total pins | 89 / 1,112 (8%) | 89 / 1,112 (8%) | 89 / 1,112 (8%) |
| Total block memory bits | 55,484,024 / 140M (40%) | 56,225,000 / 140M (40%) | 56,280,552 / 140M (40%) |
| Total RAM blocks | 3,364 / 6,847 (49%) | 3,423 / 6,847 (50%) | 3,428 / 6,847 (50%) |

Similarly, the following table reports the timing achieved by all three designs (based on a single Quartus run). None of the designs closes timing, but the negative slack is small. Note that the worst-case path in the WSJF configuration lies in the string-matcher instance (unrelated to this change).

| Timing Parameter | Baseline Pigasus | Scheduler+FCFS | Scheduler+WSJF |
| --- | --- | --- | --- |
| Fmax (user_pll_inst) | 216.45 MHz | 210.39 MHz | 210.13 MHz |
| Fmax (pcie) | 246.43 MHz | 247.40 MHz | 265.18 MHz |
| Worst-case slack (ns) | −0.067 (100C, setup) | −0.072 (100C, setup) | −0.205 (100C, setup) |

Testing

Verified correctness in simulation across several configurations of packet-buffer size (PKT_NUM), maximum number of out-of-order flows (MAX_NUM_OOO_FLOWS), and input traces. Running the existing testbench with the provided input trace produces the expected result. The newly added tests (exercising individual SurgeProtector components) all pass, including a new regression test that exercises the reassembler on a trace containing 10% adversarial traffic. The existing version of Pigasus produces the following error on this adversarial input (due to the memory-exhaustion problem described in the Flow Garbage Collection item above):

# Finish LL emptylist init
# Finish PKT emptylist init
# PKT          0
...
# PKT       1024
# ** Error: Request from empty PKT_emptylsit

After this change, the test case completes for both FCFS and WSJF (albeit with very different throughputs). The figure below depicts the goodput achieved by the two scheduling policies for different attack bandwidths (recall that SurgeProtector corresponds to WSJF).

Note: End-to-end hardware tests are still pending. PLEASE DO NOT MERGE THIS CHANGE YET.