[Question] Loop, BPF MAP and xdp-filter

Disclaimer: I'm very new to eBPF and XDP and know very little about C.

Hello sir!

I'd like to build a simple stateless "in the middle firewall" (I forgot the formal term :( ) using XDP, and I was looking at your code and also the xdp-filter example¹.

I don't get why you are using a loop² to look up the filters (and then having the 100-rule limit). Shouldn't you create a BPF map with the filters and use a lookup to find the rule in it, like in the xdp-filter example³, and then get rid of the loop?

I have some requirements for the firewall I'd like to build and would like your opinion about them (if you can, of course!). I intended to extend the xdp-filter example for this, but your code looks so simple and already has many of the features I want. Maybe I can try to help you extend your solution, but I think the xdp-filter code looks more structured and easier to extend.

What do you think?

Simple 'diagram': INSIDE NET <-> XDP FW <-> EXTERNAL NET

Current requirements:

-> Must match IP/Mask ranges/subnets (eg.: 192.168.0.0/24).
-> Must support IPv4, IPv6, ICMPv4, TCP and UDP.
-> Must support classic tuples (source ip, destination ip, source port, destination port, protocol). "any" keyword matches for anything.
    -> for ICMP: source ip, destination ip, code, type.
-> Must have permit/deny keyword on each rule.
-> Must match TCP flags.
-> Must have an "established" keyword: Accept TCP segments with ONLY ACK *OR* RST flags set. (only matters on the "API")
-> Must load rules from JSON file.
-> Must specify in (internal) and out (external) interfaces. (Routed Traffic, without full router capability).
    -> Must specify destination MAC Address for nexthop. (or get automated based on IP/ARP). - less processing on XDP program (does not need to have specific kernel lookup functions, just change the dst MAC)
    -> Will not have multiple destinations, specific routings or LB.

-> May have an API to write JSON rules file and reload rules.
-> May have a simple HTML interface talking to the API.
-> May have multiple maps for each protocol (IP, UDP, TCP, ICMP, etc.) to enhance lookup performance. Is it doable?
-> (doesn't matter - just a note) May not have an IP address on the in/out interfaces. XDP only cares about packets received on the wire; neither the interface's IP nor its MAC address matters.
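
The "established" requirement above could be sketched as a stateless test on the TCP flags byte. This is only a minimal sketch under one reading of the requirement, assumed here: a segment counts as "established" when its ACK or RST bit is set, as in classic router ACLs. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of RST and ACK in the TCP header's flags byte. */
#define TCP_FLAG_RST 0x04u
#define TCP_FLAG_ACK 0x10u

/* Assumption: "established" means the ACK or RST bit is set.
 * No connection state is consulted, so this is purely stateless. */
static bool is_established(uint8_t tcp_flags)
{
    return (tcp_flags & (TCP_FLAG_ACK | TCP_FLAG_RST)) != 0;
}
```

A bare SYN (0x02) would not match under this reading, while a SYN+ACK (0x12) would.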

Future:

-> NAT: 1-to-1 and n-to-1 (PAT). Can be a separate program that can also be run on a different machine for more performance.
-> CONNTRACK or similar map before inspecting list of rules.

Thanks!

1- https://github.com/xdp-project/xdp-tools/tree/master/xdp-filter
2- https://github.com/gamemann/XDP-Firewall/blob/master/src/xdpfw_kern.c#L330
3- https://github.com/xdp-project/xdp-tools/blob/master/xdp-filter/xdpfilt_prog.h#L67

Hey @liviozanol, I apologize for the delay! Things have been hectic recently with the holidays!

Getting rid of the for loop and relying on BPF map lookups would likely be faster, in my opinion, but it isn't possible with the amount of customization I have unless I insert an entry for every possible combination in the filter rules. The reason is that a BPF hash map's key doesn't support wildcards, understandably so. The lookup key has to be exact, so things like min_ttl and max_ttl wouldn't work unless you inserted a separate entry for every TTL value from min_ttl to max_ttl, and then repeated that in combination with every other possible filter. It's the same with all the options we have for the IP header and layer 4 headers. Unless you want to insert a ridiculous number of entries (depending on the configuration, of course), the best option was to use a for loop. I hope this makes sense!
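
To make the range problem concrete, here is a minimal userspace sketch of a first-match rule scan. The struct layouts and the first_match name are hypothetical and much simpler than the firewall's real filter struct; the point is only that a range such as min_ttl..max_ttl or an "any" field is a cheap comparison in a loop, while an exact-match hash key has no way to express it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule layout, for illustration only. */
struct rule {
    bool    enabled;
    uint8_t min_ttl;      /* inclusive TTL range */
    uint8_t max_ttl;
    bool    check_proto;  /* false means "any protocol" */
    uint8_t proto;
    bool    allow;        /* verdict when the rule matches */
};

/* The packet fields this sketch looks at. */
struct pkt {
    uint8_t ttl;
    uint8_t proto;
};

/* Returns the index of the first matching rule, or -1 if none match. */
static int first_match(const struct rule *rules, size_t n, const struct pkt *p)
{
    for (size_t i = 0; i < n; i++) {
        const struct rule *r = &rules[i];

        if (!r->enabled)
            continue;
        if (p->ttl < r->min_ttl || p->ttl > r->max_ttl)
            continue;
        if (r->check_proto && r->proto != p->proto)
            continue;
        return (int)i;
    }
    return -1;
}
```

Covering the same TTL range with exact keys would take one map entry per TTL value, multiplied by every other field that allows a range or wildcard.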

For supporting IP ranges, you will want to use an LPM (Longest Prefix Match) BPF map (map type BPF_MAP_TYPE_LPM_TRIE). The following should help with implementing this type of map.

I hope the above helps regarding LPM maps and if not, I can try to find better examples explaining the code or make my own example with comments, but this will take more time (I've used it before thankfully, but I'm pretty busy for the holidays right now).
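
In the meantime, here is a rough userspace sketch of what a longest-prefix lookup does. It is not BPF code: a linear scan stands in for the trie, addresses are kept in host byte order for simplicity, and the lpm_v4_key, prefix_covers and lpm_lookup names are hypothetical. The key layout does mirror what BPF_MAP_TYPE_LPM_TRIE expects, which is a __u32 prefix length followed by the address bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as an IPv4 LPM trie key: prefix length first, then the data. */
struct lpm_v4_key {
    uint32_t prefixlen;  /* number of significant bits, assumed 0..32 */
    uint32_t addr;       /* host byte order in this sketch */
};

/* True when addr falls inside the prefix described by k. */
static int prefix_covers(const struct lpm_v4_key *k, uint32_t addr)
{
    if (k->prefixlen == 0)
        return 1;  /* a /0 prefix covers every address */

    uint32_t mask = 0xFFFFFFFFu << (32 - k->prefixlen);
    return (addr & mask) == (k->addr & mask);
}

/* Returns the index of the most specific covering prefix, or -1 if none. */
static int lpm_lookup(const struct lpm_v4_key *tbl, size_t n, uint32_t addr)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!prefix_covers(&tbl[i], addr))
            continue;
        if (best < 0 || tbl[i].prefixlen > tbl[best].prefixlen)
            best = (int)i;
    }
    return best;
}
```

With 192.168.0.0/16 and 192.168.0.0/24 both in the table, 192.168.0.5 resolves to the /24 entry and 192.168.1.5 falls back to the /16 entry.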

As for the tuples, you would likely have to insert each possible case into a BPF map unless you make one of those fields mandatory. In that case, you can look up the mandatory field and then, in the value, have check_dstport and dstport fields (for example) to match the other parts of the tuple if need be. Below is a basic structure for a tuple in C.

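#include <linux/types.h>

/* Lookup key for a classic tuple. The fields are ordered so the struct has
 * no padding; uninitialized padding bytes in a BPF hash map key would make
 * otherwise identical keys fail to match. */
struct tuple
{
    __u32 srcip;
    __u32 dstip;
    __u16 srcport;
    __u16 dstport;
};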
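
The "mandatory field" idea could look roughly like the sketch below: the map is keyed by the mandatory field (say, the destination IP) and the value carries optional checks for the rest of the tuple. Only check_dstport and dstport come from the description above; the rule_value struct, its other fields, and value_matches are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value stored under the mandatory key. */
struct rule_value {
    bool     check_dstport;  /* false means "any destination port" */
    uint16_t dstport;
    bool     check_srcport;  /* false means "any source port" */
    uint16_t srcport;
    bool     allow;          /* verdict when everything matches */
};

/* Called after the map lookup on the mandatory field has succeeded. */
static bool value_matches(const struct rule_value *v,
                          uint16_t srcport, uint16_t dstport)
{
    if (v->check_dstport && v->dstport != dstport)
        return false;
    if (v->check_srcport && v->srcport != srcport)
        return false;
    return true;
}
```

This keeps the map lookup exact while still allowing "any" for the remaining fields, at the cost of one rule per mandatory key value.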

One thing to note is you can't check if a connection is established via TCP in XDP because this is the first hook in the Linux path (assuming the driver supports XDP DRV mode). This is before the SKBs are allocated and there is no state information for TCP connections. You will need to manually keep track of states in your XDP code. With that said, I'd highly recommend having SYN cookies as well to prevent a majority of SYN floods that would over-saturate the TCP state table. This is a system I implemented for my job, but I'm not able to release the source code. Something that may help is the following repository that I really like.

https://github.com/PlushBeaver/xdp-syn-cookie

While this code does perform basic SYN cookie validation, if you want a fully working TCP-like proxy with SYN cookies in XDP, you will need to do a lot more than that, and that's when it gets complicated in my opinion. You need to keep track of the TCP sequence/acknowledgement numbers along with TCP timestamps (if provided in the TCP header options) on two connections (one from the client to the filter machine and another from the filter machine to the actual destination service). For locating the TCP timestamp option, I recommend the following code from a repository I made, since retrieving it is a pain due to the TCP header options being dynamic in size.

https://github.com/gamemann/XDP-TCP-Header-Options

Additionally, for SYN cookie validation, there are also BPF helpers that you can use, but I personally haven't used them before. The BPF helpers can be found here (I'd recommend searching for "syn").

https://man7.org/linux/man-pages/man7/bpf-helpers.7.html

As for the JSON parsing, you can use libbpf to update the BPF maps from user space. libbpf also has support/bindings for higher-level languages such as Python and Golang (which are much easier to use for parsing JSON than libraries in C, in my opinion). However, my problem is that even with the maps pinned to the file system, the libbpf bindings for those languages don't appear to support retrieving BPF maps from the file system after pinning. I made GitHub issues for both languages below.

Golang - https://github.com/iovisor/gobpf/issues/301
Python - https://github.com/iovisor/bcc/issues/3517

Unfortunately, I didn't receive any helpful response on those issues and it has been months now :( I am still trying to find a workaround, and I think that would be writing the BPF maps' FDs to the file system somewhere and loading them through those programs. I have yet to test this, but once I do, I can let you know how it goes if you want!

Otherwise, you can use C with the json-c library. If you want some useful code for json-c, I actually tried remaking the XDP Firewall a long time ago with JSON support along with additional changes and called it "Barricade Firewall". This project was discontinued, but you will probably find the JSON parsing code helpful if you plan on using C and JSON. The code may be found below.

https://github.com/Barricade-FW/Firewall/blob/master/src/config.c

As for the separate BPF maps, if you use a hash map, it should be a fast lookup regardless of entry count I believe. However, I haven't actually tested that. I usually just have it in one map with the keys being the destination IP/port and protocol. However, having separate maps for each protocol isn't a bad idea either!

I also wanted to mention something else. This isn't XDP-related, but another very popular network packet processing library is the DPDK. This is a kernel bypass library, but it is very complicated to learn in my opinion (e.g. look at this documentation). I really love XDP, but I've run into many limitations with it and the BPF verifier can be a pain. The DPDK bypasses the kernel and sends packets from the NIC to user space, so you have full user space functionality. Technically the DPDK is faster than XDP according to benchmarks I've seen, and busy-polling also allows for lower packet latency. Recently, I started digging deep into it and made two repositories. One is for examples/tests I made and the other is a library that aims to simplify the setup of DPDK applications for general-purpose use (e.g. packet processing, packet generation, and so on). You may find these helpful if you ever consider learning the DPDK and using it over XDP.

Both XDP and the DPDK are great solutions. However, I find that once you are past the DPDK's learning curve, it is easier to program in because of the user space functionality. If you're implementing complicated functionality in XDP, you will likely run into some BPF verifier limitations at some point (there are usually ways around them, but still). I do understand you're new to C as well, so sticking with XDP is probably best for now, but I just wanted to bring up the DPDK for the future if you start digging into C.

I also want to just make clear that I don't consider myself an experienced expert in C. I've been learning it for a year and a half now, but it's not like I have many years of experience. The stuff I'm making works great in my opinion, but I'm sure things can be heavily improved and I'm hoping to do so in the future.

I hope the above helps and I apologize if the post is too long! I'm just trying to help as much as I can because I know how complicated some of this can be. If you have any questions, feel free to let me know :)

Hello @gamemann!

Thank you very much for your reply with such valuable information!

About using LPM, I think it is more suitable for routing or forwarding decisions. I could be wrong of course, but I think that using LPM for firewall rules would be difficult to implement and even harder for users to understand. An ordered list (like your for loop) is much more intuitive.

I was thinking of one (or more) tables to match the rules top-down for the first packet (like your loop, but using a map and something like https://lwn.net/Articles/826058/ to iterate over each element) and another map as a "fastpath" for "established" connections with a single exact hit. I was also looking for some hash functions to use for the table keys (converting the tuples) and reading how netfilter does it (e.g. https://www.kfki.hu/~kadlec/sw/netfilter/ct3/).

About DPDK, I've read about it and tested it a little bit, but I think that with the evolution of XDP, DPDK will be retired. (Yes, it's a bet!) I also think it's overcomplicated. Netgate has done some work on a DPDK-based firewall and router: https://www.tnsr.com/.

Back to the XDP world, I was reading/watching the presentation on "Ptables" from the Netdev 0x15 conference, which seems closer to what I'm looking for. They have some very interesting material, but I don't know if it will be open sourced. I sent an e-mail to Jamal, but got no answer. https://www.youtube.com/watch?v=CQneYEfHKBE https://www.netdevconf.org/0x15/session.html?Introducing-Ptables

For now, I think I'll put this subject on hold and wait a little bit to move forward.

Thanks again for your help!

Hey @liviozanol!

You're welcome! Just glad I could help a little bit at least :)

As for the LPM maps, yes I would not use them for storing malicious attack traffic detected by the XDP program in real-time. However, they would be useful for pre-defined blacklist maps such as if you want to block a certain range at all times.

I'd definitely use an LRU hash map for blocking malicious traffic detected by the XDP program in real-time. I'd also use the source and destination IPs as the key. You could also use the source/destination ports if the packets are TCP/UDP, but I don't really think that is worth it since the attackers could just use all different kinds of ports. The only upside I see is fewer false positives in the case where the attacker spoofs legitimate client IPs as the source IP.

With that said, I know some switches and NICs have functionality to send packets from the same flow (e.g. source/destination IP/port) to the same RX queue each time and assuming each RX queue is mapped to an individual CPU, you could then use per CPU maps reliably which would further increase performance due to all the data being accessed within the CPU's cache.

One thing to note about the link you sent (https://lwn.net/Articles/826058/):

"Both these approaches need to copy data from kernel to user space in order to do inspection."

I believe these are user-space functions. Therefore, to my understanding, you won't be able to use them to inspect packets in real-time. I've used functionality like this when iterating through a BPF map in user space.

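#include <bpf/bpf.h>

/* Fetches the key that follows `key` and, if *delete is set, removes `key`.
 * The next key is read before deleting so that iteration can continue. */
int bpf_map_get_next_key_and_delete(int fd, const void *key, void *next_key, int *delete)
{
    int res = bpf_map_get_next_key(fd, key, next_key);

    if (*delete)
    {
        bpf_map_delete_elem(fd, key);
        *delete = 0;
    }

    return res;
}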

Also, I believe BPF hash maps handle the hashing automatically. Therefore, you shouldn't need external hashing libraries/functions unless you aren't planning to use XDP.

As for the DPDK, I definitely agree it is complicated. I've learned a lot more recently and made a new project called Packet Batch that acts as a pen-test/DoS tool.

https://github.com/Packet-Batch

I have a standard version that uses AF_PACKETv3 sockets along with special versions that use AF_XDP (for TX only obviously) and the DPDK. You may be interested in reading the code from there!

It's hard to say whether the DPDK would be retired or not. Both XDP and the DPDK definitely have their pros and cons. Though, I do have to admit XDP is definitely easier to learn and I feel a majority of new network programmers will learn that over the DPDK. XDP is also getting a lot more popular.

I'll watch that video and article you linked about PTables when I have the time :D

No problem again! I'm here if you need any more help!

gamemann / XDP-Firewall

[Question] Loop, BPF MAP and xdp-filter #12