elastic / ebpf

Elastic's eBPF
Other
67 stars 11 forks source link

Add UDP probes in anticipation of DNS monitoring #206

Closed fearful-symmetry closed 2 months ago

fearful-symmetry commented 2 months ago

see: https://datatracker.ietf.org/doc/html/rfc1035#autoid-40

This PR adds basic support for DNS monitoring, using the ip_send_skb and skb_consume_udp functions in the kernel networking stack. The logic here is very basic, and includes no parsing of actual DNS data; after spending the better part of the day reading RFCs and looking at other implementations, I've decided that parsing actual DNS requests/responses is complex enough (and risky enough) that we should probably use an external library for parsing DNS data. It's one of those "getting it 80% working is 20% of the code" protocols.

As a result of this, this PR just creates a DNS packet that just provides basic network metadata and and a buffer for the packet data.

stanek-michal commented 2 months ago

using fentry/fexit this is (in theory) possible by reading the iov_iter struct. Where this gets complicated is pre-trampoline kernels---kretprobe doesn't have access to the function args, meaning we can't get the iovec data after it's been populated.

We could probably do the set_state/get_state trick that we have in Helpers.h which maintains state between entry and ret so that we get access to the input argument pointers in the handler for ret.

There's some alternatives here, most notably using TC probes: https://ebpf-docs.dylanreimerink.nl/linux/program-type/BPF_PROG_TYPE_SCHED_CLS/ , which I haven't experimented with yet.

I believe this is where TC runs on the RX path: https://elixir.bootlin.com/linux/v6.10.7/source/net/core/dev.c#L4049 so this is early, at L2, before deliver_skb() and before it reaches the IP layer. We probably cannot easily map it to the receiving process without having to maintain our own bookkeeping. We could experiment with bpf_sk_lookup_tcp helper and see if its reliable - if it works as advertised it can get the listening socket just from the 5tuple so that could allow us to extract the PID easier probably. There's also bpf_sk_fullsock() which does...something.

Another thing is at this early L2 stage we don't even know if the packet will actually reach the application, there is quite a lot of filtering and drop scenarios in between.

So I think I'm leaning towards the hooks as they are done in this PR if we can easily get packet data there and parse it. I assume we want to filter for port 53 in the kernel and avoid creating events for other UDP packets otherwise there could be a lot.

fearful-symmetry commented 2 months ago

@stanek-michal so, I did some tinkering over the three-day weekend and came to a lot of the conclusions you did, namely that we could use a state getter/setter for kretprobes, and also the problem with finding the userspace pid with a TC/classifier probe.

I'm leaning towards the getter/setter with kretprobe idea for this, since it's a little simpler (we don't need some other mechanism for tracking what userland process owns the pid).

fearful-symmetry commented 2 months ago

Taking this out of draft since I think this is a fairly decent framework going forward if/when we want to parse DNS packets.

haesbaert commented 2 months ago

This won't work for unconnected sockets, as the socket doesn't have an address, if you run the following program you will see the packets don't show up. Worth noting that unconnected udp sockets are likely more common than connected.

You can use the following program to verify:

./udpsend -c 8.8.8.8 # connected
./udpsend 8.8.8.8 # unconnected
#define _GNU_SOURCE         /* program_invocation_name */
#include <sys/types.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <arpa/inet.h>

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <err.h>

static void
usage(void)
{
    fprintf(stderr, "usage: %s [-c] addr4",
        program_invocation_short_name);

    exit(1);
}

int
main(int argc, char *argv[])
{
    struct sockaddr_in  sin;
    int         sock, ch, do_connect;
    ssize_t         n;
    uint64_t        buf[] = {
        0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef,
        0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef
    };

    do_connect = 0;
    while ((ch = getopt(argc, argv, "ch")) != -1) {
        switch (ch) {
        case 'c':
            do_connect = 1;
            break;
        default:
            usage();
        }
    }
    argc -= optind;
    argv += optind;

    bzero(&sin, sizeof(sin));

    if (inet_aton(argv[0], &sin.sin_addr) == 0)
        errx(1, "inet_aton(%s)", argv[0]);
    sin.sin_port = htons(53);
    sin.sin_family = AF_INET;

    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
        err(1, "socket");

    if (do_connect) {
        if (connect(sock, (struct sockaddr *)&sin, sizeof(sin)) == -1)
            err(1, "connect");
        n = write(sock, buf, sizeof(buf));
        if (n == -1)
            err(1, "sendto");
        else if (n != sizeof(buf))
            errx(1, "sendto: shortcount");

        return (0);
    }

    n = sendto(sock, buf, sizeof(buf), 0,
        (struct sockaddr *)&sin, sizeof(sin));
    if (n == -1)
        err(1, "sendto");
    else if (n != sizeof(buf))
        errx(1, "sendto: shortcount");

    return (0);
}

As I mentioned to you yesterday, I think this is still not a good place to tap, I'll do a proper review later tonight and have more comments, but if you use the program above with events trace you can spot the issue.

fearful-symmetry commented 2 months ago

Alright, so, reverted the commit that swapped over to the *skb* methods. This is back to using udp_sendmsg and udp_recvmsg. Will continue to investigate alternatives.

fearful-symmetry commented 2 months ago

Alright, tested on 6.10 and 5.15...

nicholasberlin commented 2 months ago

I think we have settled on the skb functions, which means the previously excellent description needs some love.

haesbaert commented 2 months ago

I'll have a look on this tonight

fearful-symmetry commented 2 months ago

So, did a bit more testing with large DNS queries using this:

dig +noall +answer +comments @166.84.7.99 512.size.dns.netmeister.org a

Haven't noticed any weird paging behavior.

fearful-symmetry commented 2 months ago

@haesbaert totally forgot about the earlier udp send example you wrote, so I switched to that.

haesbaert commented 2 months ago

I haven't tested the last iteration but looks good to me, good work!