elastic / ebpf

Elastic's eBPF

Support for DNS monitoring #203

Open fearful-symmetry opened 2 months ago

fearful-symmetry commented 2 months ago

Impact

High

Epic/Meta Issue

No response

Planned Version

None

Description

So, we want Linux DNS support in endpoint, and part of that is going to be done here in ebpf. I'm making a lot of assumptions here, so feel free to correct me if there's something I'm missing. This is just a preliminary list of all the parts we'll need for DNS monitoring in this repo:

haesbaert commented 2 months ago

DNS over UDP. Can be done via some combination of ip[4,6]_datagram_connect, udp_destruct_sock and others. Similar enough to existing network probes. The remaining question: do we want the probe to filter by port 53 here in ebpf, or should upstream components in endpoint do that?

Parsing DNS is quite tricky. I've implemented RFC 1035, 3845, 6762 and 6763 (the pain is real), and getting the parser right is quite a lot of work. Pure DNS (not mDNS) requests and responses tend to be quite small, under 1KB, so it's likely more profitable to divert the packet to userland and do the parsing there. That also means less maintenance, and it could even be done with old-school BPF/AF_PACKET.
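To give a feel for what userland parsing involves, here is a minimal sketch in Python that reads the fixed 12-byte header and the first question of a DNS message as laid out in RFC 1035. It is an illustration only, not code from this repo: the function name `parse_dns_question` and the `SAMPLE_QUERY` bytes are made up for the example, and it deliberately rejects compression pointers rather than following them, which a real parser for responses would have to handle.

```python
import struct


def parse_dns_question(payload: bytes) -> dict:
    """Parse the DNS header and first question from a UDP payload (hypothetical helper)."""
    if len(payload) < 12:
        raise ValueError("truncated DNS header")
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (six big-endian 16-bit fields).
    txid, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", payload[:12])
    result = {"id": txid, "is_response": bool(flags & 0x8000), "name": None, "qtype": None}
    if qdcount == 0:
        return result

    labels, off = [], 12
    while True:
        if off >= len(payload):
            raise ValueError("truncated name")
        length = payload[off]
        off += 1
        if length == 0:  # root label terminates the name
            break
        if length & 0xC0:
            # Top two bits set means a compression pointer; not followed in this sketch.
            raise ValueError("compression pointer not supported in this sketch")
        if off + length > len(payload):
            raise ValueError("truncated label")
        labels.append(payload[off:off + length].decode("ascii", errors="replace"))
        off += length

    if off + 4 > len(payload):
        raise ValueError("truncated question")
    qtype, _qclass = struct.unpack("!HH", payload[off:off + 4])
    result["name"] = ".".join(labels)
    result["qtype"] = qtype
    return result


# Hand-built query for "example.com", type A, class IN (illustrative bytes only).
SAMPLE_QUERY = (
    struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
    + b"\x07example\x03com\x00"
    + struct.pack("!HH", 1, 1)
)
print(parse_dns_question(SAMPLE_QUERY))
```

Even this small slice needs several bounds checks, which hints at why doing the full job inside an eBPF program would be painful.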

DNS over TCP. Also uses port 53. Do we want to support this?

Likely. I'm unsure how common it is in the wild; on the endpoint I'd be surprised if anyone is using TCP, though the recursive resolver on the other side might. I think it's safe to say the first version should not bother with TCP.

DNS over TLS. Uses port 853. Do we want to support this?

I don't think we can make a very good case for this: the encryption is done in userland, so we can't really parse anything. I'm unsure whether stub resolvers like the one in glibc support it, and I'm guessing it's uncommon enough. I can ask around; I have a good friend who has been writing DNS stuff for a decade now.

DNS over HTTPS.

Same as TLS. Worth noting that getting the TCP part right is even worse because we have to buffer the stream, unlike SOCK_DGRAM, where one request or reply is always contained in a single packet.
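The buffering problem can be sketched for plain (unencrypted) DNS over TCP, where RFC 1035 prefixes each message with a two-byte big-endian length. The class below is a hypothetical illustration in Python, not something from this repo: it accumulates arbitrary TCP segments and only hands back messages once they are complete. For TLS or HTTPS the bytes would be encrypted before they reach this point, so this only shows the framing cost, not a way to read those protocols.

```python
import struct


class DnsTcpReassembler:
    """Toy reassembler for length-prefixed DNS messages on one TCP stream (hypothetical)."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Append one TCP segment and return every DNS message that is now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack("!H", self._buf[:2])
            if len(self._buf) < 2 + length:
                break  # message still incomplete, wait for more data
            messages.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return messages


def frame(message: bytes) -> bytes:
    """Add the two-byte length prefix used by DNS over TCP."""
    return struct.pack("!H", len(message)) + message


# Two messages arriving split across three segments at awkward boundaries.
stream = frame(b"first-msg") + frame(b"second")
reassembler = DnsTcpReassembler()
for segment in (stream[:1], stream[1:4], stream[4:]):
    print(reassembler.feed(segment))
```

A probe has to keep this kind of per-connection state for every tracked stream, whereas with UDP each datagram can be handled on its own.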

Deep packet inspection. Do we care about people running DNS queries over non-standard ports? Should we start sniffing traffic over other ports looking for anything that looks like a DNS request/response?

I'd say no. That would involve parsing every packet in the system, and it would be a waste, as DNS on non-standard ports is virtually non-existent.

edit: I wrote all of the above assuming we don't care about zone transfers and whatnot (where TCP/DNSSEC is more the norm), just client requests/replies.

haesbaert commented 2 months ago

After consulting some people, TCP/DNSSEC is likely something we can ignore. It's not something normal endpoint users do; maybe a security enthusiast configures it on their machine, but I think we can ignore it.

DNS over HTTPS has some momentum, as Firefox supports it (off by default). I doubt we could properly collect the data we want, since all we can see is the CONNECT line on the HTTPS body. I'm assuming we don't want to enter the world of uprobes (I don't).

haesbaert commented 2 months ago

Another point worth considering is caching local resolvers. If the caching resolver listens on localhost:53, we're gold. But if it is reached via dbus (like systemd-resolved) or a named pipe (AF_UNIX socket), then we won't see the client's requests hitting the local caching resolver; we would only see the outgoing request from "caching resolver->interwebs". That alone would be fine, but the resolver caches the reply, so further requests from client applications would not trigger an outgoing packet, meaning we wouldn't see them.
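The visibility gap can be shown with a toy model in Python. Everything here is hypothetical (the `CachingResolver` class, the placeholder address, the lookup list), and it ignores TTL expiry: it only counts which client lookups would produce an outgoing packet that a network-level probe could observe.

```python
class CachingResolver:
    """Toy local resolver: answers from cache, otherwise sends one upstream query."""

    def __init__(self):
        self.cache = {}
        self.upstream_packets = []  # what a network-level probe would get to see

    def resolve(self, name: str) -> str:
        if name not in self.cache:
            self.upstream_packets.append(name)  # outgoing packet, visible to the probe
            self.cache[name] = "192.0.2.1"      # placeholder answer (documentation range)
        return self.cache[name]                 # cache hit: no packet leaves the host


resolver = CachingResolver()
client_lookups = ["example.com", "example.com", "example.org", "example.com"]
for name in client_lookups:
    resolver.resolve(name)

print(f"client lookups: {len(client_lookups)}, "
      f"visible upstream queries: {len(resolver.upstream_packets)}")
```

In this model the repeated lookups for the same name never reach the wire, and the one query that does is attributed to the resolver process rather than to the client that asked.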

fearful-symmetry commented 2 months ago

@haesbaert yeah, I assume we're going to need to care about how the local cache is set up, but unless we want to use uprobes or something for that, I assume that will fall outside the scope of changes to this repo?

haesbaert commented 2 months ago

@haesbaert yeah, I assume we're going to need to care about how the local cache is set up, but unless we want to use uprobes or something for that, I assume that will fall outside the scope of changes to this repo?

Possibly, though if it's just doing DNS over AF_UNIX we could likely add it. But I suspect glibc->nss->systemd-resolved is actually talking through dbus, and thus we can't do much.