Open Arignir opened 1 year ago
Thanks for the detailed write-up! Would you be willing to contribute a fix? We've recently improved support for testing on multiple kernel versions, so if you write a test you'll be able to ensure it works (at least locally; not all the tests currently pass on older kernels so we only test on 6.1 in CI). cargo xtask integration-test vm --help
should get you started.
A friend of mine, @roblabla, is willing to contribute a fix. We are using a workaround so it's low priority for us, not sure when the fix will land.
Hello,
I work with CO-RE C eBPFs using Aya on older systems and I came across an interesting bug.
While minimizing
vmlinux.h
to the very strict minimum necessary for my eBPFs, my eBPFs failed to load with anArgument too long
(E2BIG
) error on Debian 10 (Linux 4.19.0-24-amd64), and on this distribution/kernel only (everything was ok on Ubuntu 18 or anything more modern)If you look at the source code of the bpf syscall, it checks its arguments early with
bpf_check_uarg_tail_zero()
. If theunion bpf_attr
given to the BPF syscall is bigger than the one known to the kernel and the extra-bytes aren't 0,-E2BIG
is returned. This is to prevent userspace from relying on features that aren't yet provided by the kernel.When looking at Aya's source code, I realized that this is exactly the problem I am facing.
On one side, Aya evaluates which features are available here. Especially, it assumes that BTF are available if the call to
bpf(BPF_BTF_LOAD)
succeeded. When loading the eBPF, it will fill theprog_btf_fd
field of thestruct attr
if that check returned true.And here's the issue: on Linux 4.19,
bpf(BPF_BTF_LOAD)
was already implemented butprog_btf_fd
wasn't inunion bpf_attr
yet. Therefore, when loading the eBPF, aya first checks if BTF are supported, then loads the BTF, and finally setsprog_btf_fd
to the returned file descriptor (see here). This is wrong, and where the bug originates.A quick and easy fix would be to strengthen the check for BTF availability by also checking for
bpf(BPF_PROG_LOAD)
on top of the existingbpf(BPF_BTF_LOAD)
, but that doesn't fix the underlying issue.Instead, aya should be much more strict about which fields of
struct btf_attr
it relies on. Especially, it shouldn't assume that if a field is available for one operation then another related field should also be available for another operation, nor should it assume that if a feature is detected/available then all related fields are also available unless they have all been tested. Note that as stated earlier, modern fields on older kernels aren't ignored, they make the syscall fail.NOTE: I am working with a version a bit older than the main branch, but by looking at the source code it seems the bug is still there. If I'm wrong, I'm very sorry :D