aya-rs / aya

Aya is an eBPF library for the Rust programming language, built with a focus on developer experience and operability.
https://aya-rs.dev/book/
Apache License 2.0
3.25k stars 290 forks source link

sock_ops: program should be able to return a negative value #1019

Closed l2dy closed 2 months ago

l2dy commented 2 months ago

https://github.com/aya-rs/aya/blob/40f303205f7a800877fe3f9a4fb1893141741e13/aya-ebpf-macros/src/sock_ops.rs#L26

The sock_ops proc macro generates a function that returns u32, but the commit[1] that added BPF support for sock_ops to Linux says (emphasis mine):

Currently there are two types of ops. The first type expects the BPF program to return a value which is then used by the caller (or a negative value to indicate the operation is not supported). The second type expects state changes to be done by the BPF program, for example through a setsockopt BPF helper function, and they ignore the return value.

Should we change the return type to i32?

alessandrod commented 2 months ago

Sounds like we should! Have you double checked in the kernel that callers still handle < 0 return values?

l2dy commented 2 months ago

In the original commit, the return value was stored in int ret and compared against 1. But as of https://github.com/torvalds/linux/blob/ad618736883b8970f66af799e34007475fe33a68/kernel/bpf/cgroup.c#L31 the function was refactored to call bpf_prog_run_array_cg(.., 0, NULL), which saves result of run_prog(prog, ctx) to u32 func_ret and returns -EPERM if !func_ret is true, and 0 (success) if otherwise.

If I understood the code correctly, the function only returns -EPERM if a BPF program returns 0. Negative numbers (actually positive when fitted into a u32) are considered a success. But this contradicts with the comment on __cgroup_bpf_run_filter_sock_ops(), so I'm not very sure about my findings.

 * This function will return %-EPERM if any if an attached program was found
 * and if it returned != 1 during execution. In all other cases, 0 is returned.

I'm also unsure if there are other entrypoints for sock_ops programs besides __cgroup_bpf_run_filter_sock_ops(), but if there isn't, u32 is the correct return type to use for sock_ops programs, because only 0 is a special case and other numbers are all treated the same.