cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0
19.69k stars 2.88k forks source link

Replace conditional compilation based on `#ifdef` with regular if statements #27320

Open dylandreimerink opened 1 year ago

dylandreimerink commented 1 year ago

Currently, Ciliums BPF code is riddled with #ifdef pre-processor macros which are used to conditionally compile certain features into the datapath. There are good reasons for doing so, it means that Cilium users who disabled the a given feature do not "pay" for any delays in the code related to checking at runtime if a feature is enabled or not. Loading less bytecode into the kernel can also be critical for complexity reasons. That way users of older kernels could pick which features they want but could not enable to many at once.

There are, downsides to this approach. It makes the code hard to understand, both due to the mixing of preprocessor and normal C statements. Another downside is that we need to ship the clang/LLVM toolchain with our application and recompile our BPF programs if feature settings change. Shipping the toolchain adds significant size to the docker image and compiling can take a while which is not idea. Lastly, dynamic compilation like this makes any attempts at signing BPF code (which is not yet a thing) practically impossible.

The idea proposes in this issue to to replace these pre-processor macros with regular C if statements which check the value of a global variable.

Going from:

void some_func() 
{
  // [...]
#ifdef IPV4_ENABLED
  return process_x_ipv4(...);
#endif

#ifdef IPV6_ENABLED
  return process_x_ipv6(...);
#endif
}

To:

void some_func() 
{
  // [...]
  if (feature_ipv4)
    return process_x_ipv4(...);

  if (feature_ipv6)
    return process_x_ipv6(...);
}

The effect of this is that we always compile all of the code available, thus detecting any compilation issues in CI or at the time of cutting a release. By setting the value of these feature_ipv4 and feature_ipv6 at runtime (in the loader) we can pick which if a branch should be taken or not. This way we can disable a branch and the verifier will not be able to reach it and thus also not count it towards the complexity.

Since kernel version v4.15 the kernel will replace dynamically dead code with NOP instructions for security reasons. The conditional branching instruction is still left in place.

In v5.1 actual dead code elimination was added for privileged programs. In a three step process, any conditional jumps that are always true are replaced by jump always instructions to avoid the patently of branch misdirection. The second step actually removes dead code and recalculates offsets in the program. And a final step detects NOPs such as jump always instructions that jump to the following line due to all dead code after it being removed and removes these NOPs. So since v5.1 there is no program cache or branch misprediction penalty for having dead code.

Since the minimum requirements of Cilium is a kernel of at least 4.19, it should be safe to start doing this. There is a risk of performance regression on pre v5.1 kernels and there is a risk of increased program complexity due to certain compiler optimizations which can't be made. We will have to monitor for these as we go.

It is not until v5.5 that we can use actual global data for code elimination. So until then we have to use a workaround. To some extent we already do, which we refer to as "static data", essentially generating global data load instructions and replacing them at load time with load IMM instructions. During prototyping, the current method seems to not work for branching code, but the following C code generates instructions with do work:

#define FEATURE(name)                                          \
    __attribute__((section(".data"), used)) void *feat_##name; \
    int has_##name()                                           \
    {                                                          \
        return &feat_##name == (void *)1 ? 1 : 0;              \
    }

FEATURE(ipv6)

The above code generates a has_ipv6() function which returns 1 or 0 usable in an not-zero comparison. We likely want to change the naming so we can use the same technique for not just boolean values but also enums ect. Once we can drop all kernel support below v5.5 we can simply replace all of this with actual global data from frozen maps.

If we are able to replace all macros with static data and/or this alternative method of global data, we should be able to just compile our BPF programs in CI and ship them in the docker images instread of the source code + toolchain. This would allow us to do ELF level code signing for now. And do BPF level code signing after v5.5.

There are yet some more challenges. Currently we use the feature pre-processor macros not only inside code bodies but also to conditionally include map defintions and to construct the tail call map such as with declare_tailcall_if and invoke_tailcall_if macros. So we will need to offload some of these features to the Go loader logic.

kwakubiney commented 1 year ago

This looks interesting, is it more like a discussion issue (for now) or someone can pick up?

dylandreimerink commented 1 year ago

@kwakubiney it is a very concrete proposal based on a lot of offline talks with other contributes. Honestly, this issue is more of an epic with multiple sub tasks than a simple issue. Feedback is of course welcome. It is likely going to be an effort from multiple people given the amount of work if we want to do this.

The first step will be to replace a single macro in the BPF code and to make the nessesery adjustments in the Go code. Then test this change very thoroughly.

After that will be a lot of PRs changing over one or two macros at a time following the example of the initial PR. This will likely take multiple people and take a while. Somewhere in there we should also figure out the map loading and tail call construction.

And then somewhere in the future we will have a clang-less image, but that might honestly take more than one release cycle.

lmb commented 1 year ago

Instead of emitting void* I think it's safer to use inline asm to generate relocation entries. That is what we settled on after a bunch of false starts at Cloudflare. IIRC it avoids a bunch of problems with various clang optimizations. This is supported by the library:

The only thing needed is a post-processing step where we look for references and rewrite those by changing the instructions directly.

dylandreimerink commented 1 year ago

I think it's safer to use inline asm to generate relocation entries

I actually started with that. But the compiler will completely ignore my wishes and will emit an additional dereference instruction which I did not include in the asm block. The goal is to get a singe instruction that does LoadMapValue dst: rx, fd: 0 off: x <.data> and has the correct relocation entry.

So far this is the only combination I have been able to find that does exactly this.

__attribute__((section(".data"), used)) void *feat_##name;
return &feat_##name

The example you sent looks interesting, it creates references to non existing symbols, not to global data:

; asm("%0 = HAS_IPV6 ll"
1: LdImmDW dst: r1 imm: 0 <HAS_IPV6>

So this might work, but its not compatible with the current global data inlineing or frozen maps if we want to rely on native kernel support in the future.

lmb commented 1 year ago

But the compiler will completely ignore my wishes and will emit an additional dereference instruction which I did not include in the asm block.

Ah yes, that was the problem. And that dereference can also be "far" away from the load, they're not necessarily consecutive.

its not compatible with the current global data inlineing or frozen maps if we want to rely on native kernel support in the future.

That's correct, but then neither is the approach with looking at the "pointer" value? Plus, freezing etc. only happens for .rodata.

I'd probaby approach it in a way where for now we'd use the relocation hack and then swap it for real constants once we target a kernel that does support it.

lmb commented 6 months ago

I'll start working on this mid March and hope to have an idea whether we can do this automatically end of march.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.