LLVM 13.0.1 required for latest Linux

pchaigno commented 6 months ago

It looks like we will need to support at least LLVM 13.0.1 to be able to run Agni against the latest Linux (here, Linus's tree).

Extract compile flags for current kernel version
make CC=/usr/bin/clang olddefconfig
Traceback (most recent call last):
  File "/home/runner/work/agni/agni/llvm-to-smt/generate_encodings.py", line 610, in <module>
    cmdout_make_config = subprocess.run(cmd_make_olddefconfig,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['make', 'CC=/usr/bin/clang', 'olddefconfig']' returned non-zero exit status 2.

Trying to run that make command:

***
*** C compiler is too old.
***   Your Clang version:    12.0.1
***   Minimum Clang version: 13.0.1
***
scripts/Kconfig.include:44: Sorry, this C compiler is not supported.

Full run can be seen at https://github.com/pchaigno/agni/actions/runs/8772758147/job/24072064645?pr=3.

harishankarv commented 5 months ago

@pchaigno Do we know where this clang version requirement is enforced? Can this requirement be perhaps be overriden (somewhere in the kernel code) for the sake of CI?

shunghsiyu commented 5 months ago

The version requirement is added with the "Bump the minimum supported version of LLVM to 13.0.1" series since v6.9.

Seems like the main rational is to address the issue where LLVM generate a libcall to __muloti4 for __builtin_mul_overflow(), which doesn't sound like something that affects Agni AFAICT. Maybe have these commit reverted in CI?

9c1b86f8ce04 kbuild: raise the minimum supported version of LLVM to 13.0.1
c6d9a4a93725 Makefile: drop warn-stack-size plugin opt
22d3da073f33 x86: drop stack-alignment plugin opt
19336376bda6 ARM: remove Thumb2 __builtin_thread_pointer workaround for Clang
634e4ff9ffd8 arm64: Kconfig: clean up tautological LLVM version checks
9a12e9a165b3 powerpc: Kconfig: remove tautology in CONFIG_COMPAT
de5f3984664e riscv: remove MCOUNT_NAME workaround
a38d97181271 riscv: Kconfig: remove version dependency from CONFIG_CLANG_SUPPORTS_DYNAMIC_FTRACE
7d354f49b8d6 fortify: drop Clang version check for 12.0.1 or newer
9feceff1d2d6 lib/Kconfig.debug: update Clang version check in CONFIG_KCOV
e5efd80a9a76 compiler-clang.h: update __diag_clang() macros for minimum version bump

Commit 4-8 could be ignored if only x86_64 is used, and patch 9 could be ignored as well since LLVM 12 is being used here.

pchaigno commented 5 months ago

Thanks @shunghsiyu!

The LLVM version check can likely be bypassed, but I'd much prefer supporting newer LLVM versions. It won't look great going upstream if the first thing we need to do is revert a bunch of kennel commits 😞

I started working on supporting LLVM 18, but jumping straight to that version may be a bit challenging.

harishankarv commented 5 months ago

Thanks @shunghsiyu!

It won't look great going upstream

@pchaigno Got it.

I started working on supporting LLVM 18

Since LLVM doesn't promise backward compatibility across versions, supporting newest versions (e.g. 18) will be difficult. LLVM 15 for instance had a major change that changed all pointers (typed pointers were replaced opaque pointers). I feel starting with trying to support LLVM 13 will be easier.

pchaigno commented 4 months ago

Since LLVM doesn't promise backward compatibility across versions, supporting newest versions (e.g. 18) will be difficult. LLVM 15 for instance had a major change that changed all pointers (typed pointers were replaced opaque pointers).

Yep, that's exactly the issue I hit when trying to support LLVM 18.

I switched to LLVM 14 as a target instead. https://github.com/bpfverif/agni/pull/29 implements that and a couple fixes needed. I left it in draft because it's still failing on v6.7 for BPF_LSH_32 and BPF_ARSH_32, so I may need some help. The error is:

[handleStoreToPhiPtr] phiArgBBI: if.else.i.i1070.i
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at

harishankarv commented 4 months ago

Thank you for #29 @pchaigno!

I tried using llvm-14, and encountered the same failures. I have a temporary fix below.

run_llvm_passes.py was updated in d52bff5 to use Oz passes (that optimize for code size), instead of O1 passes. This worked in llvm-12, but not anymore when updating to llvm-14. Ostensibly, the passes bundled in "Oz" have changed in llvm-14.

The following is a patch that reverts back to using O1.

0001-llvm-to-smt-Revert-back-to-using-O1-passes.patch

I checked that this was able to generate all encodings for 5.9 and 6.7. Notably, this will increase code size (both llvm and consequently, smt), and also llvm-to-smt runtime. However, the smt encodings are still correct. And with modular encodings, this smt code size might be less of an issue.

This can be a temporary "fix" until I am able to figure out what exactly in llvm-14 fails when using Oz.

harishankarv commented 3 months ago

Confirming that the patch from above (i.e. reverting back to O1 builds) also works on 6.9. This needs another small patch in the verifier (snippet below).

The generate_encodings.py script "fixed" the function __mark_reg_unknown by deleting memset instructions which llvm-to-smt does not handle. In the newer kernels, __mark_reg_unknown further calls __mark_reg_imprecise, where the actual memset happens. So the memset delete logic in generate_encodings.py fails. This patch essentially inlines __mark_reg_imprecise into __mark_reg_unknown so that generate_encodings.py is able to handle it.

0001-Remove-mark_reg_unknown_imprecise.patch

Note that there are many ways to address this (some noted below), but a patch is the quickest way.

Handle memset in llvm-to-smt in a special way, ~by setting the bpf_reg_state fields to 0~.
Write an llvm pass that simply deletes memset instructions that are descendants of __mark_reg_unknown. We only care about tnum/u/s/min/max ~which are (and should be always) overwritten in mark_reg_unbounded anyway~ which are updated by mark_reg_unbounded anyway.

Also, noting that commit 67420501e8 introduced a backward-edge in the verifer. This was addressed in 4c2a26fc80. So any kernels between 67420501e8 and 4c2a26fc80 should be patched if we llvm-to-smt to handle them.

pchaigno commented 3 months ago

Many thanks! I've pushed your commit and a patch to "fix" __mark_reg_unknown_imprecise to https://github.com/bpfverif/agni/pull/29. I'm however still getting the following error when I run it against v6.9:

[handleLoadFromGEPPtrDerivedFromPhi] GEPMapIndices: [ 6 ]
[handleLoadFromGEPPtrDerivedFromPhi] phiConditionBoolI: sw.bb436.i.i_again.i.i_25_110
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at

It does work against v6.7 (without the last commit, which I haven't generalized to all kernel versions yet).

Also, noting that commit 67420501e8 introduced a backward-edge in the verifer. This was addressed in 4c2a26fc80. So any kernels between 67420501e8 and 4c2a26fc80 should be patched if we llvm-to-smt to handle them.

:thinking: v6.9 is between 67420501e8 (v6.8-rc1~131^2~289^2~6^2~12) and 4c2a26fc80 (v6.10-rc1~153^2~432^2~8). Shouldn't it fail?

Note that there are many ways to address this (some noted below), but a patch is the quickest way.

Can you remind me the issue with memset? Is it that we need to have per-field accesses because the SMT are then written against specific field?

harishankarv commented 3 months ago

🤔 v6.9 is between 67420501e8 (v6.8-rc1131^2289^26^212) and 4c2a26fc80 (v6.10-rc1153^2432^2~8). Shouldn't it fail?

That's correct! It should fail, unless the kernel is patched with 4c2a26fc80. I patched the 6.9 kernel before running llvm-to-smt, and it succeeded in generating all encodings locally.

harishankarv commented 3 months ago

Can you remind me the issue with memset?

If there are memsets in the verifier.c code, such as the one in __mark_reg_unknown:

memset(reg, 0, offsetof(struct bpf_reg_state, var_off));

clang emits an llvm.memset intrinsic in the llvm code. Typically this shows up as a bitcast to an i8* followed by a memset in llvm IR:

  %i.i.i2.i = bitcast %struct.bpf_reg_state* %dst_reg to i8*
  call void @llvm.memset.p0i8.i64(i8* %i.i.i2.i, i8 0, i64 24, i1 false) #2

We do not support this construct (bitcast, memset) in llvm-to-smt yet: it should error out with the message:

[handleCallInst] bitcast instruction not supported

As such, the above memset clears everything until the var_off field in a bpf_reg_state. So we do not care about it, we only care about tnum, u/s/32/min/max. So the approach has been to "comment out" this memset from verifier.c before running the LLVMToSMT pass.

pchaigno commented 3 months ago

🤔 v6.9 is between 67420501e8 (v6.8-rc1131^2289^26^212) and 4c2a26fc80 (v6.10-rc1153^2432^2~8). Shouldn't it fail?

That's correct! It should fail, unless the kernel is patched with 4c2a26fc80. I patched the 6.9 kernel before running llvm-to-smt, and it succeeded in generating all encodings locally.

I was able to generate the encodings (not modular) for v6.8, v6.9, and bpf-next (b3470da314fd), so I think we're looking good. Once https://github.com/bpfverif/agni/pull/29 is merged, I'll add encoding generation for bpf-next and linus to the CI.

bpfverif / agni

LLVM 13.0.1 required for latest Linux #26