falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

modern-bpf probe loading fails on ppc64le RHEL 8.6 #1804

Closed mdafsanhossain closed 5 months ago

mdafsanhossain commented 5 months ago

Describe the bug

When scap-open is executed on a ppc64le RHEL 8.6 machine (kernel version: 4.18.0-372.9.1.el8.ppc64le), loading the modern BPF probe fails with:


[SCAP-OPEN]: Hello!

--------------------------- SCAP SOURCE --------------------------
* Modern BPF probe, 1 ring buffer every 1 CPUs
------------------------------------------------------------------

------------------------- CONFIGURATIONS -------------------------
* Print single event type: -1 (`-1` means no event to print).
* Run until '18446744073709551615' events are catched.
------------------------------------------------------------------

---------------------- INTERESTING SYSCALLS ----------------------
* All sc codes are enabled!
------------------------------------------------------------------

libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
R1 type=ctx expected=fp
; int BPF_PROG(sys_enter,
0: (bf) r6 = r1
; int BPF_PROG(sys_enter,
1: (79) r7 = *(u64 *)(r6 +8)
2: (79) r8 = *(u64 *)(r6 +0)
func 'sys_enter' arg0 has btf_id 123 type STRUCT 'pt_regs'
3: (b7) r1 = 0
; uint32_t status = 0;
4: (63) *(u32 *)(r10 -12) = r1
last_idx 4 first_idx 0
regs=2 stack=0 before 3: (b7) r1 = 0
5: (18) r1 = 0x1
; && (bpf_core_enum_value(enum bpf_func_id, BPF_FUNC_get_current_task_btf) == BPF_FUNC_get_current_task_btf))
7: (15) if r1 == 0x0 goto pc+5
last_idx 7 first_idx 0
regs=2 stack=0 before 5: (18) r1 = 0x1
8: (18) r1 = 0x9e
; if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_get_current_task_btf)
10: (55) if r1 != 0x9e goto pc+2
last_idx 10 first_idx 0
regs=2 stack=0 before 8: (18) r1 = 0x9e
; return (struct task_struct *)bpf_get_current_task_btf();
11: (85) call bpf_get_current_task_btf#158
12: (05) goto pc+1
; return (struct task_struct *)bpf_get_current_task();
14: (18) r1 = 0x1
; READ_TASK_FIELD_INTO(&status, task, thread_info.flags);
16: (15) if r1 == 0x0 goto pc+5
last_idx 16 first_idx 12
regs=2 stack=0 before 14: (18) r1 = 0x1
17: (18) r1 = 0x9e
; READ_TASK_FIELD_INTO(&status, task, thread_info.flags);
19: (55) if r1 != 0x9e goto pc+2
last_idx 19 first_idx 12
regs=2 stack=0 before 17: (18) r1 = 0x9e
; READ_TASK_FIELD_INTO(&status, task, thread_info.flags);
20: <invalid CO-RE relocation>
failed to resolve CO-RE relocation <byte_off> [58] struct task_struct.thread_info.flags (0:0:7 @ offset 128)
processed 16 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'sys_enter': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
 (22)
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)

The same thing happens on RHEL 8.8 (4.18.0-477.10.1.el8_8.ppc64le), but adding guards that check for CONFIG_THREAD_INFO_IN_TASK resolves it and the probe loads successfully.

https://github.com/stackrox/falcosecurity-libs/commit/33f4f0b67e7ec1817db806dadb38605e3d9872e2

How to reproduce it

On a ppc64le RHEL 8.6 machine, after compiling scap with modern-bpf:

sudo ./libscap/examples/01-open/scap-open --modern_bpf

Expected behaviour

Probe loads successfully.

Environment

Additional context

Not sure if this is relevant, but vmlinux.h was generated on a Fedora 35 ppc64le VM.

/cc @Stringy

I'd appreciate any input to help determine what's going on here.

FedeDP commented 5 months ago

Hi! I just made a similar fix for the old eBPF probe: https://github.com/falcosecurity/libs/pull/1794/files. I also increased the recommended minimum kernel release version to 5.1 for the old eBPF probe on ppc64.

I think the same fix (https://github.com/falcosecurity/libs/pull/1794/files#diff-93bba54bcdb415b3c7d9062c3a452b2dfd8518c230c2fcd7acfcfb81a3d32331R100) could also be ported to the modern probe. Note: the kmod should not be affected, since it uses the kernel API task_thread_info to retrieve the info.

FedeDP commented 5 months ago

Also, please note that this patch: https://github.com/stackrox/falcosecurity-libs/commit/33f4f0b67e7ec1817db806dadb38605e3d9872e2 is a bit wrong, since it tests CONFIG_THREAD_INFO_IN_TASK on the machine where the modern probe gets built. With CO-RE, we should instead check against the machine where the probe runs. We will try to address this upstream asap :)

mdafsanhossain commented 5 months ago

Thanks, I'll wait for the patch. Out of curiosity, how do we check against the machine where we run?

FedeDP commented 5 months ago

We can leverage libbpf's CO-RE support, as we do here, for example: https://github.com/falcosecurity/libs/blob/master/driver/modern_bpf/helpers/extract/extract_from_kernel.h#L417

FedeDP commented 5 months ago

Here it is: https://github.com/falcosecurity/libs/pull/1806

mdafsanhossain commented 5 months ago

Thanks @FedeDP and @Andreagit97. However, I ran a quick test on RHEL 8.6 and it fails with the following:

sudo ./libscap/examples/01-open/scap-open --modern_bpf

[SCAP-OPEN]: Hello!

--------------------------- SCAP SOURCE --------------------------
* Modern BPF probe, 1 ring buffer every 1 CPUs
------------------------------------------------------------------

------------------------- CONFIGURATIONS -------------------------
* Print single event type: -1 (`-1` means no event to print).
* Run until '18446744073709551615' events are catched.
------------------------------------------------------------------

---------------------- INTERESTING SYSCALLS ----------------------
* All sc codes are enabled!
------------------------------------------------------------------

libbpf: prog 'bind_e': BPF program load failed: ERROR: strerror_r(524)=22
libbpf: prog 'bind_e': -- BEGIN PROG LOAD LOG --
processed 170 insns (limit 1000000) max_states_per_insn 0 total_states 10 peak_states 10 mark_read 6
-- END PROG LOAD LOG --
libbpf: prog 'bind_e': failed to load: -524
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -524
 (524)
libpman: failed to load BPF object (errno: 524 | message: Unknown error 524)
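The two failures in this thread surface as negative errno values from libbpf: -22 and -524. As a rough illustration of why the second one prints as "Unknown error 524", here is a minimal userspace sketch. `describe_load_error` is a hypothetical helper, not part of libbpf or libpman, and `ENOTSUPP` is defined by hand because it lives in the kernel-internal errno header rather than in the userspace `<errno.h>`.

```c
#include <errno.h>
#include <string.h>

/* ENOTSUPP is a kernel-internal error code (include/linux/errno.h).
 * Userspace <errno.h> does not define it, so strerror() has no text
 * for it. Defined here by hand for illustration only. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/* Hypothetical helper: turn a (possibly negative) load error into a
 * short description. Only the two codes seen in this thread are handled
 * explicitly; everything else falls back to strerror(). */
static const char *describe_load_error(int err)
{
    int code = err < 0 ? -err : err;

    switch (code) {
    case EINVAL:
        return "EINVAL: invalid argument";
    case ENOTSUPP:
        return "ENOTSUPP: operation not supported (kernel-internal code)";
    default:
        return strerror(code);
    }
}
```

Calling `describe_load_error(-524)` yields the ENOTSUPP description, while `describe_load_error(-22)` yields the EINVAL one that matches the first log in this issue.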

What I did:

cmake \
-DUSE_BUNDLED_DEPS=ON \
-DBUILD_LIBSCAP_MODERN_BPF=ON \
-DMODERN_BPF_DEBUG_MODE=ON ../;

make scap-open
sudo ./libscap/examples/01-open/scap-open --modern_bpf

Andreagit97 commented 5 months ago

errno 524 (ENOTSUPP) means "not supported", so your machine probably doesn't support some helpers that the modern eBPF probe needs... the logs don't help here, could you try running in verbose mode?

sudo ./libscap/examples/01-open/scap-open --modern_bpf --verbose 7

mdafsanhossain commented 5 months ago

Here is the verbose log, but it has no additional info on the program loading.

verbose.log

Stringy commented 5 months ago

We've been investigating this unsupported error for the last couple of days, and we're not yet sure whether the cause is the modern_bpf probe, clang (17), libbpf, or the RHEL kernel we're running on. Here's some additional context:

dmesg gives us an error indicating that an opcode is unsupported:

eBPF filter opcode 0039 (@2) unsupported

This decodes to BPF_LDX | BPF_DW | BPF_ABS, but according to the docs this opcode is meant for packet inspection, which doesn't fit the programs that seem to be affected. (cc @erthalion, who identified this)
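The decoding of opcode 0x39 can be reproduced with the class, size and mode bit masks from the kernel's UAPI headers. Below is a minimal userspace sketch. The constants are copied by hand so it builds without kernel headers, the `*_name` helpers are hypothetical, and only a subset of modes is named.

```c
#include <stdint.h>

/* Masks and values as in linux/bpf_common.h, copied by hand so this
 * sketch does not depend on kernel headers. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_MODE(code)  ((code) & 0xe0)

#define BPF_LDX 0x01
#define BPF_DW  0x18
#define BPF_ABS 0x20

/* Hypothetical helper: name of the instruction class (low 3 bits). */
static const char *bpf_class_name(uint8_t code)
{
    static const char *const names[] = {
        "LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64",
    };
    return names[BPF_CLASS(code)];
}

/* Hypothetical helper: name of the access size (bits 3-4). */
static const char *bpf_size_name(uint8_t code)
{
    switch (BPF_SIZE(code)) {
    case 0x00: return "W";   /* 4 bytes */
    case 0x08: return "H";   /* 2 bytes */
    case 0x10: return "B";   /* 1 byte  */
    default:   return "DW";  /* 0x18, 8 bytes */
    }
}

/* Hypothetical helper: name of the addressing mode (bits 5-7).
 * Only the common modes are listed here. */
static const char *bpf_mode_name(uint8_t code)
{
    switch (BPF_MODE(code)) {
    case 0x00: return "IMM";
    case 0x20: return "ABS";
    case 0x40: return "IND";
    case 0x60: return "MEM";
    default:   return "other";
    }
}
```

For the value reported in dmesg, `bpf_class_name(0x39)`, `bpf_size_name(0x39)` and `bpf_mode_name(0x39)` give "LDX", "DW" and "ABS", which is the combination quoted above.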

Our current suspicion is libbpf, and I'm getting it to dump the program instructions just before loading so we can inspect them further.

Andreagit97 commented 5 months ago

uhm wow, maybe we can use perf trace (https://github.com/iovisor/bcc/issues/3044#issuecomment-672573967) to dig into the root cause. We could also use retsnoop (https://github.com/anakryiko/retsnoop), but I don't think the build is supported on ppc64le :/

Stringy commented 5 months ago

@Andreagit97 @FedeDP should I open a new issue to track the unsupported opcode error?

Andreagit97 commented 5 months ago

yep, that would be great, thanks!

FedeDP commented 5 months ago

/milestone 7.1.0+driver