facebookincubator / below

A time traveling resource monitor for modern Linux systems
Apache License 2.0
1.01k stars 59 forks source link

Error in bpf_object__probe_loading():Operation not permitted(1) #8220

Closed daniejstriata closed 3 months ago

daniejstriata commented 4 months ago

How can I resolve the following error? Error in bpf_object__probe_loading():Operation not permitted(1)

Running the service:

Feb 28 13:57:22.357 ERRO Failed to load BPF program: System error, errno: 22 (EINVAL: Invalid argument)
Feb 28 13:57:27.358 WARN bpf error channel disconnected

I checked and is seems if my kernel does have CONFIG_BPF_SYSCALL set to y:

6.1.77-99.164.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y
6.1.75-99.163.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y
6.1.72-96.166.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y

Debug Log:

libbpf: loading object 'exitstat_bpf' from buffer
libbpf: elf: section(2) .symtab, size 408, link 1, flags 0, type=2
libbpf: elf: section(3) .text, size 608, link 0, flags 6, type=1
libbpf: sec '.text': found program 'percpu_counter_read_positive' at insn offset 0 (0 bytes), code size 2 insns (16 bytes)
libbpf: sec '.text': found program 'task_cpu' at insn offset 2 (16 bytes), code size 19 insns (152 bytes)
libbpf: sec '.text': found program 'mm_active_rss_pages' at insn offset 21 (168 bytes), code size 55 insns (440 bytes)
libbpf: elf: section(4) tracepoint/sched/sched_process_exit, size 1488, link 0, flags 6, type=1
libbpf: sec 'tracepoint/sched/sched_process_exit': found program 'tracepoint__sched__sched_process_exit' at insn offset 0 (0 bytes), code size 186 insns (1488 bytes)
libbpf: elf: section(5) .maps, size 24, link 0, flags 3, type=1
libbpf: elf: section(6) license, size 4, link 0, flags 3, type=1
libbpf: license of exitstat_bpf is GPL
libbpf: elf: section(7) .reltracepoint/sched/sched_process_exit, size 32, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 69476, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 2268, link 0, flags 0, type=1
libbpf: looking for externs among 17 symbols...
libbpf: collected 0 externs total
libbpf: map 'events': at sec_idx 5, offset 0.
libbpf: map 'events': found type = 4.
libbpf: map 'events': found key_size = 4.
libbpf: map 'events': found value_size = 4.
libbpf: sec '.reltracepoint/sched/sched_process_exit': collecting relocation for section(4) 'tracepoint/sched/sched_process_exit'
libbpf: sec '.reltracepoint/sched/sched_process_exit': relo #0: insn #173 against 'mm_active_rss_pages'
libbpf: sec '.reltracepoint/sched/sched_process_exit': relo #1: insn #178 against 'events'
libbpf: prog 'tracepoint__sched__sched_process_exit': found map 0 (events, sec 5, off 0) for insn #178
libbpf: Failed to bump RLIMIT_MEMLOCK (err = -1), you might need to do it explicitly!
libbpf: Error in bpf_object__probe_loading():Operation not permitted(1). Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough value.
libbpf: failed to load object 'exitstat_bpf'
libbpf: failed to load BPF skeleton 'exitstat_bpf': -1
dschatzberg commented 4 months ago

Hi @daniejstriata - I suspect the issue is related to RLIMIT_MEMLOCK which limits the amount of memory the below process is allowed to pin (necessary for this bpf program to work). Does below succeed if run with sudo privileges?

Also, is this a fatal error? I believe below will continue working just without the ability to monitor short-lived processes.

daniejstriata commented 4 months ago

@dschatzberg Monitoring did not appear to be affected. I'm running below as root. I compiled below with rust 1.76.0 and clang 15.

I changed the hard and soft memlock values to unlimited. After the increase I started the below service again and still get:

Feb 29 18:38:11 al2023 systemd[1]: Started below.service - below system monitor recording daemon.
Feb 29 18:38:11 al2023 below[2170]: Feb 29 18:38:11.460 DEBG Starting up!
Feb 29 18:38:16 al2023 below[2170]: Feb 29 18:38:16.461 ERRO Failed to load BPF program: System error, errno: 22 (EINVAL: Invalid argument)
Feb 29 18:38:21 al2023 below[2170]: Feb 29 18:38:21.463 WARN bpf error channel disconnected

Now there are more events in the debug log:

libbpf: loading object 'exitstat_bpf' from buffer
libbpf: elf: section(2) .symtab, size 408, link 1, flags 0, type=2
libbpf: elf: section(3) .text, size 608, link 0, flags 6, type=1
libbpf: sec '.text': found program 'percpu_counter_read_positive' at insn offset 0 (0 bytes), code size 2 insns (16 bytes)
libbpf: sec '.text': found program 'task_cpu' at insn offset 2 (16 bytes), code size 19 insns (152 bytes)
libbpf: sec '.text': found program 'mm_active_rss_pages' at insn offset 21 (168 bytes), code size 55 insns (440 bytes)
libbpf: elf: section(4) tracepoint/sched/sched_process_exit, size 1488, link 0, flags 6, type=1
libbpf: sec 'tracepoint/sched/sched_process_exit': found program 'tracepoint__sched__sched_process_exit' at insn offset 0 (0 bytes), code size 186 insns (1488 bytes)
libbpf: elf: section(5) .maps, size 24, link 0, flags 3, type=1
libbpf: elf: section(6) license, size 4, link 0, flags 3, type=1
libbpf: license of exitstat_bpf is GPL
libbpf: elf: section(7) .reltracepoint/sched/sched_process_exit, size 32, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 69461, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 2268, link 0, flags 0, type=1
libbpf: looking for externs among 17 symbols...
libbpf: collected 0 externs total
libbpf: map 'events': at sec_idx 5, offset 0.
libbpf: map 'events': found type = 4.
libbpf: map 'events': found key_size = 4.
libbpf: map 'events': found value_size = 4.
libbpf: sec '.reltracepoint/sched/sched_process_exit': collecting relocation for section(4) 'tracepoint/sched/sched_process_exit'
libbpf: sec '.reltracepoint/sched/sched_process_exit': relo #0: insn #173 against 'mm_active_rss_pages'
libbpf: sec '.reltracepoint/sched/sched_process_exit': relo #1: insn #178 against 'events'
libbpf: prog 'tracepoint__sched__sched_process_exit': found map 0 (events, sec 5, off 0) for insn #178
libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
libbpf: map 'events': setting size to 4
libbpf: map 'events': created successfully, fd=8
libbpf: sec '.text': found 12 CO-RE relocations
libbpf: CO-RE relocating [8] struct percpu_counter: found target candidate [349] struct percpu_counter in [vmlinux]
libbpf: prog 'percpu_counter_read_positive': relo #0: <byte_off> [8] struct percpu_counter.count (0:1 @ offset 64)
libbpf: prog 'percpu_counter_read_positive': relo #0: matching candidate #0 <byte_off> [349] struct percpu_counter.count (0:1 @ offset 8)
libbpf: prog 'percpu_counter_read_positive': relo #0: patched insn #0 (LDX/ST/STX) off 64 -> 8
libbpf: CO-RE relocating [59] struct task_struct___pre516: found target candidate [424] struct task_struct in [vmlinux]
libbpf: prog 'task_cpu': relo #1: <field_exists> [59] struct task_struct___pre516.cpu (0:0 @ offset 0)
libbpf: prog 'task_cpu': relo #1: non-matching candidate #0 <field_exists> [424] struct task_struct (0 @ offset 0)
libbpf: prog 'task_cpu': relo #1: no matching targets found
libbpf: prog 'task_cpu': relo #1: patched insn #1 (ALU/ALU64) imm 1 -> 0
libbpf: prog 'task_cpu': relo #2: <byte_off> [59] struct task_struct___pre516.cpu (0:0 @ offset 0)
libbpf: prog 'task_cpu': relo #2: non-matching candidate #0 <byte_off> [424] struct task_struct (0 @ offset 0)
libbpf: prog 'task_cpu': relo #2: no matching targets found
libbpf: prog 'task_cpu': relo #2: substituting insn #3 w/ invalid insn
libbpf: CO-RE relocating [60] struct task_struct___post516: found target candidate [424] struct task_struct in [vmlinux]
libbpf: prog 'task_cpu': relo #3: <byte_off> [60] struct task_struct___post516.thread_info.cpu (0:0:0 @ offset 0)
libbpf: prog 'task_cpu': relo #3: matching candidate #0 <byte_off> [424] struct task_struct.thread_info.cpu (0:0:3 @ offset 20)
libbpf: prog 'task_cpu': relo #3: patched insn #11 (ALU/ALU64) imm 0 -> 20
libbpf: CO-RE relocating [67] struct mm_struct___pre62: found target candidate [702] struct mm_struct in [vmlinux]
libbpf: prog 'mm_active_rss_pages': relo #4: <type_matches> [67] struct mm_struct___pre62
libbpf: prog 'mm_active_rss_pages': relo #4: non-matching candidate #0 <type_matches> [702] struct mm_struct
libbpf: prog 'mm_active_rss_pages': relo #4: no matching targets found
libbpf: prog 'mm_active_rss_pages': relo #4: patched insn #1 (ALU/ALU64) imm 1 -> 0
libbpf: prog 'mm_active_rss_pages': relo #5: <byte_off> [67] struct mm_struct___pre62.rss_stat.count[0].counter (0:0:0:0:0 @ offset 0)
libbpf: prog 'mm_active_rss_pages': relo #5: matching candidate #0 <byte_off> [702] struct mm_struct.rss_stat.count[0].counter (0:0:39:0:0:0 @ offset 720)
libbpf: prog 'mm_active_rss_pages': relo #5: patched insn #3 (ALU/ALU64) imm 0 -> 720
libbpf: prog 'mm_active_rss_pages': relo #6: <byte_off> [67] struct mm_struct___pre62.rss_stat.count[1].counter (0:0:0:1:0 @ offset 8)
libbpf: prog 'mm_active_rss_pages': relo #6: matching candidate #0 <byte_off> [702] struct mm_struct.rss_stat.count[1].counter (0:0:39:0:1:0 @ offset 728)
libbpf: prog 'mm_active_rss_pages': relo #6: patched insn #10 (ALU/ALU64) imm 8 -> 728
libbpf: prog 'mm_active_rss_pages': relo #7: <byte_off> [67] struct mm_struct___pre62.rss_stat.count[3].counter (0:0:0:3:0 @ offset 24)
libbpf: prog 'mm_active_rss_pages': relo #7: matching candidate #0 <byte_off> [702] struct mm_struct.rss_stat.count[3].counter (0:0:39:0:3:0 @ offset 744)
libbpf: prog 'mm_active_rss_pages': relo #7: patched insn #18 (ALU/ALU64) imm 24 -> 744
libbpf: CO-RE relocating [73] struct mm_struct___post62: found target candidate [702] struct mm_struct in [vmlinux]
libbpf: prog 'mm_active_rss_pages': relo #8: <byte_off> [73] struct mm_struct___post62.rss_stat[0] (0:0:0 @ offset 0)
libbpf: prog 'mm_active_rss_pages': relo #8: non-matching candidate #0 <byte_off> [702] struct mm_struct (0:0 @ offset 720)
libbpf: prog 'mm_active_rss_pages': relo #8: no matching targets found
libbpf: prog 'mm_active_rss_pages': relo #8: substituting insn #28 w/ invalid insn
libbpf: prog 'mm_active_rss_pages': relo #9: <byte_off> [8] struct percpu_counter.count (0:1 @ offset 64)
libbpf: prog 'mm_active_rss_pages': relo #9: matching candidate #0 <byte_off> [349] struct percpu_counter.count (0:1 @ offset 8)
libbpf: prog 'mm_active_rss_pages': relo #9: patched insn #48 (LDX/ST/STX) off 64 -> 8
libbpf: prog 'mm_active_rss_pages': relo #10: <byte_off> [8] struct percpu_counter.count (0:1 @ offset 64)
libbpf: prog 'mm_active_rss_pages': relo #10: matching candidate #0 <byte_off> [349] struct percpu_counter.count (0:1 @ offset 8)
libbpf: prog 'mm_active_rss_pages': relo #10: patched insn #49 (LDX/ST/STX) off 64 -> 8
libbpf: prog 'mm_active_rss_pages': relo #11: <byte_off> [8] struct percpu_counter.count (0:1 @ offset 64)
libbpf: prog 'mm_active_rss_pages': relo #11: matching candidate #0 <byte_off> [349] struct percpu_counter.count (0:1 @ offset 8)
libbpf: prog 'mm_active_rss_pages': relo #11: patched insn #50 (LDX/ST/STX) off 64 -> 8
libbpf: sec 'tracepoint/sched/sched_process_exit': found 17 CO-RE relocations
libbpf: CO-RE relocating [84] struct task_struct: found target candidate [424] struct task_struct in [vmlinux]
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #0: <byte_off> [84] struct task_struct.real_parent (0:81 @ offset 1416)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #0: matching candidate #0 <byte_off> [424] struct task_struct.real_parent (0:86 @ offset 2496)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #0: patched insn #23 (ALU/ALU64) imm 1416 -> 2496
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #1: <byte_off> [84] struct task_struct.tgid (0:79 @ offset 1404)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #1: matching candidate #0 <byte_off> [424] struct task_struct.tgid (0:84 @ offset 2484)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #1: patched insn #30 (ALU/ALU64) imm 1404 -> 2484
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #2: <byte_off> [84] struct task_struct.group_leader (0:85 @ offset 1464)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #2: matching candidate #0 <byte_off> [424] struct task_struct.group_leader (0:90 @ offset 2544)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #2: patched insn #37 (ALU/ALU64) imm 1464 -> 2544
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #3: <byte_off> [84] struct task_struct.sessionid (0:134 @ offset 2132)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #3: matching candidate #0 <byte_off> [424] struct task_struct.sessionid (0:139 @ offset 3188)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #3: patched insn #52 (ALU/ALU64) imm 2132 -> 3188
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #4: <field_exists> [59] struct task_struct___pre516.cpu (0:0 @ offset 0)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #4: non-matching candidate #0 <field_exists> [424] struct task_struct (0 @ offset 0)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #4: no matching targets found
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #4: patched insn #61 (ALU/ALU64) imm 1 -> 0
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #5: <byte_off> [59] struct task_struct___pre516.cpu (0:0 @ offset 0)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #5: non-matching candidate #0 <byte_off> [424] struct task_struct (0 @ offset 0)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #5: no matching targets found
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #5: substituting insn #63 w/ invalid insn
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #6: <byte_off> [60] struct task_struct___post516.thread_info.cpu (0:0:0 @ offset 0)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #6: matching candidate #0 <byte_off> [424] struct task_struct.thread_info.cpu (0:0:3 @ offset 20)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #6: patched insn #65 (ALU/ALU64) imm 0 -> 20
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #7: <byte_off> [84] struct task_struct.min_flt (0:104 @ offset 1776)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #7: matching candidate #0 <byte_off> [424] struct task_struct.min_flt (0:109 @ offset 2800)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #7: patched insn #78 (ALU/ALU64) imm 1776 -> 2800
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #8: <byte_off> [84] struct task_struct.maj_flt (0:105 @ offset 1784)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #8: matching candidate #0 <byte_off> [424] struct task_struct.maj_flt (0:110 @ offset 2808)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #8: patched insn #87 (ALU/ALU64) imm 1784 -> 2808
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #9: <byte_off> [84] struct task_struct.utime (0:96 @ offset 1640)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #9: matching candidate #0 <byte_off> [424] struct task_struct.utime (0:101 @ offset 2720)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #9: patched insn #96 (ALU/ALU64) imm 1640 -> 2720
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #10: <byte_off> [84] struct task_struct.stime (0:97 @ offset 1648)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #10: matching candidate #0 <byte_off> [424] struct task_struct.stime (0:102 @ offset 2728)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #10: patched insn #106 (ALU/ALU64) imm 1648 -> 2728
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #11: <byte_off> [84] struct task_struct.signal (0:122 @ offset 2024)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #11: matching candidate #0 <byte_off> [424] struct task_struct.signal (0:127 @ offset 3080)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #11: patched insn #116 (ALU/ALU64) imm 2024 -> 3080
libbpf: CO-RE relocating [940] struct signal_struct: found target candidate [931] struct signal_struct in [vmlinux]
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #12: <byte_off> [940] struct signal_struct.nr_threads (0:2 @ offset 8)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #12: matching candidate #0 <byte_off> [931] struct signal_struct.nr_threads (0:2 @ offset 8)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #12: patched insn #123 (ALU/ALU64) imm 8 -> 8
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #13: <byte_off> [84] struct task_struct.ioac.read_bytes (0:166:4 @ offset 4512)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #13: matching candidate #0 <byte_off> [424] struct task_struct.ioac.read_bytes (0:159:4 @ offset 3408)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #13: patched insn #134 (ALU/ALU64) imm 4512 -> 3408
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #14: <byte_off> [84] struct task_struct.ioac.write_bytes (0:166:5 @ offset 4520)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #14: matching candidate #0 <byte_off> [424] struct task_struct.ioac.write_bytes (0:159:5 @ offset 3416)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #14: patched insn #143 (ALU/ALU64) imm 4520 -> 3416
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #15: <byte_off> [84] struct task_struct.start_time (0:102 @ offset 1760)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #15: matching candidate #0 <byte_off> [424] struct task_struct.start_time (0:107 @ offset 2784)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #15: patched insn #152 (ALU/ALU64) imm 1760 -> 2784
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #16: <byte_off> [84] struct task_struct.mm (0:52 @ offset 1256)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #16: matching candidate #0 <byte_off> [424] struct task_struct.mm (0:56 @ offset 2336)
libbpf: prog 'tracepoint__sched__sched_process_exit': relo #16: patched insn #163 (ALU/ALU64) imm 1256 -> 2336
libbpf: prog 'tracepoint__sched__sched_process_exit': added 55 insns from sub-prog 'mm_active_rss_pages'
libbpf: prog 'tracepoint__sched__sched_process_exit': insn #173 relocated, imm 12 points to subprog 'mm_active_rss_pages' (now at 186 offset)
libbpf: prog 'tracepoint__sched__sched_process_exit': BPF program load failed: Invalid argument
libbpf: prog 'tracepoint__sched__sched_process_exit': -- BEGIN PROG LOAD LOG --
Validating mm_active_rss_pages() func#1...
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'tracepoint__sched__sched_process_exit': failed to load: -22
libbpf: failed to load object 'exitstat_bpf'
libbpf: failed to load BPF skeleton 'exitstat_bpf': -22

If the error is not something to worry about we can close this off. I'd be happy to assist further if needed.

dschatzberg commented 4 months ago

What kernel version do you run on? My guess is the exitstat bpf program is just not compatible with your kernel version (BPF has no real compatibility guarantees) but it could be something in the build process.

My biggest concern would be if Below outright fatals on this error, which it sounds like it's not. We expect in some scenarios exitstat cannot work, just not clear to me if this is one of those cases or not.

daniejstriata commented 4 months ago

These are the kernels I have installed:

6.1.77-99.164.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y
6.1.75-99.163.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y
6.1.72-96.166.amzn2023.x86_64/config
116:CONFIG_BPF_SYSCALL=y

uname -r

6.1.77-99.164.amzn2023.x86_64
brianc118 commented 4 months ago

Hi @daniejstriata, thanks for reporting this. 8e3960582423c2cf162ad505dad1a99af5b749c1 has landed which should fix this. It looks like it was broken for a while since e96c4280b872b03fc5c159d2acf885023989af61.

I'm also working on a change internally to make sure we catch this by testing on VMs running various kernel versions.

daniejstriata commented 3 months ago

Can this change please be made as a new point release?

brianc118 commented 3 months ago

Yep, working on it

brianc118 commented 3 months ago

Hit some issues publishing 0.8.0 to crates.io. Seems like we pulled in non-released version of libbpf-rs crate in 0f677792c4da4ad4d11e0a0c5ecda7a19ad63e7a that is a breaking change, so we're blocked on that. Spoke to maintainer and there should be a new release at some point next week.

brianc118 commented 3 months ago

0.8.1 is released now on crates.io with fix. I'll close this issue now.