cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0
3.62k stars 359 forks source link

tracingpolicy: pre-validate semantics first before loading sensor and its resources #831

Open tixxdz opened 1 year ago

tixxdz commented 1 year ago

There are cases where if we add a bogus TracingPolicy , the operation will continue, creating the generic kprobe resources where the main related hook will fail at BPF attach point, but other related resources will be created.

This will help to detect bogus TracingPolicy and have a consistent way to fail early and hard.

tixxdz commented 1 year ago

Logs trying to attach a kprobe to sys_fake syscall:

time="2023-03-22T12:48:47+01:00" level=debug msg="Received an AddTracingPolicy request" request="yaml:\"apiVersion: cilium.io/v1alpha1\\nkind: TracingPolicy\\nmetadata:\\n name: \\\"fake\\\"\\nspec:\\n kprobes:\\n - call: \\\"sys_fake\\\"\\n   syscall: true\\n\""
time="2023-03-22T12:48:47+01:00" level=warning msg="invalid or old kprobe spec: kprobe spec validation failed: call __x64_sys_fake not found"
time="2023-03-22T12:48:47+01:00" level=debug msg="Kprobe spec pre-validation succeeded" name=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="Added generic kprobe sensor: bpf/objs/bpf_generic_kprobe_v53.o -> __x64_sys_fake"
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=info msg="BTF file: using metadata file" metadata=/sys/kernel/btf/vmlinux
time="2023-03-22T12:48:47+01:00" level=info msg="Loading sensor" name=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="Loading kernel version 5.10.0"
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Checking for bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=debug msg="Found bpf file" file=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=fdinstall_map path=/sys/fs/bpf/tetragon/gkp-sensor-1-fdinstall_map sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=config_map path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-config_map sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=kprobe_calls path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-kp_calls sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=filter_map path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-filter_map sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=argfilter_maps path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-argfilter_maps sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=retprobe_map path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-retprobe_map sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=process_call_heap path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-process_call_heap sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=info msg="tetragon, map loaded." map=sel_names_map path=/sys/fs/bpf/tetragon/gkp-sensor-1-gkp-0-sel_names_map sensor=gkp-sensor-1
time="2023-03-22T12:48:47+01:00" level=debug msg=observerLoadInstancebpf/objs/bpf_generic_kprobe_v53.o330240 kern_version=330240 prog=bpf/objs/bpf_generic_kprobe_v53.o
time="2023-03-22T12:48:47+01:00" level=info msg="Loading registered BPF probe" Program=bpf/objs/bpf_generic_kprobe_v53.o Type=generic_kprobe
time="2023-03-22T12:48:47+01:00" level=debug msg="pin file for map 'policy_filter_maps' not found, map is not shared!\n" prog=kprobe/generic_kprobe
time="2023-03-22T12:48:47+01:00" level=debug msg="pin file for map 'override_tasks' not found, map is not shared!\n" prog=kprobe/generic_kprobe
time="2023-03-22T12:48:47+01:00" level=debug msg="pin file for map 'buffer_heap_map' not found, map is not shared!\n" prog=kprobe/generic_kprobe
^Ctime="2023-03-22T12:48:54+01:00" level=info msg="Received signal interrupt, shutting down..."
time="2023-03-22T12:48:54+01:00" level=info msg="Listening for events completed." error="context canceled"
time="2023-03-22T12:48:54+01:00" level=info msg="BPF events statistics: 7 received, 0% events loss"
time="2023-03-22T12:48:54+01:00" level=info msg="Observer events statistics" errors=0 filterDrop=0 filterPass=0 lost=0 received=7
time="2023-03-22T12:48:54+01:00" level=info msg="BPF prog was unloaded" label=kprobe/__put_task_struct pin=event_exit
time="2023-03-22T12:48:54+01:00" level=info msg="BPF prog was unloaded" label=kprobe/wake_up_new_task pin=kprobe_pid_clear
time="2023-03-22T12:48:54+01:00" level=info msg="BPF prog was unloaded" label=tracepoint/sys_execve pin=event_execve
time="2023-03-22T12:48:54+01:00" level=debug msg="Refusing to remove kprobe/generic_kprobe, program not loaded" label=kprobe/generic_kprobe pin=gkp-sensor-1-gkp-0-__x64_sys_fake_prog
time="2023-03-22T12:48:54+01:00" level=debug msg="Refusing to remove kprobe/generic_kprobe, program not loaded" label=kprobe/generic_kprobe pin=gkp-sensor-1-gkp-0-__x64_sys_fake_prog
time="2023-03-22T12:48:54+01:00" level=info msg="map was unloaded" map=execve_map pin=execve_map
time="2023-03-22T12:48:54+01:00" level=info msg="map was unloaded" map=execve_map_stats pin=execve_map_stats

The cleaning code thinks that the attach succeeded, and complains when trying to unload.

tixxdz commented 1 year ago

tetra command:

$ ./tetra tracingpolicy add examples/tracingpolicy/fake.yaml 
failed to add tracing policy: rpc error: code = Unknown desc = sensor gkp-sensor-1 from collection fake failed to load: failed prog bpf/objs/bpf_generic_kprobe_v53.o kern_version 330240 LoadKprobeProgram: attaching 'generic_kprobe_event' failed: creating perf_kprobe PMU: token __x64___x64_sys_fake: not found: no such file or directory

Should be a better message here, error out before trying to attach BPF program, also seems we have an extra __x64 arch prefix?

mtardy commented 1 year ago

Really interested in that one!

The extra prefix is because of cilium/ebpf. It does this automatically when the hook is not found. It was patched recently (not in the best way, but in a way that makes the error message more understandable), see https://github.com/cilium/ebpf/issues/956 and https://github.com/cilium/ebpf/pull/959.

Now on the Tetragon side, we try to manage the prefix before cilium/ebpf in order for the user to not think that two symbols exist (the one with and the one without the prefix), see this https://github.com/cilium/tetragon/pull/752.

mtardy commented 1 year ago

We can definitely improve a lot on the agent userspace validation to provide sensible error messages.

tixxdz commented 1 year ago

@mtardy alright, thanks for the links will check them. BTW I'm doing the kprobe one for now, so to reduce tracing policies errors, you can take the tracepoint one later if you want.