fearful-symmetry closed this 2 months ago
Not sure I grok this, but won't this break for multiple CPUs? Two CPUs entering the same function would basically store the return value in the same storage space. I guess the magic is in the ctx arithmetic; where does that come from?
@haesbaert hmm, good point, didn't think of that. @stanek-michal do you know how this will work across CPUs?
Hmm. Tested with bpftrace via kretfunc, the addresses show up fine:
Tracing TCP accepts. Hit Ctrl-C to end.
TIME PID COMM RADDR RPORT LADDR LPORT BL
11:33:39 4184470 python3 192.168.1.124 54570 192.168.1.37 8000 0/5
Using the branch where I rewrote the probes to use an updated vmlinux, it works fine:
sudo ./EventsTrace --net-conn-accept -i
{"probes_initialized": true, "features": {"bpf_tramp": true}}
{"event_type":"NETWORK_CONNECTION_ACCEPTED","pids":{"tid":4184470,"tgid":4184470,"ppid":4145903,"pgid":4184470,"sid":4145903,"start_time_ns":1468277566951535},"net":{"transport":"TCP","family":"AF_INET","source_address":"192.168.1.37","source_port":8000,"destination_address":"192.168.1.124","destination_port":54728,"network_namespace":4026531840},"comm":"python3"}
I guess FUNC_RET_READ is behaving in some unexpected way here?
So I was wrong, ctx + 0 is supposed to be where the return value is, but I can't seem to find the layout of ctx. With the relocation addition it's correct, I believe.
Yeah, it's eBPF's R0, which is the return value; the layout I guess is just the eBPF registers (like pt_regs, but for the eBPF arch).
Not sure that's quite right; if I understand this now, it does ctx + patched_return_index, which is the last register in the return function:

int BPF_PROG(fexit__inet_csk_accept, struct sock *sk, int flags, int *err, bool kern, struct sock *ret)

So the patched_return_index here would be 4, meaning use R5 (R1=sk; R2=flags; R3=err; R4=kern; R5=ret). The problem before was that we were doing ctx + 0, which is R1.
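For reference, a minimal sketch of that layout (assuming the usual BPF trampoline convention, where ctx for an fexit program is an array of __u64 slots, one per kernel argument, followed by the return value; the program name here is made up):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* Raw view of what BPF_PROG hides: inet_csk_accept(sk, flags, err, kern)
 * has four arguments, so the return value sits in slot 4. Reading slot 0
 * gives the first argument (sk), which is the old ctx + 0 mistake. */
SEC("fexit/inet_csk_accept")
int raw_fexit__inet_csk_accept(__u64 *ctx)
{
    struct sock *sk  = (struct sock *)ctx[0]; /* first argument        */
    struct sock *ret = (struct sock *)ctx[4]; /* return value (slot 4) */

    if (!ret)
        return 0;
    bpf_printk("accepted sock %p on listener %p", ret, sk);
    return 0;
}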
Oh interesting. I guess the ctx (which seems to be struct 'bpf_context') can be different things in different cases, like for network stuff it's the packet data, etc.
Not really sure what's going on with the CI failures; I can't see anything other than this:
make: *** [Makefile:171: run-multikernel-test] Error 1
I guess so as well, I'll try to read the bpf code later and see if I can confirm.
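For illustration, a hedged sketch of how ctx's meaning shifts with the program type (two common cases only, not checked against the bpf source; the program names are made up):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* For XDP programs, ctx is packet metadata. */
SEC("xdp")
int ctx_is_packet(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    if (data >= data_end) /* bounds discipline the verifier insists on */
        return XDP_ABORTED;
    return XDP_PASS;
}

/* For fentry/fexit programs, ctx is the __u64 argument array discussed above. */
SEC("fentry/inet_csk_accept")
int ctx_is_args(__u64 *ctx)
{
    struct sock *sk = (struct sock *)ctx[0]; /* first kernel argument */

    bpf_printk("inet_csk_accept called on %p", sk);
    return 0;
}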
Are the tests just... timing out?
timed out waiting for EventsTrace to get ready, dumped stderr above
===== STACKTRACE FOR FAILED TEST =====
goroutine 1 [running]:
main.TestFail({0x400088fd98, 0x1, 0x1})
        /home/runner/work/ebpf/ebpf/testing/testrunner/utils.go:292 +0xe8
main.(*EventsTraceInstance).Start(0x4000012080, {0x13d168, 0x400008c240})
        /home/runner/work/ebpf/ebpf/testing/testrunner/eventstrace.go:87 +0x360
main.RunEventsTest(0x11dd40, {0x400009df38, 0x1, 0x1})
        /home/runner/work/ebpf/ebpf/testing/testrunner/utils.go:350 +0x190
main.main()
        /home/runner/work/ebpf/ebpf/testing/testrunner/main.go:30 +0x2dc
closes https://github.com/elastic/ebpf/issues/200
This changes the inet_csk_accept fexit probe to use FUNC_RET_READ instead of passing args to the method; this means we don't have to deal with the function arguments changing, like with the 6.10 kernel.

Tested on Fedora with 6.10.3, but we seem to be missing the addr fields. Not sure if this is a quirk of using the fexit probe, or if there's a bug here and the inet_csk_accept method no longer works the way we assume.
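For context, a minimal sketch of the return-only approach this moves to (hypothetical code, not the actual FUNC_RET_READ macro; the fixed index stands in for the relocation the loader patches at load time):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* Hypothetical stand-in for the patched return-value index:
 * inet_csk_accept has four arguments, so the return value is slot 4. */
#define PATCHED_RETURN_INDEX 4

/* Fragile version (breaks when the kernel signature changes, as in 6.10):
 *
 *   int BPF_PROG(fexit__inet_csk_accept,
 *                struct sock *sk, int flags, int *err, bool kern,
 *                struct sock *ret)
 *
 * Return-only version: never name the arguments, just read the last slot. */
SEC("fexit/inet_csk_accept")
int fexit__inet_csk_accept(__u64 *ctx)
{
    struct sock *ret = (struct sock *)ctx[PATCHED_RETURN_INDEX];

    if (!ret)
        return 0;
    /* ... fill and emit the NETWORK_CONNECTION_ACCEPTED event from ret ... */
    return 0;
}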