Open gpapakyriakopoulos opened 2 years ago
Sure would be nice if the kernel's debugfs kprobes just truncated strings instead of throwing warning messages and tainting the kernel, similar to how it handles memory faults. OK, done complaining; besides, there are probably kernel implementation trade-offs I'm not aware of or don't understand.
We haven't repro'd this yet, but the example makes sense, so I'll proceed with some thoughts.
I think a similar fixed-copy-length approach could work here, assuming there isn't a horrible performance hit. Perhaps the volume of file events and typically short path names will lead to a lot of unnecessary bytes being copied. Maybe it won't matter.
Anyway, we are looking into a fix. Thanks for the bug report.
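To make the trade-off in the fixed-copy-length idea above concrete, here is a minimal user-space sketch in Python. It is not the kernel's or Endpoint's implementation; `copy_fixed`, `padding_overhead`, and the slot sizes are hypothetical names and values chosen for illustration only. It shows that an over-long string is truncated rather than rejected, and that short paths pay for unused bytes in every record.

```python
def copy_fixed(src: bytes, size: int) -> bytes:
    """Copy at most size - 1 bytes of src into a size-byte, NUL-padded slot.

    Hypothetical helper: models a fixed-length string copy that truncates
    instead of failing when the source is longer than the slot.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    kept = src[: size - 1]  # leave room for at least one terminating NUL
    return kept + b"\x00" * (size - len(kept))


def padding_overhead(src: bytes, size: int) -> int:
    """Bytes in the slot beyond the string itself and its terminating NUL."""
    return max(0, size - (len(src) + 1))


if __name__ == "__main__":
    slot = 256  # assumed slot size, not taken from the thread
    for path in (b"/tmp/a", b"/tmp/unix:" + b"A" * 2048):
        record = copy_fixed(path, slot)
        print(len(path), len(record), padding_overhead(path, slot))
```

With a slot this size, the short path wastes most of the slot while the long one is cut to fit, which is the cost the comment above is weighing against event volume.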
Thanks for the great info! I understand the trade-offs you described; hopefully you can find a sweet spot between performance/usability and kernel/data safety. Feel free to ask for any additional info that we can provide if needed.
In case it helps, checking the warning in more detail, we saw that this time it is related to an openat() call from kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x86/0x90. I assume this is a different sys call than the one from the previously reported issue, so I guess this is not a regression after all.
As mentioned in the title, we observed a regression of one of our previously reported (and fixed) issues, namely #4, on elastic-agent 7.16.2. Errors like the following, along with the kernel taint flag being set to 512, were observed when large enough event traces are generated:
These issues are observed on Linux 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux as well as Linux debian 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux host instances.
A quick way to reproduce the issue on a host running elastic-agent 7.16.2 and ElasticEndpoint services is to create a file with a long enough name in the filesystem, by running a command such as:
touch /tmp/unix: A*2048 (or enough to exceed the perf buffer)
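The shorthand in the command above can be spelled out as a small script. This is a sketch under assumptions: it reads `A*2048` as 2048 repeated `A` characters appended to `/tmp/unix:`, and the helper names `build_long_path` and `attempt_open` are hypothetical. On Linux the open is expected to fail because a single path component that long exceeds the filesystem's name limit, but the system call is still issued with the full path string, which is the part a probe on syscall entry would see. Whether this triggers the warning depends on the agent version and perf buffer size on the host.

```python
import os


def build_long_path(prefix: str = "/tmp/unix:", repeat: int = 2048) -> str:
    """Build the long path; the 'A' * repeat reading of the shorthand is an assumption."""
    return prefix + "A" * repeat


def attempt_open(path: str):
    """Try to create the file; return None on success, else the errno.

    The failure is not treated as an error here: the goal is only to make
    the process issue an open-style system call with a very long path.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    os.unlink(path)  # clean up if the filesystem accepted the name
    return None


if __name__ == "__main__":
    path = build_long_path()
    result = attempt_open(path)
    print(f"path length: {len(path)}, errno: {result}")
```

Increasing `repeat` is the equivalent of the "or enough to exceed the perf buffer" remark; after running it, the kernel log and the taint flag can be checked on the host.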