DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

i#6486 kernel tracing: Include BPF JIT code in kcore dump #6619

Closed abhinav92003 closed 7 months ago

abhinav92003 commented 7 months ago

Fixes drmemtrace kernel trace libipt post-processing failures caused by missing instruction encodings for some kernel code execution captured using Intel-PT.

The root cause seems to be that JIT code executed by the kernel (BPF code in this case) does not have entries in /proc/modules, so our kcore dump logic did not include it. This fix looks for BPF-related symbols in /proc/kallsyms and includes them in the regions copied from /proc/kcore.
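
For illustration only, here is a minimal Python sketch of the idea of scanning /proc/kallsyms-style text for BPF JIT symbols. It is not the code in this PR. The function names, the 4 KiB page size, and the page-aligned covering range are assumptions made for the example, as is the filter on the `[bpf]` module tag, which is how recent kernels label JIT-compiled BPF programs in /proc/kallsyms.

```python
import re

# One /proc/kallsyms line looks like "<hex address> <type> <name>", optionally
# followed by a "[module]" tag for symbols that are not part of the core kernel.
KALLSYMS_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S)\s+(\S+)(?:\s+\[(\S+)\])?\s*$")

PAGE_SIZE = 4096  # Assumption: 4 KiB pages.


def find_bpf_symbols(kallsyms_text):
    """Return sorted (address, name) pairs for symbols tagged [bpf].

    Hypothetical helper: statically compiled kernel functions such as
    bpf_prog_put carry no module tag and are deliberately skipped.
    """
    symbols = []
    for line in kallsyms_text.splitlines():
        match = KALLSYMS_LINE.match(line)
        if match is None:
            continue
        addr, _sym_type, name, module = match.groups()
        if module == "bpf":
            symbols.append((int(addr, 16), name))
    return sorted(symbols)


def covering_page_range(symbols, page_size=PAGE_SIZE):
    """Return a page-aligned [start, end) range covering all symbol addresses.

    Assumption: kallsyms does not report symbol sizes, so this sketch simply
    extends the range to the end of the page holding the last symbol.
    """
    if not symbols:
        return None
    start = symbols[0][0] & ~(page_size - 1)
    end = (symbols[-1][0] & ~(page_size - 1)) + page_size
    return start, end
```

On a real machine the input would be the contents of /proc/kallsyms read with sufficient privileges; without them the kernel typically reports all addresses as zero.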

Note that BPF JIT symbols are not included in /proc/kallsyms by default. One needs to set /proc/sys/net/core/bpf_jit_harden and /proc/sys/net/core/bpf_jit_kallsyms appropriately (see https://docs.kernel.org/admin-guide/sysctl/net.html#proc-sys-net-core-network-core-options for more details). Added this suggestion to the documentation. It may be better not to make this possibly-too-intrusive change to the user's machine automatically in cmake. This is probably fine because the issue is not widespread (not reproduced on public Linux distributions).
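
As a rough sketch of what "appropriately" could mean in practice, the Python snippet below checks the two sysctl values without changing them. The helper names are hypothetical, and the rule it encodes (bpf_jit_kallsyms set to 1 and bpf_jit_harden set to 0) is my reading of the linked kernel documentation, so treat it as an assumption rather than a statement of what DynamoRIO requires.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys/net/core"):
    """Return the stripped contents of a sysctl file, or None if unreadable."""
    try:
        return (Path(root) / name).read_text().strip()
    except OSError:
        return None


def bpf_jit_symbols_exported(kallsyms_value, harden_value):
    """Guess whether BPF JIT symbols would appear in /proc/kallsyms.

    Assumption: export requires bpf_jit_kallsyms == 1, and any non-zero
    bpf_jit_harden setting disables it.
    """
    return kallsyms_value == "1" and harden_value == "0"


if __name__ == "__main__":
    kallsyms_value = read_sysctl("bpf_jit_kallsyms")
    harden_value = read_sysctl("bpf_jit_harden")
    print("bpf_jit_kallsyms =", kallsyms_value)
    print("bpf_jit_harden   =", harden_value)
    print("JIT symbols likely exported:",
          bpf_jit_symbols_exported(kallsyms_value, harden_value))
```

The check is read-only on purpose, in keeping with the concern above about not modifying the user's machine automatically.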

Tested PT tracing related tests locally on a machine that supports Intel-PT:

$ ctest -R 'drpttracer|drcacheoff.kernel'
...
    Start 213: code_api|client.drpttracer_SUDO-test
[sudo] password for sharmaabhinav: 
1/5 Test #213: code_api|client.drpttracer_SUDO-test .....................   Passed    4.29 sec
    Start 412: code_api|tool.drcacheoff.kernel.simple_SUDO
2/5 Test #412: code_api|tool.drcacheoff.kernel.simple_SUDO ..............   Passed    4.66 sec
    Start 413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
3/5 Test #413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO ..........   Passed    4.71 sec
    Start 414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
4/5 Test #414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO .........   Passed    4.59 sec
    Start 415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO
5/5 Test #415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO ...   Passed    5.75 sec

100% tests passed, 0 tests failed out of 5

Unfortunately, the decode errors do not go away completely even after this fix, but they are now much less frequent: tool.kernel.simple in a release build failed only after 40 successful runs with this fix, whereas it failed on every run before.

Issue: #6486

abhinav92003 commented 7 months ago

On doing more stress testing: unfortunately, the decode errors do not go away completely even after this fix, but they are now much less frequent (tool.kernel.simple in a release build failed after 40 successful runs). As before, I couldn't find the address indicated in the error message in /proc/kallsyms.