Open abhinav92003 opened 9 months ago
I have successfully tested the main branch on my local system and can confirm that it is functioning as expected.
I hit this missing-address error on every run of these 3 tests on my machine:
408 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
409 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
410 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)
It is a Debian-ish 6.5.6 kernel. Should be the same as @abhinav92003's machine.
FTR the following is the local workaround I'm using. It adds two addresses that showed up in error messages. They may be different on different machines of course.
$ git diff
diff --git a/clients/drcachesim/tracer/kcore_copy.cpp b/clients/drcachesim/tracer/kcore_copy.cpp
index 962ab73f7..1e2424636 100644
--- a/clients/drcachesim/tracer/kcore_copy.cpp
+++ b/clients/drcachesim/tracer/kcore_copy.cpp
@@ -370,6 +370,21 @@ kcore_copy_t::read_modules()
last_module = module;
}
}
+ proc_module_t *module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+ module->start = 0xffffffffc039b000;
+ module->end = 0xffffffffc039b000 + 0x10000;
+ module->next = nullptr;
+ kcore_code_segments_num_++;
+ last_module->next = module;
+ last_module = module;
+
+ module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+ module->start = 0xffffffffc03777c8;
+ module->end = 0xffffffffc03777c8 + 0x10000;
+ module->next = nullptr;
+ kcore_code_segments_num_++;
+ last_module->next = module;
+ last_module = module;
f.close();
return true;
}
There are now 4 tests and all fail every time on my machine:
The following tests FAILED:
413 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
414 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
415 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)
416 - code_api|tool.drcacheoff.kernel.invariant-checker_SUDO (Failed)
They all have the same underlying cause: the PT raw traces contain some instruction whose address is not within the range of any module listed in /proc/modules.
On the same system where the test failures reproduce, I tried using perf to trace the same test app (suite/tests/bin/simple_app), and it didn't fail (see https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace#Kernel-only_tracing for the detailed steps).
$ sudo perf record -a --kcore -e intel_pt/cyc,noretcomp/k -- dynamorio/suite/tests/bin/simple_app
$ sudo chown -R <user> perf.data
$ perf script --insn-trace --xed &> perf_op.txt
I was able to find the address that failed in our tests in the perf output:
403: *** postcmd failed (1): drpt2ir: [cd8, IP:ffffffffc0063064] get next
403: instruction error: no memory mapped at this address
$ grep "ffffffffc0063064" perf_op.txt
perf 229533 [005] 104007.009349630: ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) callq 0xffffffffc0063064
perf 229533 [005] 104007.009349630: ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) nop %edi, %edx
perf-exec 229534 [001] 104007.009526523: ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) callq 0xffffffffc0063064
perf-exec 229534 [001] 104007.009526523: ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) nop %edi, %edx
perf-exec 229534 [001] 104007.009559435: ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) callq 0xffffffffc0063064
perf-exec 229534 [001] 104007.009559435: ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) nop %edi, %edx
simple_app 229534 [001] 104007.010255712: ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) callq 0xffffffffc0063064
simple_app 229534 [001] 104007.010255712: ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) nop %edi, %edx
simple_app 229534 [001] 104007.010305774: ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) callq 0xffffffffc0063064
simple_app 229534 [001] 104007.010305774: ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) nop %edi, %edx
So perf is doing it right. We need to see what we're missing.
Here's the documented logic used by perf for copying kcore: https://github.com/torvalds/linux/blob/6d0dc8559c847e2dcd66c5dd93dbab3d3d887ff5/tools/perf/util/symbol-elf.c#L2473.
Particularly,
The kernel map starts at _stext or the lowest function symbol, and ends at _etext or the highest function symbol.
Our kcore_copy looks only at _stext and _etext though: https://github.com/DynamoRIO/dynamorio/blob/76bfa29a369c0f82c4888982270ba7c7445f4838/clients/drcachesim/tracer/kcore_copy.cpp#L393
Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.
Following up on our offline discussion: @dolanzhao Can you provide details of the kernel version you were able to reproduce this issue with? Do you have a fix ready that we can review and commit?
Can you provide details of the kernel version you were able to reproduce this issue with?
Yes. The kernel version is 6.2.0-39.
# uname -a
Linux dolan-ubuntu 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Do you have a fix ready that we can review and commit?
I have a draft patch, but it's not yet complete. I attempted to test this solution on Linux kernel version 6.5. However, I encountered an issue where std::ifstream (glibc 2.35) causes a crash, same as https://github.com/DynamoRIO/dynamorio/issues/5763. Due to this problem, it will take me some time to test the solution with different versions of glibc (glibc 2.33 and glibc 2.36).
Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.
~Looking at the perf logic more closely, perf's implementation for copying the kcore includes the missing addr because: when determining the start addr of the module map, it considers not only the addresses in /proc/modules but also the lowest module-related symbol in /proc/kallsyms.~
E.g., in this case, the missing IP is 0xffffffffc002bb08.
The kernel map _stext, _etext is 0xffffffffaf800000, 0xffffffffb0600000, which does not cover the addr.
The lowest_module_addr, highest_module_addr from /proc/modules are 0xffffffffc0225000, 0xffffffffc5242000, which also do not cover the addr.
~However,~ the lowest and highest module-related symbols in /proc/kallsyms (symbols that have the module name in brackets in their /proc/kallsyms entry) are ~0xffffffffaf800000~ 0xffffffffc0225000, 0xffffffffc522b338; ~this range indeed covers the missing addr.~
Our kcore_copy differs from perf also in that we copy the specific ranges for each module, instead of everything between the lowest and highest module symbols. IIRC this was intended as an optimization to reduce the size of the dumped kcore. ~Would probably need to fix this.~
@dolanzhao Let me know if you have any comments on this. It may be easier for me to patch this issue because I can actually reproduce this error on my workstation. I gather from our last discussion that your previous comment actually refers to a different issue that affects newer hardware (fixed by #6552 by updating the libipt version we use), and that your system does not reproduce this particular issue.
Edit: struck out some incorrect observations. The missing symbol is actually not covered between the lowest and highest module-related function symbols from /proc/kallsyms; that was an incorrect observation on my part. The missing symbol is covered between the lowest and highest function symbols (as also noted in a comment above). But there seem to be more complexities, as even perf does not copy this range.
As a side note: perf also has logic to copy the "entry trampolines", which we don't do. I didn't observe any failures in our tracing because of missing this, so I will just note it in a code comment for now.
I tried using DR's dumped kcore with perf script by copying it to the perf.data dir of a perf-collected trace, and perf was still able to decode its trace. I also tried using perf's dumped kcore with our libipt decode on a DR trace (by copying perf's kcore to kernel.raw in the DR trace dir), and DR's libipt decode still failed at the same missing IP.
Also, just looking at perf's kcore-copy logic, I couldn't find how it additionally dumps the missing IP. As noted in the comment above, the missing IP does lie between the lowest and highest function symbols in /proc/kallsyms; but perf does not dump that region if it finds _stext and _etext (which are indeed present in my /proc/kallsyms).
Also, perf's kcore does not contain the missing IP (0xffffffffc01c5754) when I inspect it with readelf:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000001000 0xffffffffb4a00000 0x0000000000000000
0x0000000000e00000 0x0000000000e00000 RWE 0x1000
LOAD 0x0000000000e01000 0xffffffffc0386000 0x0000000000000000
0x0000000004e8a000 0x0000000004e8a000 RWE 0x1000
Now I'm thinking: even though additionally dumping that missing IP in our kcore_copy helps work around this issue, the real difference between perf and us may not be in the kcore-copy logic.
When I grep for "bpf_prog_fb33d7816e42d685_file_monitoring_open" (the perf script output says this is the symbol at the missing IP), I could find it in the perf.data/data binary, but not in perf.data/kcore_dir. Maybe perf dumps more information in the trace, which helps it decode later.
Speculation: I'm reading that eBPF code may be JIT-compiled (also, the symbol above has that hex number in it, which seems odd), and JIT code probably doesn't have symbols in /proc/kallsyms.
I don't know how we can reliably identify and dump such JIT regions though.
I was able to find the symbol for the missing IP in /proc/kallsyms after relaxing /proc/sys/net/core/bpf_jit_harden. One possible workaround is that we special-case bpf, and copy the /proc/kcore regions that correspond to all "[bpf]"-related symbols in /proc/kallsyms. Still curious what perf does that allows it to get the BPF JIT code symbols/encodings in its trace.
I verified that /proc/modules hasn't changed even after relaxing bpf_jit_harden.
I'm seeing errors like the following happen sporadically during decoding of Intel-PT kernel traces:
When it happens, it's always the same address that's unmapped. When I hard-coded that address (and another one that was revealed once this error was fixed) to be copied in kcore_copy, the error went away. I couldn't find details of the address in /proc/modules or /proc/kallsyms, but it is part of the kcore code section (we copy only the memory that corresponds to the live kernel modules from /proc/modules). It's presumably some kernel code that executes during system calls.
The errors don't happen every time, but they seem to have started happening frequently enough. I'm not sure if some change in my machine's kernel caused the unmapped instrs to be executed.