DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

libipt decode fails due to missing mapped addresses #6486

Open abhinav92003 opened 9 months ago

abhinav92003 commented 9 months ago

I'm seeing errors like the following happen sporadically during decoding of Intel-PT kernel traces:

408:   *** postcmd failed (1): drpt2ir: [28430, IP:ffffffffc11dd000] get next
408:   instruction error: no memory mapped at this address

When it happens, it's always the same address that's unmapped. When I hard-coded that address (and another that showed up once the first was fixed) to be copied in kcore_copy, the error went away. I couldn't find details of the address in /proc/modules or /proc/kallsyms, but it is part of the kcore code section (we copy only the memory that corresponds to the live kernel modules listed in /proc/modules). It's presumably some kernel code that executes during system calls.

The errors don't happen every time, but they now seem to happen fairly frequently. I'm not sure whether some change in my machine's kernel caused the unmapped instructions to start being executed.
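
For reference, here is a minimal standalone sketch (not DynamoRIO's actual kcore_copy code; the struct and function names are made up for illustration) of how those per-module ranges can be derived from /proc/modules: each line gives the module name, its total size in bytes, and its load address (non-zero only when read as root), and kcore_copy copies the corresponding ranges out of /proc/kcore.

#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical structure for illustration; kcore_copy uses its own proc_module_t.
struct module_range_t {
    std::string name;
    uint64_t start;
    uint64_t end; // start + size.
};

// Parse /proc/modules lines of the form:
//   name size instances deps state address
std::vector<module_range_t>
read_proc_modules()
{
    std::vector<module_range_t> ranges;
    std::ifstream f("/proc/modules");
    std::string line;
    while (std::getline(f, line)) {
        std::istringstream iss(line);
        std::string name, deps, state, addr;
        uint64_t size = 0, instances = 0;
        if (!(iss >> name >> size >> instances >> deps >> state >> addr))
            continue;
        uint64_t start = std::stoull(addr, nullptr, 16);
        if (start == 0) // Addresses are hidden from non-root readers.
            continue;
        ranges.push_back({ name, start, start + size });
    }
    return ranges;
}

int
main()
{
    for (const auto &r : read_proc_modules())
        std::cout << std::hex << r.start << "-" << r.end << " " << r.name << "\n";
}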

dolanzhao commented 9 months ago

I have successfully tested the main branch on my local system and can confirm that it is functioning as expected.

derekbruening commented 9 months ago

I hit this missing address error in every run in these 3 tests on my machine:

        408 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
        409 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
        410 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)

It is a Debian-ish 6.5.6 kernel, which should be the same as @abhinav92003's machine.

abhinav92003 commented 9 months ago

FTR, the following is the local workaround I'm using. It adds the two addresses that showed up in error messages; they may of course be different on other machines.

$ git diff
diff --git a/clients/drcachesim/tracer/kcore_copy.cpp b/clients/drcachesim/tracer/kcore_copy.cpp
index 962ab73f7..1e2424636 100644
--- a/clients/drcachesim/tracer/kcore_copy.cpp
+++ b/clients/drcachesim/tracer/kcore_copy.cpp
@@ -370,6 +370,21 @@ kcore_copy_t::read_modules()
             last_module = module;
         }
     }
+    proc_module_t *module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+    module->start = 0xffffffffc039b000;
+    module->end = 0xffffffffc039b000 + 0x10000;
+    module->next = nullptr;
+    kcore_code_segments_num_++;
+    last_module->next = module;
+    last_module = module;
+
+    module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+    module->start = 0xffffffffc03777c8;
+    module->end = 0xffffffffc03777c8 + 0x10000;
+    module->next = nullptr;
+    kcore_code_segments_num_++;
+    last_module->next = module;
+    last_module = module;
     f.close();
     return true;
 }
derekbruening commented 8 months ago

There are now 4 tests and all fail every time on my machine:

The following tests FAILED:
        413 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
        414 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
        415 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)
        416 - code_api|tool.drcacheoff.kernel.invariant-checker_SUDO (Failed)
abhinav92003 commented 8 months ago

They all fail for the same underlying reason: the PT raw traces contain some instruction that is not within the address range of any module in /proc/modules.

abhinav92003 commented 8 months ago

On the same system where the test failures reproduce, I tried using perf to trace the same test app (suite/tests/bin/simple_app), and it didn't fail (see https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace#Kernel-only_tracing for the detailed steps).

$ sudo perf record -a --kcore -e intel_pt/cyc,noretcomp/k  -- dynamorio/suite/tests/bin/simple_app
$ sudo chown -R <user> perf.data
$ perf script --insn-trace --xed &> perf_op.txt

I was able to find the address that failed in our tests in the perf output:

403:   *** postcmd failed (1): drpt2ir: [cd8, IP:ffffffffc0063064] get next
403:   instruction error: no memory mapped at this address
$ grep "ffffffffc0063064" perf_op.txt 
            perf  229533 [005] 104007.009349630:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms])      callq  0xffffffffc0063064
            perf  229533 [005] 104007.009349630:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring)        nop %edi, %edx
       perf-exec  229534 [001] 104007.009526523:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms])      callq  0xffffffffc0063064
       perf-exec  229534 [001] 104007.009526523:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring)        nop %edi, %edx
       perf-exec  229534 [001] 104007.009559435:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms])      callq  0xffffffffc0063064
       perf-exec  229534 [001] 104007.009559435:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring)        nop %edi, %edx
      simple_app  229534 [001] 104007.010255712:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms])      callq  0xffffffffc0063064
      simple_app  229534 [001] 104007.010255712:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring)        nop %edi, %edx
      simple_app  229534 [001] 104007.010305774:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms])      callq  0xffffffffc0063064
      simple_app  229534 [001] 104007.010305774:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring)        nop %edi, %edx

So perf is doing it right. We need to see what we're missing.

abhinav92003 commented 8 months ago

Here's the documented logic used by perf for copying kcore: https://github.com/torvalds/linux/blob/6d0dc8559c847e2dcd66c5dd93dbab3d3d887ff5/tools/perf/util/symbol-elf.c#L2473.

Particularly,

 The kernel map starts at _stext or the lowest function symbol, and ends at _etext or the highest function symbol.

Our kcore_copy looks only at _stext and _etext though: https://github.com/DynamoRIO/dynamorio/blob/76bfa29a369c0f82c4888982270ba7c7445f4838/clients/drcachesim/tracer/kcore_copy.cpp#L393

Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.
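
A minimal standalone sketch of that idea (not the actual fix; it just scans /proc/kallsyms and reports the lowest and highest function-symbol addresses, i.e. types t, w, T, W):

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>

int
main()
{
    // /proc/kallsyms lines look like: <address> <type> <name> [module]
    uint64_t lowest = std::numeric_limits<uint64_t>::max();
    uint64_t highest = 0;
    std::ifstream f("/proc/kallsyms");
    std::string line;
    while (std::getline(f, line)) {
        std::istringstream iss(line);
        std::string addr_str, type, name;
        if (!(iss >> addr_str >> type >> name))
            continue;
        // Only consider (weak) function symbols.
        if (type != "t" && type != "w" && type != "T" && type != "W")
            continue;
        uint64_t addr = std::stoull(addr_str, nullptr, 16);
        if (addr == 0) // Addresses are zeroed for unprivileged readers.
            continue;
        lowest = std::min(lowest, addr);
        highest = std::max(highest, addr);
    }
    std::cout << std::hex << "lowest=" << lowest << " highest=" << highest << "\n";
}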

abhinav92003 commented 8 months ago

Following up on our offline discussion: @dolanzhao Can you provide details of the kernel version you were able to reproduce this issue with? Do you have a fix ready that we can review and commit?

dolanzhao commented 8 months ago

Can you provide details of the kernel version you were able to reproduce this issue with?

Yes. The kernel version is 6.2.0-39.

# uname -a
Linux dolan-ubuntu 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Do you have a fix ready that we can review and commit?

I have a draft patch, but it's not yet complete. I attempted to test this solution on Linux kernel version 6.5. However, I encountered an issue where std::ifstream (glibc 2.35) causes a crash, the same as https://github.com/DynamoRIO/dynamorio/issues/5763. Because of this problem, I will need some time to test the solution with different glibc versions (2.33 and 2.36).

abhinav92003 commented 8 months ago

Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.

~Looking at the perf logic more closely, perf's implementation for copying the kcore includes the missing addr because: when determining the start addr of the module map, it considers not only the addresses in /proc/modules but also the lowest module-related symbol in /proc/kallsyms.~

E.g., in this case, the missing IP is 0xffffffffc002bb08.

The kernel map [_stext, _etext] is [0xffffffffaf800000, 0xffffffffb0600000], which does not cover the addr. The [lowest_module_addr, highest_module_addr] range from /proc/modules is [0xffffffffc0225000, 0xffffffffc5242000], which also doesn't cover the addr.

~However,~ the lowest and highest module-related symbols in /proc/kallsyms (symbols that have the module name in brackets in their /proc/kallsyms entry) are ~0xffffffffaf800000~ 0xffffffffc0225000 and 0xffffffffc522b338; ~this range indeed covers the missing addr.~

Our kcore_copy also differs from perf in that we copy the specific ranges for each module, instead of everything between the lowest and highest module symbols. IIRC this was intended as an optimization to reduce the size of the dumped kcore. ~Would probably need to fix this.~

@dolanzhao Let me know if you have any comments on this. It may be easier for me to patch this issue because I can actually reproduce this error on my workstation. I gather from our last discussion that your previous comment actually refers to a different issue that affects newer hardware (fixed by #6552 by updating the libipt version we use), and that your system does not reproduce this particular issue.

Edit: struck out some incorrect observations. The missing symbol is actually not covered by the range between the lowest and highest module-related function symbols from /proc/kallsyms; that was an incorrect observation on my part. The missing symbol is covered by the range between the lowest and highest function symbols (as also noted in a comment above). But there seem to be more complexities, as even perf does not copy this range.

abhinav92003 commented 8 months ago

As a side note: perf also has logic to copy the "entry trampolines", which we don't copy. I didn't observe any failures in our tracing caused by missing this, so I will just note it in a code comment for now.

abhinav92003 commented 7 months ago

I tried using DR's dumped kcore with perf script by copying it to the perf.data dir of a perf-collected trace, and perf was still able to decode its trace. I also tried using perf's dumped kcore with our libipt decode on a DR trace (by copying perf's kcore to kernel.raw in the DR trace dir), and DR's libipt decode still failed at the same missing IP.

Also, just looking at perf's kcore copy logic, I couldn't find how it additionally dumps the missing IP. As noted in the comment above, the missing IP does lie between the lowest and highest function symbols in /proc/kallsyms; but perf does not dump that region if it finds _stext and _etext (which are indeed present in my /proc/kallsyms).

Also, perf's kcore does not contain the missing IP (0xffffffffc01c5754) when I inspect it with readelf:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000001000 0xffffffffb4a00000 0x0000000000000000
                 0x0000000000e00000 0x0000000000e00000  RWE    0x1000
  LOAD           0x0000000000e01000 0xffffffffc0386000 0x0000000000000000
                 0x0000000004e8a000 0x0000000004e8a000  RWE    0x1000
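
(For reference, the same check can be scripted; here is a small standalone sketch that walks the ELF program headers of a dumped kcore and reports whether a given address falls inside any PT_LOAD segment. The kcore path and address below are just illustrative.)

#include <elf.h>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

// Returns true if addr lies within a PT_LOAD segment of the ELF file at path.
bool
kcore_contains(const char *path, uint64_t addr)
{
    std::ifstream f(path, std::ios::binary);
    Elf64_Ehdr ehdr;
    if (!f.read(reinterpret_cast<char *>(&ehdr), sizeof(ehdr)))
        return false;
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return false;
    f.seekg(ehdr.e_phoff);
    std::vector<Elf64_Phdr> phdrs(ehdr.e_phnum);
    if (!f.read(reinterpret_cast<char *>(phdrs.data()),
                static_cast<std::streamsize>(phdrs.size() * sizeof(Elf64_Phdr))))
        return false;
    for (const auto &ph : phdrs) {
        if (ph.p_type == PT_LOAD && addr >= ph.p_vaddr &&
            addr < ph.p_vaddr + ph.p_memsz)
            return true;
    }
    return false;
}

int
main()
{
    const char *kcore = "perf.data/kcore_dir/kcore"; // Illustrative path.
    uint64_t missing_ip = 0xffffffffc01c5754;
    std::cout << (kcore_contains(kcore, missing_ip) ? "covered" : "not covered")
              << "\n";
}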

Now I'm thinking: even though additionally dumping that missing IP in our kcore_copy helps work around this issue, the real difference between perf and us may not be in the kcore copy logic.

When I grep for "bpf_prog_fb33d7816e42d685_file_monitoring_open" (the perf script output says this is the symbol at the missing IP), I could find it in the perf.data/data binary, but not in perf.data/kcore_dir. Maybe perf dumps more information in the trace, which helps it decode later.

Speculation: I'm reading that eBPF code may be JIT-compiled (also, the symbol above has that hex number in it, which seems weird), and JIT code probably doesn't have symbols in /proc/kallsyms.

I don't know how we can reliably identify and dump such JIT regions though.

abhinav92003 commented 7 months ago

I was able to find the symbol for the missing IP in /proc/kallsyms after relaxing /proc/sys/net/core/bpf_jit_harden. One possible workaround is to special-case bpf and copy the /proc/kcore regions that correspond to all "[bpf]"-tagged symbols in /proc/kallsyms. Still curious what perf does that allows it to get the BPF JIT code symbols/encodings in its trace.
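
A rough standalone sketch of that workaround idea (not a tested fix): collect the addresses of all "[bpf]"-tagged symbols from /proc/kallsyms so the regions holding the BPF JIT code can also be copied from /proc/kcore. Since kallsyms doesn't report symbol sizes, the sketch pads each symbol with a fixed window, mirroring the 0x10000 padding in the earlier workaround diff; that padding is a guess.

#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct addr_range_t {
    uint64_t start;
    uint64_t end;
};

// Collect ranges around "[bpf]"-tagged symbols from /proc/kallsyms.
std::vector<addr_range_t>
collect_bpf_ranges(uint64_t pad = 0x10000)
{
    std::vector<addr_range_t> ranges;
    std::ifstream f("/proc/kallsyms");
    std::string line;
    while (std::getline(f, line)) {
        // Lines look like: <address> <type> <name> [bpf]
        if (line.find("[bpf]") == std::string::npos)
            continue;
        std::istringstream iss(line);
        std::string addr_str;
        if (!(iss >> addr_str))
            continue;
        uint64_t start = std::stoull(addr_str, nullptr, 16);
        if (start != 0) // Zeroed for unprivileged readers.
            ranges.push_back({ start, start + pad });
    }
    return ranges;
}

int
main()
{
    for (const auto &r : collect_bpf_ranges())
        std::cout << std::hex << r.start << "-" << r.end << "\n";
}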

I verified that /proc/modules hasn't changed even after relaxing bpf_jit_harden.