DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform
Other
2.64k stars 560 forks source link

Crash creating offline trace for Grazelle #3347

Open petervbraun opened 5 years ago

petervbraun commented 5 years ago
pebraun$ ./dynamorio/build/bin64/drrun -t drmemtrace -offline -trace_after_instrs 1M -exit_after_tracing 10G -- ~/Grazelle-PPoPP18/output/linux/grazelle -i ~/Grazelle-PPoPP18/cit-Patents.bin
Experiments: THRESHOLD_WITHOUT_COUNT
Hit delay threshold: enabling tracing.
In-edges:  created 5687755 vectors, efficiency = 72.6%
Out-edges: created 4940362 vectors, efficiency = 83.6%
Vertices:  assigned using in-edge list
Vertices:  node 0 gets 3774768 vertices (0 to 3774767, 100.00% of total)
Scheduler: total units = 896, vectors per unit = 6347
Loading graph took 10568.08ms.
Starting execution.
<Application /soe/pebraun/Grazelle-PPoPP18/output/linux/grazelle (13868).  DynamoRIO Cache Simulator Tracer internal crash at PC 0x00007f11fec7e50c.  Please report this at http://dynamorio.org/issues.  Program aborted.
Received SIGSEGV at pc 0x00007f11fec7e50c in thread 13895
Base: 0x00007f11feba3000
Registers:eax=0x0000000000000001 ebx=0x000000004e447a68 ecx=0x0000000000000000 edx=0x0000000000000000
        esi=0x0000000000000000 edi=0x0000000000000048 esp=0x000000004de05ba0 ebp=0x0000000000000000
        r8 =0x0000000000000049 r9 =0x0000000000000000 r10=0x0000000000000001 r11=0x0000000000000006
        r12=0x0000000000000000 r13=0x0000000000000000 r14=0x0000000000000038 r15=0x00007f11fef275a8
        eflags=0x0000000000010246
version 7.0.17715, custom build
-no_dynamic_options -client_lib '/mnt/sdc/dynamorio/build/bin64/../clients/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "1M" "-exit_after_tracing" "10G"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignor
0x00007f11fecdc6f8 0x3025783020435020>

[For reference Grazelle is from https://github.com/stanford-mast/Grazelle-PPoPP18 and compiled with make fig13. The input dataset is from https://zenodo.org/record/1169388 -> both cit-Patents.bin-push and cit-Patents.bin-pull need to be downloaded]

derekbruening commented 5 years ago

Please provide more information:

petervbraun commented 5 years ago

The problem still occurs after removing the arguments -trace_after_instrs 1M and -exit_after_tracing 10G. The application does run correctly under plain DR. I am missing the folder build/lib64/debug so I copied that folder (lib64/debug) and also ext/lib64 from the prebuilt DynamoRIO-Linux-7.0.0-RC1.tar.gz. I receive:

pebraun@bohr3:/mnt/sdc$ ./dynamorio/build/bin64/drrun -debug -t drmemtrace -offline -- ls
<Starting application /bin/ls (24262)>
<ERROR: master_signal_handler with no siginfo (i#26?): tid=24262, sig=11>

Interestingly, running drmemtrace without debug with other executables such as ls does work.

derekbruening commented 5 years ago

The 7.0.0-RC1 is pretty old: better to use a cronbuild from https://github.com/DynamoRIO/dynamorio/wiki/Latest-Build.

I wonder if you could do a binary search using -trace_after_instrs which may show precisely which block of code is causing the problem? I.e., run Grazelle with a very large -trace_after_instrs nearly equal to the total instruction count, which I would expect to work, and then cut it in half, etc. Hopefully the failure is deterministic.

petervbraun commented 5 years ago

Thanks for helping Derek. Running the newest cronbuild with the debug option I have

$ ../bin64/drrun -debug -t drmemtrace -offline -trace_after_instrs 1M -- ~/Grazelle-PPoPP18/output/linux/grazelle -n 1 -i ~/Grazelle-PPoPP18/cit-Patents.bin
<Starting application /soe/pebraun/Grazelle-PPoPP18/output/linux/grazelle (3257)>
<Initial options = -no_dynamic_options -client_lib '/mnt/sdc/DynamoRIO-x86_64-Linux-7.0.17901-0/tools/../bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "1M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application /soe/pebraun/Grazelle-PPoPP18/output/linux/grazelle (3257).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:517 !YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT)
(Error occurred @0 frags)
version 7.0.17901, custom build
-no_dynamic_options -client_lib '/mnt/sdc/DynamoRIO-x86_64-Linux-7.0.17901-0/tools/../bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "1M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored
0x00007fff621a4230 0x00007ff8350e8aab
0x00007fff621a4470 0x00007ff8352c59c2
0x00007fff621a4520 0x00007ff8352aedb9
0x00007fff621a45d0 0x00007ff835061997
0x00007fff621a4640 0x00007ff83505ef35
0x00007fff621a4e80 0x00007ff8352e35af
0x00007fff621a58d0 0x00007ff83529400a>

There is no error with -trace_after_instrs 2G since that is greater than the max instructions in the program. I receive the same error any time dynamorio tries to trace it (e.g. -trace_after_instrs 1.999999999999999999999999999999999999999999999G)

One thing I forgot to mention is that I use a drmemtrace script as follows:

$ cat ~/dynamorio/build/tools/drmemtrace.drrun64
DR_OP=-nop_initial_bblock
CLIENT_REL=clients/lib64/release/libdrmemtrace.so
TOOL_OP=
petervbraun commented 5 years ago

As an addendum, I get this error whenever I run dynamorio with debug enabled.

DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/drrun -debug -t drmemtrace -offline -trace_after_instrs 16G -exit_after_tracing 10G -- ls
<Starting application /bin/ls (11085)>
<Initial options = -no_dynamic_options -client_lib '/soe/pebraun/DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "16G" "-exit_after_tracing" "10G"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application /bin/ls (11085).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:517 !YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT)
(Error occurred @0 frags)
version 7.0.17901, custom build
-no_dynamic_options -client_lib '/soe/pebraun/DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "16G" "-exit_after_tracing" "10G"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulat
0x00007ffe8299eaf0 0x00007f5579f39aab
0x00007ffe8299ed30 0x00007f557a1169c2
0x00007ffe8299ede0 0x00007f557a0ffdb9
0x00007ffe8299ee90 0x00007f5579eb2997
0x00007ffe8299ef00 0x00007f5579eaff35
0x00007ffe8299f740 0x00007f557a1345af
0x00007ffe829a0190 0x00007f557a0e500a>

If I run it without it works fine!

DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/drrun -t drmemtrace -offline -trace_after_instrs 16G -exit_after_tracing 10G -- ls
drmemtrace.ls.11086.0329.dir        outputfile     x264
dynamorio                           paris_cif.y4m  XSBench
DynamoRIO-x86_64-Linux-7.0.17901-0  perf.data

So there seems to be some kind of problem with running with the debug option...

petervbraun commented 5 years ago

I've been able to isolate what I believe to be the error. I commented out the offending assert statement (I'll make an issue about that separately) and was able to repeat my error 100% of the time.

<Starting application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (6985)>
<Initial options = -no_dynamic_options -client_lib '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so;0;"-trace_after_instrs" "1G" "-exit_after_tracing" "10M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so' 0x0000000072014c50
add-symbol-file '/mnt/sdc/dynamorio/debug_build/lib64/debug/libdynamorio.so' 0x00007f93647a7688
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrsyms.so' 0x000000000001a240
add-symbol-file '/lib/x86_64-linux-gnu/libc.so.6' 0x00007f936417c2d0
add-symbol-file '/lib64/ld-linux-x86-64.so.2' 0x00007f9363f30f10
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrwrap.so' 0x0000000074003020
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrmgr.so' 0x0000000073003570
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrutil.so' 0x00000000750016a0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrcovlib.so' 0x0000000073801db0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrx.so' 0x00000000770031f0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrreg.so' 0x0000000078002b10
add-symbol-file '/usr/lib/x86_64-linux-gnu/libstdc++.so.6' 0x00007f9363c323f0
add-symbol-file '/lib/x86_64-linux-gnu/libm.so.6' 0x00007f9363812a80
add-symbol-file '/lib/x86_64-linux-gnu/libgcc_s.so.1' 0x00007f93635f0ac0
>
<(1+x) Handling our fault in a TRY at 0x00007f93649ef7e0>
<spurious rep/repne prefix @0x00007f9364564806 (f2 41 ff e3): >
Experiments: EDGE_FORCE_PULL EDGE_PULL_WITHOUT_SCHED_AWARE
In-edges:  created 5687755 vectors, efficiency = 72.6%
Vertices:  assigned using in-edge list
Vertices:  node 0 gets 3774768 vertices (0 to 3774767, 100.00% of total)
Scheduler: total units = 896, vectors per unit = 6347
Loading graph took 1740.01ms.
Starting execution.
Hit delay threshold: enabling tracing.
<Application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (6985) DynamoRIO usage error : instr_encode error: no encoding found (see log)>
<Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386)
version 7.0.17939, custom build
-no_dynamic_options -client_lib '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so;0;"-trace_after_instrs" "1G" "-exit_after_tracing" "10M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_sysc
0x0000000052b94560 0x00007f936484239a
0x0000000052b945b0 0x00007f93649a6000
0x0000000052b946c0 0x00007f936499b7a0
0x0000000052b94700 0x00007f9364985f0b
0x0000000052b94760 0x00007f9364987bb0
0x0000000052b94790 0x00007f936483b3d2
0x0000000052b94cc0 0x00007f9364841dbf
0x0000000052b94d10 0x00007f93649c2bda
0x0000000052b94f00 0x00007f936482ac45
0x0000000052b94ff0 0x0000000052778f0d
/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so=0x0000000072000000
/usr/lib/x86_64-linux-gnu/libstdc++.so.6=0x00007f9363ba6000
/lib/x86_64-linux-gnu/libgcc_s.so.1=0x00007f93635ee000
/lib/x86_64-linux-gnu/libm.so.6=0x00007f9363807000
/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrcovlib.so=0x0000000073800000
/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrx.so=0x0000000077000000
/mnt/sdc/dynamorio/debug_build>

As you can see the error is <Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386) Here is an excerpt from the log:

dispatch: target = 0x0000000000406386
        Creating new heap unit 1 (4 [/56] KB)
CLEANCALL: insert clean call to 0x0000000072015c8f
ERROR: Could not find encoding for: lea    (%r15,%ymm6,8) -> %rax
SYSLOG_ERROR: Application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (7043) DynamoRIO usage error : instr_encode error: no encoding found (see log)
SYSLOG_ERROR: Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386)

Dynamorio crashes on this same instruction each time and ends tracing, preventing me from collecting traces. Any idea why no encoding is found for this instruction?

derekbruening commented 5 years ago

There is no such instruction in x86 as a lea with a ymm index register. The question is, where did it come from? The log should answer that: look at the application instructions in that block and presumably there is no such lea instruction but rather a VSIB-addressed instruction. Probably you're hitting #2985.