Open petervbraun opened 5 years ago
Please provide more information:
Minimization: does this problem happen without these extra arguments -trace_after_instrs
, etc.?
Does the application run correctly under plain DR (.../drrun -- ~/Grazelle...
)?
Are there diagnostic asserts when running with -debug
(.../drrun -debug -t ...
)?
The problem still occurs after removing the arguments -trace_after_instrs 1M and -exit_after_tracing 10G. The application does run correctly under plain DR. I am missing the folder build/lib64/debug so I copied that folder (lib64/debug) and also ext/lib64 from the prebuilt DynamoRIO-Linux-7.0.0-RC1.tar.gz. I receive:
pebraun@bohr3:/mnt/sdc$ ./dynamorio/build/bin64/drrun -debug -t drmemtrace -offline -- ls
<Starting application /bin/ls (24262)>
<ERROR: master_signal_handler with no siginfo (i#26?): tid=24262, sig=11>
Interestingly, running drmemtrace without debug with other executables such as ls does work.
The 7.0.0-RC1 is pretty old: better to use a cronbuild from https://github.com/DynamoRIO/dynamorio/wiki/Latest-Build.
I wonder if you could do a binary search using -trace_after_instrs
which may show precisely which block of code is causing the problem? I.e., run Grazelle with a very large -trace_after_instrs
nearly equal to the total instruction count, which I would expect to work, and then cut it in half, etc. Hopefully the failure is deterministic.
Thanks for helping Derek. Running the newest cronbuild with the debug option I have
$ ../bin64/drrun -debug -t drmemtrace -offline -trace_after_instrs 1M -- ~/Grazelle-PPoPP18/output/linux/grazelle -n 1 -i ~/Grazelle-PPoPP18/cit-Patents.bin
<Starting application /soe/pebraun/Grazelle-PPoPP18/output/linux/grazelle (3257)>
<Initial options = -no_dynamic_options -client_lib '/mnt/sdc/DynamoRIO-x86_64-Linux-7.0.17901-0/tools/../bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "1M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application /soe/pebraun/Grazelle-PPoPP18/output/linux/grazelle (3257). Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:517 !YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT)
(Error occurred @0 frags)
version 7.0.17901, custom build
-no_dynamic_options -client_lib '/mnt/sdc/DynamoRIO-x86_64-Linux-7.0.17901-0/tools/../bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "1M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored
0x00007fff621a4230 0x00007ff8350e8aab
0x00007fff621a4470 0x00007ff8352c59c2
0x00007fff621a4520 0x00007ff8352aedb9
0x00007fff621a45d0 0x00007ff835061997
0x00007fff621a4640 0x00007ff83505ef35
0x00007fff621a4e80 0x00007ff8352e35af
0x00007fff621a58d0 0x00007ff83529400a>
There is no error with -trace_after_instrs 2G since that is greater than the max instructions in the program. I receive the same error any time dynamorio tries to trace it (e.g. -trace_after_instrs 1.999999999999999999999999999999999999999999999G)
One thing I forgot to mention is that I use a drmemtrace script as follows:
$ cat ~/dynamorio/build/tools/drmemtrace.drrun64
DR_OP=-nop_initial_bblock
CLIENT_REL=clients/lib64/release/libdrmemtrace.so
TOOL_OP=
As an addendum, I get this error whenever I run dynamorio with debug enabled.
DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/drrun -debug -t drmemtrace -offline -trace_after_instrs 16G -exit_after_tracing 10G -- ls
<Starting application /bin/ls (11085)>
<Initial options = -no_dynamic_options -client_lib '/soe/pebraun/DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "16G" "-exit_after_tracing" "10G"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application /bin/ls (11085). Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:517 !YMM_ENABLED() || ALIGNED(pend_unit_size, AVX_ALIGNMENT)
(Error occurred @0 frags)
version 7.0.17901, custom build
-no_dynamic_options -client_lib '/soe/pebraun/DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/../tools/lib64/release/libdrmemtrace.so;0;"-offline" "-trace_after_instrs" "16G" "-exit_after_tracing" "10G"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulat
0x00007ffe8299eaf0 0x00007f5579f39aab
0x00007ffe8299ed30 0x00007f557a1169c2
0x00007ffe8299ede0 0x00007f557a0ffdb9
0x00007ffe8299ee90 0x00007f5579eb2997
0x00007ffe8299ef00 0x00007f5579eaff35
0x00007ffe8299f740 0x00007f557a1345af
0x00007ffe829a0190 0x00007f557a0e500a>
If I run it without it works fine!
DynamoRIO-x86_64-Linux-7.0.17901-0/bin64/drrun -t drmemtrace -offline -trace_after_instrs 16G -exit_after_tracing 10G -- ls
drmemtrace.ls.11086.0329.dir outputfile x264
dynamorio paris_cif.y4m XSBench
DynamoRIO-x86_64-Linux-7.0.17901-0 perf.data
So there seems to be some kind of problem with running with the debug option...
I've been able to isolate what I believe to be the error. I commented out the offending assert statement (I'll make an issue about that separately) and was able to repeat my error 100% of the time.
<Starting application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (6985)>
<Initial options = -no_dynamic_options -client_lib '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so;0;"-trace_after_instrs" "1G" "-exit_after_tracing" "10M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so' 0x0000000072014c50
add-symbol-file '/mnt/sdc/dynamorio/debug_build/lib64/debug/libdynamorio.so' 0x00007f93647a7688
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrsyms.so' 0x000000000001a240
add-symbol-file '/lib/x86_64-linux-gnu/libc.so.6' 0x00007f936417c2d0
add-symbol-file '/lib64/ld-linux-x86-64.so.2' 0x00007f9363f30f10
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrwrap.so' 0x0000000074003020
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrmgr.so' 0x0000000073003570
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrutil.so' 0x00000000750016a0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrcovlib.so' 0x0000000073801db0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrx.so' 0x00000000770031f0
add-symbol-file '/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrreg.so' 0x0000000078002b10
add-symbol-file '/usr/lib/x86_64-linux-gnu/libstdc++.so.6' 0x00007f9363c323f0
add-symbol-file '/lib/x86_64-linux-gnu/libm.so.6' 0x00007f9363812a80
add-symbol-file '/lib/x86_64-linux-gnu/libgcc_s.so.1' 0x00007f93635f0ac0
>
<(1+x) Handling our fault in a TRY at 0x00007f93649ef7e0>
<spurious rep/repne prefix @0x00007f9364564806 (f2 41 ff e3): >
Experiments: EDGE_FORCE_PULL EDGE_PULL_WITHOUT_SCHED_AWARE
In-edges: created 5687755 vectors, efficiency = 72.6%
Vertices: assigned using in-edge list
Vertices: node 0 gets 3774768 vertices (0 to 3774767, 100.00% of total)
Scheduler: total units = 896, vectors per unit = 6347
Loading graph took 1740.01ms.
Starting execution.
Hit delay threshold: enabling tracing.
<Application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (6985) DynamoRIO usage error : instr_encode error: no encoding found (see log)>
<Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386)
version 7.0.17939, custom build
-no_dynamic_options -client_lib '/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so;0;"-trace_after_instrs" "1G" "-exit_after_tracing" "10M"' -code_api -stack_size 56K -signal_stack_size 32K -nop_initial_bblock -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_sysc
0x0000000052b94560 0x00007f936484239a
0x0000000052b945b0 0x00007f93649a6000
0x0000000052b946c0 0x00007f936499b7a0
0x0000000052b94700 0x00007f9364985f0b
0x0000000052b94760 0x00007f9364987bb0
0x0000000052b94790 0x00007f936483b3d2
0x0000000052b94cc0 0x00007f9364841dbf
0x0000000052b94d10 0x00007f93649c2bda
0x0000000052b94f00 0x00007f936482ac45
0x0000000052b94ff0 0x0000000052778f0d
/mnt/sdc/dynamorio/debug_build/bin64/../clients/lib64/debug/libdrmemtrace.so=0x0000000072000000
/usr/lib/x86_64-linux-gnu/libstdc++.so.6=0x00007f9363ba6000
/lib/x86_64-linux-gnu/libgcc_s.so.1=0x00007f93635ee000
/lib/x86_64-linux-gnu/libm.so.6=0x00007f9363807000
/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrcovlib.so=0x0000000073800000
/mnt/sdc/dynamorio/debug_build/ext/lib64/debug/libdrx.so=0x0000000077000000
/mnt/sdc/dynamorio/debug_build>
As you can see the error is <Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386)
Here is an excerpt from the log:
dispatch: target = 0x0000000000406386
Creating new heap unit 1 (4 [/56] KB)
CLEANCALL: insert clean call to 0x0000000072015c8f
ERROR: Could not find encoding for: lea (%r15,%ymm6,8) -> %rax
SYSLOG_ERROR: Application /soe_bkp/pebraun/Grazelle-PPoPP18/output/linux/grazelle (7043) DynamoRIO usage error : instr_encode error: no encoding found (see log)
SYSLOG_ERROR: Usage error: instr_encode error: no encoding found (see log) (/mnt/sdc/dynamorio/core/arch/x86/encode.c, line 2386)
Dynamorio crashes on this same instruction each time and ends tracing, preventing me from collecting traces. Any idea why no encoding is found for this instruction?
There is no such instruction in x86 as a lea with a ymm index register. The question is, where did it come from? The log should answer that: look at the application instructions in that block and presumably there is no such lea instruction but rather a VSIB-addressed instruction. Probably you're hitting #2985.
[For reference Grazelle is from https://github.com/stanford-mast/Grazelle-PPoPP18 and compiled with make fig13. The input dataset is from https://zenodo.org/record/1169388 -> both cit-Patents.bin-push and cit-Patents.bin-pull need to be downloaded]