DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

[jdk8] SPECjvm 2008 tests won't run #3733

Open rkgithubs opened 5 years ago

rkgithubs commented 5 years ago

We are seeing that SPECjvm 2008 runs won't even start the warm-up phase when launched with drrun. A typical SPECjvm run looks like this:

/home/rahul/jdk1.8.0_201/bin/java -jar SPECjvm2008.jar -ikv -wt 15 -it 30 -bt 2 scimark.sparse.small

SPECjvm2008 Peak
  Properties file:   none
  Benchmarks:        scimark.sparse.small

With drrun we never get to this first message. I do see two threads running for a short period, but I'm not convinced the run is successful since it never reaches the warm-up and execution phases of the test. Memory utilization is roughly 11GB, which is quite high for sparse.small.

/root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/drrun -s 60 -debug -loglevel 3 -vm_size 1G -no_enable_reset -disable_traces -- ~/rahul/jdk1.8.0_201/bin/java -jar SPECjvm2008.jar -ikv -wt 15 -it 30 -bt 2 scimark.sparse.small

<log dir=/root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/bin64/../logs/java.59563.00000000>

<Starting application /root/rahul/jdk1.8.0_201/bin/java (59563)>
<Initial options = -no_dynamic_options -loglevel 3 -code_api -stack_size 56K -signal_stack_size 32K -disable_traces -no_enable_traces -max_elide_jmp 0 -max_elide_call 0 -no_shared_traces -bb_ibl_targets -no_shared_trace_ibl_routine -no_enable_reset -no_reset_at_switch_to_os_at_vmm_limit -reset_at_vmm_percent_free_limit 0 -no_reset_at_vmm_full -reset_at_commit_free_limit 0K -reset_every_nth_pending 0 -vm_size 1048576K -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/root/rahul/DynamoRIO-x86_64-Linux-7.90.18019-0/lib64/debug/libdynamorio.so' 0x00007f2e11bd7580
>
<curiosity: rex.w on OPSZ_6_irex10_short4!>
<(1+x) Handling our fault in a TRY at 0x00007f2e11e20d7c>
<spurious rep/repne prefix @0x00007f2e11994f96 (f2 41 ff e3): >
<writing to executable region.>
<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
        os says: 0x00000000491dc000-0x0000000089042000 prot=0x00000000
        cache says: 0x000000004904e000-0x0000000089042000 prot=0x00000000
>

Attached is the loglevel 3 log for the java pid: java.log.zip

java.0.59824.zip

derekbruening commented 2 years ago
if (instr_is_exclusive_store(instr)) count++;

It is possible to have non-strictly-paired one-load-to-one-store, and to have dynamic paths that only execute one and not the other (and DR does try to handle all that): but I assumed real app code would never do that. Do you have an example of code that does not dynamically execute strict ldex-stex pairs? Or maybe it's a race with another thread coming in for its ldex in between the ldex-stex of the first thread.

kuhanov commented 2 years ago

I added dumping of all ilists in mangle_exclusive_monitor_op. I got another hang when a few CPUs are occupied by threads at 100%. In this case, these are the last records in the log:

TAG  0x0000fff921ea44c8
 +0    L3 @0x0000fff921ea6530  52800020   movz   $0x0001 lsl $0x00 -> %w0
 +4    L3 @0x0000fff921ea44c8  885ffe81   ldaxr  (%x20)[4byte] -> %w1
 +8    L3 @0x0000fff921ea4298  7100003f   subs   %w1 $0x0000 lsl $0x00 -> %wzr
 +12   L3 @0x0000fff921ea5a88  54000061   b.ne   $0x0000ffffa5515410
 +16   L4 @0x0000fff921ea76b0  14000000   b      $0x0000ffffa5515408
END 0x0000fff921ea44c8

COUNT = 145 load_store = 5 
TAG  0x0000fff921ea5a88
 +0    L3 @0x0000fff921ea76b0  52800020   movz   $0x0001 lsl $0x00 -> %w0
 +4    L3 @0x0000fff921ea5a88  885ffe81   ldaxr  (%x20)[4byte] -> %w1
 +8    L3 @0x0000fff921ea4298  7100003f   subs   %w1 $0x0000 lsl $0x00 -> %wzr
 +12   L3 @0x0000fff921ea7930  54000061   b.ne   $0x0000ffffa5515410
 +16   L4 @0x0000fff921fb6900  14000000   b      $0x0000ffffa5515408
END 0x0000fff921ea5a88

I'm not sure: is this possible?

I'm still trying to catch the original issue with the hang on futexes. Kirill

kuhanov commented 2 years ago

Hi @derekbruening. It looks like it is hard to pin down the count of mangled instructions: sometimes the hang happens when we mangle 150, sometimes 250. I have a workload that passes about 99 times out of 100 and fails once. I printed all blocks from mangle_exclusive_monitor_op and noticed that the hang happens when the same blocks appear repeatedly in the output. Good runs didn't have repeated bbs. For example, a failing log includes:

121 TAG  0x0000fff9293b8108
 +0    L3 @0x0000fff9293b8108  88027e60   stxr   %w0 -> (%x19)[4byte] %w2
 +4    L3 @0x0000fff9293b8308  35ffff82   cbnz   $0x0000ffffaceeed44 %w2
 +8    L4 @0x0000fff9293b7b88  14000000   b      $0x0000ffffaceeed58
END 0x0000fff9293b8108

289 TAG  0x0000fff929a116b0
 +0    L3 @0x0000fff929a116b0  88027e60   stxr   %w0 -> (%x19)[4byte] %w2
 +4    L3 @0x0000fff929a10cf0  35ffff82   cbnz   $0x0000ffffaceeed44 %w2
 +8    L4 @0x0000fff92a3cd900  14000000   b      $0x0000ffffaceeed58
END 0x0000fff929a116b0

290 TAG  0x0000fff92a3cd900
 +0    L3 @0x0000fff92a3cd900  88027e60   stxr   %w0 -> (%x19)[4byte] %w2
 +4    L3 @0x0000fff929a10cf0  35ffff82   cbnz   $0x0000ffffaceeed44 %w2
 +8    L4 @0x0000fff929a10530  14000000   b      $0x0000ffffaceeed58
END 0x0000fff92a3cd900

one more example

187 TAG  0x0000fff8fdc41270
 +0    L3 @0x0000fff8fdc41270  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff8fdc40a30  35ffff84   cbnz   $0x0000ffff8054a118 %w4
 +8    L4 @0x0000fff8fdc42a30  14000000   b      $0x0000ffff8054a12c
END 0x0000fff8fdc41270

278 TAG  0x0000fff8fde13ad0
 +0    L3 @0x0000fff8fde13ad0  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff8fde13c90  35ffff84   cbnz   $0x0000ffff8054a118 %w4
 +8    L4 @0x0000fff8fde13810  14000000   b      $0x0000ffff8054a12c
END 0x0000fff8fde13ad0

279 TAG  0x0000fff8fde13810
 +0    L3 @0x0000fff8fde13810  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff8fde13c90  35ffff84   cbnz   $0x0000ffff8054a118 %w4
 +8    L4 @0x0000fff8fdc421f0  14000000   b      $0x0000ffff8054a12c
END 0x0000fff8fde13810

Thx, Kirill

derekbruening commented 2 years ago

By themselves those don't look unusual: no XZR, no stolen register x28. You can see our tests here to see if anything is missing: https://github.com/DynamoRIO/dynamorio/blob/master/suite/tests/client-interface/ldstex.c#L278

Does the corresponding ldxr look unusual?

The harder question to answer is whether this code sequence requires the monitor rather than the compare-and-swap we turn it into. See https://dynamorio.org/page_ldstex.html#autotoc_md195. Ideally it would be tracked back to the Java source code to help figure that out.

kuhanov commented 2 years ago

The strange thing is that good runs do not have such patterns, where we mangle similar blocks 3 times. :( But ALL my failures have them: the blocks can differ, can be loads, can be stores (see my couple of previous comments). I'm not clear on why they are built at all, and why nothing like this appears in good runs.

kuhanov commented 2 years ago

Does the corresponding ldxr look unusual?

Hi, @derekbruening. It is absent entirely. This is how it looks in the good case:


COUNT = 91 tid = 92215 load_store = 4.
TAG  0x0000fff925bf7208
+0    L3 @0x0000fff925bf8cc8  d10083ff   sub    %sp $0x0020 lsl $0x00 -> %sp
+4    L3 @0x0000fff925bf9bf0  f9000fe0   str    %x0 -> +0x18(%sp)[8byte]
+8    L3 @0x0000fff925bf7570  f9000be1   str    %x1 -> +0x10(%sp)[8byte]
+12   L3 @0x0000fff925bf6e70  f90007e2   str    %x2 -> +0x08(%sp)[8byte]
+16   L3 @0x0000fff925bf7308  f94007e1   ldr    +0x08(%sp)[8byte] -> %x1
+20   L3 @0x0000fff925bf7d30  f9400fe2   ldr    +0x18(%sp)[8byte] -> %x2
+24   L3 @0x0000fff925bf9370  f9400be0   ldr    +0x10(%sp)[8byte] -> %x0
+28   L3 @0x0000fff925bf7208  c85f7c03   ldxr   (%x0)[8byte] -> %x3
+32   L3 @0x0000fff925bf9a30  eb01007f   subs   %x3 %x1 lsl $0x00 -> %xzr
+36   L3 @0x0000fff925bf7a30  54000061   b.ne   $0x0000ffffa850112c
+40   L4 @**0x0000fff925bf8270**  14000000   b      $0x0000ffffa8501124
END 0x0000fff925bf7208

COUNT = 92 tid = 92215 load_store = 3 STORE
TAG  0x0000fff925bf8270
 +0    L3 @0x0000fff925bf8270  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff925bf7a30  35ffff84   cbnz   $0x0000ffffa8501118 %w4
 +8    L4 @0x0000fff925bf9a30  14000000   b      $0x0000ffffa850112c
END 0x0000fff925bf8270

but for the failing thread this is often the first bb;
for example, for tid=**92486**:

COUNT = 139 tid = 92486 load_store = 2 STORE
TAG  0x0000fff925dc8a40
 +0    L3 @0x0000fff925dc8a40  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff925bf8970  35ffff84   cbnz   $0x0000ffffa8501118 %w4
 +8    L4 @0x0000fff925bf8cc8  14000000   b      $0x0000ffffa850112c
END 0x0000fff925dc8a40

COUNT = 140 tid = 92486 load_store = 1 STORE
TAG  0x0000fff925bf8cc8
 +0    L3 @0x0000fff925bf8cc8  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff925bf8970  35ffff84   cbnz   $0x0000ffffa8501118 %w4
 +8    L4 @0x0000fff925bfa6e8  14000000   b      $0x0000ffffa850112c
END 0x0000fff925bf8cc8



Another strange thing to me: I would expect a correct load-store order per thread (load1-store1, load2-store2, and so on), but the logs can include a few loads without a store or, as we see on the hang, a store without a load.

Maybe DynamoRIO missed the needed blocks for some reason.
Kirill

kuhanov commented 2 years ago

I ran another experiment here: I mangled an exclusive load-store only when both are inside the same bb, so I skipped a load or store that appears without its pair, like:

 +0    L3 @0x0000fff925bf8cc8  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff925bf8970  35ffff84   cbnz   $0x0000ffffa8501118 %w4
 +8    L4 @0x0000fff925bfa6e8  14000000   b      $0x0000ffffa850112c

Maybe we need to skip splitting on branches inside monitor regions. Kirill

derekbruening commented 2 years ago

I ran another experiment here: I mangled an exclusive load-store only when both are inside the same bb, so I skipped a load or store that appears without its pair

What was the result of the experiment?

Unpaired loads and unpaired stores are in the regression test: https://github.com/DynamoRIO/dynamorio/blob/master/suite/tests/client-interface/ldstex.c#L625

The sample expansion at https://dynamorio.org/page_ldstex.html#autotoc_md195 makes it easier to think about: you can see it compares the address and size with the values stored in TLS slots by the load. So if there's no load, those are likely to fail. But it does seem there is a chance the slot values could happen to match (the size could easily match if a prior pair set it; the address is much less likely but possible), in which case the unpaired store would succeed as though it had acquired the monitor. Looks like the test code has a clrex before the unpaired stores, so that guarantees they will all fail, so this case is not in the test. So that is one issue: an unpaired store might succeed under DR but fail natively. Your experiment will say whether this matters for this app.

kuhanov commented 2 years ago

What was the result of the experiment?

No hangs in this case.

derekbruening commented 2 years ago

What was the result of the experiment?

No hangs in this case.

So your theory is that in some runs the app executes unpaired stores with addresses that happen to match a prior load-store pair and thus hits the case I described where the unpaired store succeeds under DR and that causes the app to somehow hang?

kuhanov commented 2 years ago

thus hits the case I described where the unpaired store succeeds

Something like this, yes.

I patched DynamoRIO again and prohibited splitting a bb at a branch if the branch is inside a monitor region. For example, a bb looks like:

TAG  0x0000fff9211aaa08
 +0    L3 @0x0000fff9211acb78  d10083ff   sub    %sp $0x0020 lsl $0x00 -> %sp
 +4    L3 @0x0000fff9211abf18  f9000fe0   str    %x0 -> +0x18(%sp)[8byte]
 +8    L3 @0x0000fff9211acbe0  f9000be1   str    %x1 -> +0x10(%sp)[8byte]
 +12   L3 @0x0000fff9211ab570  f90007e2   str    %x2 -> +0x08(%sp)[8byte]
 +16   L3 @0x0000fff9211ad290  f94007e1   ldr    +0x08(%sp)[8byte] -> %x1
 +20   L3 @0x0000fff9211ab2b0  f9400fe2   ldr    +0x18(%sp)[8byte] -> %x2
 +24   L3 @0x0000fff9211ab0c0  f9400be0   ldr    +0x10(%sp)[8byte] -> %x0
 +28   L3 @0x0000fff9211aaa08  c85f7c03   ldxr   (%x0)[8byte] -> %x3
 +32   L3 @0x0000fff9211ab318  eb01007f   subs   %x3 %x1 lsl $0x00 -> %xzr
 +36   L3 @0x0000fff9211ad010  54000061   b.ne   $0x0000ffffa26ba12c
 +40   L3 @0x0000fff9211ab4a8  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +44   L3 @0x0000fff9211ad610  35ffff84   cbnz   $0x0000ffffa26ba118 %w4
 +48   L4 @0x0000fff9211ad6d0  14000000   b      $0x0000ffffa26ba12c
END 0x0000fff9211aaa08

And again I've got a failure when one thread mangles a bb with a load-store pair:

COUNT = 141 tid = 83036 load_store = 1.
TAG  0x0000fff9211ac0b0
 +0    L3 @0x0000fff9211ac0b0  c85f7c03   ldxr   (%x0)[8byte] -> %x3
 +4    L3 @0x0000fff9211abdd8  eb01007f   subs   %x3 %x1 lsl $0x00 -> %xzr
 +8    L3 @0x0000fff9211ab7f0  54000061   b.ne   $0x0000ffffa26ba12c
 +12   L3 @0x0000fff9211ab4a8  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +16   L3 @0x0000fff9211ac030  35ffff84   cbnz   $0x0000ffffa26ba118 %w4
 +20   L4 @0x0000fff9211aac08  14000000   b      $0x0000ffffa26ba12c
END 0x0000fff9211ac0b0

and the next time gets a store-only bb:

COUNT = 151 tid = 83036 load_store = -1 SKIP STORE
TAG  0x0000fff9211ab318
 +0    L3 @0x0000fff9211ab318  c804fc02   stlxr  %x2 -> (%x0)[8byte] %w4
 +4    L3 @0x0000fff9211ad010  35ffff84   cbnz   $0x0000ffffa26ba118 %w4
 +8    L4 @0x0000fff9211ab570  14000000   b      $0x0000ffffa26ba12c
END 0x0000fff9211ab318

It is not clear to me. All the bbs have load-store pairs, but the last one has just the store. How is that possible?

In good runs there is no such store-only bb.

Full dump of the bbs: java-full-fail.txt Kirill

kuhanov commented 2 years ago

I rechecked my logs. Some good runs also include such store-only regions, but there it is the first bb for the thread, or the previous bb includes a different store and branch addresses:

COUNT = 120 tid = 70076 load_store = 0 STORE
TAG  0x0000fff930d9e048
 +0    L4 @0x0000000000000000  c8dffc43   ldar   (%x2)[8byte] -> %x3
 +4    m4 @0x0000000000000000  c8dffc43   <label>
 +4    L3 @0x0000ffffb48d002c  eb00007f   subs   %x3 %x0 lsl $0x00 -> %xzr
 +8    L3 @0x0000ffffb48d0030  54000061   b.ne   $0x0000ffffb48d003c
 +12   L3 @0x0000ffffb48d0034  c8057c41   stxr   %x1 -> (%x2)[8byte] %w5
 +16   L3 @0x0000ffffb48d0038  35ffff85   cbnz   $0x0000ffffb48d0028 %w5
 +20   L4 @0x0000000000000000  14000000   b      $0x0000ffffb48d003c
END 0x0000fff930d9e048
COUNT = 126 tid = 70076 load_store = -1 SKIP STORE
TAG  0x0000fff930f730c0
 +0    L3 @0x0000ffffb36afee4  8803fc02   stlxr  %w2 -> (%x0)[4byte] %w3
 +4    L3 @0x0000ffffb36afee8  35ffffa3   cbnz   $0x0000ffffb36afedc %w3
 +8    L4 @0x0000000000000000  14000000   b      $0x0000ffffb36afeec
END 0x0000fff930f730c0

It looks like the hang happens when we have the same store as in the previous bb.

kuhanov commented 2 years ago

Hi, @derekbruening. Is it possible to have a store without a load in a real application, or is this some DynamoRIO code modification? Do you have an idea where the issue could be, or what I could look at? Thx, Kirill

derekbruening commented 2 years ago

Is it possible to have a store without a load in a real application, or is this some DynamoRIO code modification? Do you have an idea where the issue could be, or what I could look at? Thx, Kirill

I would first determine whether this is really a store without a load in the dynamic instruction stream. The static code could have 2 stores sharing one load, something like:

start:
  ldxr
  cmp
  b.ne storeB
storeA:
  stxr
  b done
storeB:
  stxr
done:

Just seeing DR build a block for storeB doesn't mean that thread didn't just execute the block for start which was built a long time ago by another thread.

Maybe the instrace_simple client is the easiest way to answer this, getting a dynamic instruction trace. Or augment DR's mangling (or add a client that does this) to clear a TLS slot after a store to know whether a load was executed by looking for a non-cleared slot.

Also, this example from above:

 +0    L4 @0x0000000000000000  c8dffc43   ldar   (%x2)[8byte] -> %x3
 +4    m4 @0x0000000000000000  c8dffc43   <label>
 +4    L3 @0x0000ffffb48d002c  eb00007f   subs   %x3 %x0 lsl $0x00 -> %xzr
 +8    L3 @0x0000ffffb48d0030  54000061   b.ne   $0x0000ffffb48d003c
 +12   L3 @0x0000ffffb48d0034  c8057c41   stxr   %x1 -> (%x2)[8byte] %w5

Looks wrong too: ldar does not acquire a monitor, so the stxr will always fail. It's just like a plain stxr with no exclusive load in that sense. Is there an exclusive load (with an 'x' in the opcode) right before that ldar?

kuhanov commented 2 years ago

Is there an exclusive load (with an 'x' in the opcode) right before that ldar?

Sure; this is the ilist after we mangle the ldaxr but before the stxr mangling. The original was like this:

COUNT = 120 tid = 70076 load_store = 0 STORE
TAG  0x0000fff930d9e048
 +0    L3 @0x0000ffffb48d0028  c8dffc43   ldaxr  (%x2)[8byte] -> %x3
 +4    L3 @0x0000ffffb48d002c  eb00007f   subs   %x3 %x0 lsl $0x00 -> %xzr
 +8    L3 @0x0000ffffb48d0030  54000061   b.ne   $0x0000ffffb48d003c
 +12   L3 @0x0000ffffb48d0034  c8057c41   stxr   %x1 -> (%x2)[8byte] %w5
 +16   L3 @0x0000ffffb48d0038  35ffff85   cbnz   $0x0000ffffb48d0028 %w5
 +20   L4 @0x0000000000000000  14000000   b      $0x0000ffffb48d003c
END 0x0000fff930d9e048

kuhanov commented 2 years ago

An example when we get the hang:

TAG  0x0000fff91c885950
 +0    L3 @0x0000ffff9f189fa0  88027e60   stxr   %w0 -> (%x19)[4byte] %w2
 +4    L3 @0x0000ffff9f189fa4  35ffff82   cbnz   $0x0000ffff9f189f94 %w2
 +8    L4 @0x0000000000000000  14000000   b      $0x0000ffff9f189fa8
END 0x0000fff91c885950

The original code is pthread_mutex_lock:

(gdb) x /64i (0x0000ffff9f189f70-16)
   0xffff9f189f60 <pthread_mutex_lock>: add     x3, x0, #0x10
   0xffff9f189f64 <pthread_mutex_lock+4>:       ldr     w2, [x3]
   0xffff9f189f68 <pthread_mutex_lock+8>:       mov     w1, #0x17f                      // #383
   0xffff9f189f6c <pthread_mutex_lock+12>:      and     w1, w2, w1
   0xffff9f189f70 <pthread_mutex_lock+16>:      nop
   0xffff9f189f74 <pthread_mutex_lock+20>:      tst     w2, #0x7c
   0xffff9f189f78 <pthread_mutex_lock+24>:      b.ne    0xffff9f18a038 <pthread_mutex_lock+216>  // b.any
   0xffff9f189f7c <pthread_mutex_lock+28>:      stp     x29, x30, [sp, #-32]!
   0xffff9f189f80 <pthread_mutex_lock+32>:      mov     x29, sp
   0xffff9f189f84 <pthread_mutex_lock+36>:      stp     x19, x20, [sp, #16]
   0xffff9f189f88 <pthread_mutex_lock+40>:      mov     x19, x0
   0xffff9f189f8c <pthread_mutex_lock+44>:      cbnz    w1, 0xffff9f189fe0 <pthread_mutex_lock+128>
   0xffff9f189f90 <pthread_mutex_lock+48>:      mov     w0, #0x1                        // #1
   0xffff9f189f94 <pthread_mutex_lock+52>:      ldaxr   w1, [x19]
   0xffff9f189f98 <pthread_mutex_lock+56>:      cmp     w1, #0x0
   0xffff9f189f9c <pthread_mutex_lock+60>:      b.ne    0xffff9f189fa8 <pthread_mutex_lock+72>  // b.any
   0xffff9f189fa0 <pthread_mutex_lock+64>:      stxr    w2, w0, [x19]
   0xffff9f189fa4 <pthread_mutex_lock+68>:      cbnz    w2, 0xffff9f189f94 <pthread_mutex_lock+52>
   0xffff9f189fa8 <pthread_mutex_lock+72>:      b.ne    0xffff9f18a040 <pthread_mutex_lock+224>  // b.any
   0xffff9f189fac <pthread_mutex_lock+76>:      ldr     w0, [x19, #8]
   0xffff9f189fb0 <pthread_mutex_lock+80>:      cbnz    w0, 0xffff9f18a180 <pthread_mutex_lock+544>
   0xffff9f189fb4 <pthread_mutex_lock+84>:      mrs     x20, tpidr_el0
   0xffff9f189fb8 <pthread_mutex_lock+88>:      sub     x20, x20, #0x800
   0xffff9f189fbc <pthread_mutex_lock+92>:      ldr     w0, [x19, #12]
   0xffff9f189fc0 <pthread_mutex_lock+96>:      ldr     w1, [x20, #464]
   0xffff9f189fc4 <pthread_mutex_lock+100>:     add     w0, w0, #0x1
   0xffff9f189fc8 <pthread_mutex_lock+104>:     stp     w1, w0, [x19, #8]
   0xffff9f189fcc <pthread_mutex_lock+108>:     nop

A 'good' bb looks like this (I prohibited branch splitting inside the monitor region):

TAG  0x0000fff916774d08
 +0    L3 @0x0000ffff9a4abcb4  52800020   movz   $0x0001 lsl $0x00 -> %w0
 +4    L3 @0x0000ffff9a4abcb8  885ffe61   ldaxr  (%x19)[4byte] -> %w1
 +8    L3 @0x0000ffff9a4abcbc  7100003f   subs   %w1 $0x0000 lsl $0x00 -> %wzr
 +12   L3 @0x0000ffff9a4abcc0  54000061   b.ne   $0x0000ffff9a4abccc
 +16   L3 @0x0000ffff9a4abcc4  88027e60   stxr   %w0 -> (%x19)[4byte] %w2
 +20   L3 @0x0000ffff9a4abcc8  35ffff82   cbnz   $0x0000ffff9a4abcb8 %w2
 +24   L4 @0x0000000000000000  14000000   b      $0x0000ffff9a4abccc
END 0x0000fff916774d08

derekbruening commented 2 years ago

I don't follow what the prior comment is trying to say: the pthread_mutex_lock code does not have a branch that skips the ldaxr. It looks like it always executes the ldaxr whenever it executes the stxr. So it should not matter whether the stxr block is separate or not.

kuhanov commented 2 years ago

 +0    L3 @0x0000ffff9a4abcb4  52800020   movz   $0x0001 lsl $0x00 -> %w0
 +4    L3 @0x0000ffff9a4abcb8  885ffe61   ldaxr  (%x19)[4byte] -> %w1
 +8    L3 @0x0000ffff9a4abcbc  7100003f   subs   %w1 $0x0000 lsl $0x00 -> %wzr
 +12   L3 @0x0000ffff9a4abcc0  54000061   b.ne   $0x0000ffff9a4abccc

The problem is that I don't see a bb with the ldaxr before the stxr; something like this is absent from the logs:

+0    L3 @0x0000ffff9a4abcb4  52800020   movz   $0x0001 lsl $0x00 -> %w0
 +4    L3 @0x0000ffff9a4abcb8  885ffe61   ldaxr  (%x19)[4byte] -> %w1
 +8    L3 @0x0000ffff9a4abcbc  7100003f   subs   %w1 $0x0000 lsl $0x00 -> %wzr
 +12   L3 @0x0000ffff9a4abcc0  54000061   b.ne   $0x0000ffff9a4abccc

There is no mangling of `ldaxr (%x19)[4byte] -> %w1`.

derekbruening commented 2 years ago

The problem is that I don't see the bb with ldaxr before stxr something like this is absent in logs.

Are you sure it's not in some other thread's log or something? If you could find the branch that skips the ldaxr -- record a dynamic instruction trace or something. Or run without DR and put breakpoints on both the ldaxr and the stxr and see whether the stxr is ever reached without the ldaxr -- unfortunately there is no LBR access, but that would be a confirmation.

kuhanov commented 2 years ago

The problem is that I don't see the bb with ldaxr before stxr something like this is absent in logs.

Are you sure it's not in some other thread's log or something? If you could find the branch that skips the ldaxr -- record a dynamic instruction trace or something. Or run without DR and put breakpoints on both the ldaxr and the stxr and see whether the stxr is ever reached without the ldaxr -- unfortunately there is no LBR access, but that would be a confirmation.

I'll try to reproduce with debug logs to understand the fragment chain.

kuhanov commented 2 years ago

I still could not reproduce the hang in debug. I just have a log where we've got a cut fragment with just the store, without the load.

No fragment was ever linked with F204780(0x0000ffff9eba6e40):

d_r_dispatch: target = 0x0000ffff9eba8f58
Entry into F59790(0x0000ffff9eba8f58).0x0000ffff1b2cf740 (shared)
fcache_enter = 0x0000ffff1abf50c0, target = 0x0000ffff1b2cf73c
Exit from F1801(0x0000ffff9eba66a0).0x0000ffff1ac4875c (shared) (cannot link F1801->F108706) (cannot link shared to private)

d_r_dispatch: target = 0x0000ffff9eba6718
Entry into F108706(0x0000ffff9eba6718).0x0000ffff1b5a3104 
fcache_enter = 0x0000ffff1abf50c0, target = 0x0000ffff1b5a3100
Exit from F108706(0x0000ffff9eba6718).0x0000ffff1b5a3130  (cannot link F108706->F26404) (cannot link shared to private)

d_r_dispatch: target = 0x0000ffff9eba673c
Entry into F26404(0x0000ffff9eba673c).0x0000ffff1af018b8 (shared)
fcache_enter = 0x0000ffff1abf50c0, target = 0x0000ffff1af018b4
Exit from F108860(0x0000ffff9eba8f38).0x0000ffff1b5a3334  (cannot link F108860->F59790) (cannot link shared to private)

d_r_dispatch: target = 0x0000ffff9eba8f58
Entry into F59790(0x0000ffff9eba8f58).0x0000ffff1b2cf740 (shared)
fcache_enter = 0x0000ffff1abf50c0, target = 0x0000ffff1b2cf73c

master_signal_handler: thread=47481, sig=12, xsp=0x0000fff91e6c9da0, retaddr=0x000000000000000c
siginfo: sig = 12, pid = 45807, status = 0, errno = 0, si_code = -6

building bb instrlist now *********************

interp: start_pc = 0x0000ffff9eba6e38
check_thread_vm_area: pc = 0x0000ffff9eba6e38
check_thread_vm_area: check_stop = 0x0000ffff9ebcf158
  0x0000ffff9eba6e38  52800041   movz   $0x0002 lsl $0x00 -> %w1
  0x0000ffff9eba6e3c  885ffe60   ldaxr  (%x19)[4byte] -> %w0
  0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
  0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
end_pc = 0x0000ffff9eba6e48

setting cur_pc (for fall-through) to 0x0000ffff9eba6e48
forward_eflags_analysis: movz   $0x0002 lsl $0x00 -> %w1
    instr 0 => 0
forward_eflags_analysis: ldaxr  (%x19)[4byte] -> %w0
    instr 0 => 0
forward_eflags_analysis: stxr   %w1 -> (%x19)[4byte] %w2
    instr 0 => 0
Converting exclusive load @0x0000ffff9eba6e3c to regular
Using optimized same-block ldex-stex mangling
Converting exclusive store @0x0000ffff9eba6e40 to compare-and-swap
bb ilist after mangling:
TAG  0x0000ffff9eba6e38
 +0    L3 @0x0000ffff9eba6e38  52800041   movz   $0x0002 lsl $0x00 -> %w1
 +4    L4 @0x0000ffff9eba6e3c  88dffe60   ldar   (%x19)[4byte] -> %w0
 +8    m4 @0x0000ffff9eba6e3c  88dffe60   <label>
 +8    m4 @0x0000ffff9eba6e40  885ffe62   ldaxr  (%x19)[4byte] -> %w2
 +12   m4 @0x0000ffff9eba6e40  cb206042   sub    %x2 %x0 uxtx $0x0000000000000000 -> %x2
 +16   m4 @0x0000ffff9eba6e40  b5000002   cbnz   @0x0000fff91e6ab9e8[8byte] %x2
 +20   L3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +24   m4 @0x0000ffff9eba6e40  14000000   b      @0x0000fff91e6abc80[8byte]
 +28   m4 @0x0000ffff9eba6e40  14000000   <label>
 +28   m4 @0x0000ffff9eba6e40  d5033f5f   clrex  $0x000000000000000f
 +32   m3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +36   m4 @0x0000ffff9eba6e40  d5033f5f   <label>
 +36   L3 @0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
 +40   L4 @0x0000ffff9eba6e44  14000000   b      $0x0000ffff9eba6e48
END 0x0000ffff9eba6e38

done building bb instrlist *********************

building bb instrlist now *********************

interp: start_pc = 0x0000ffff9eba6e38
check_thread_vm_area: pc = 0x0000ffff9eba6e38
check_thread_vm_area: check_stop = 0x0000ffff9ebcf158
  0x0000ffff9eba6e38  52800041   movz   $0x0002 lsl $0x00 -> %w1
  0x0000ffff9eba6e3c  885ffe60   ldaxr  (%x19)[4byte] -> %w0
  0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
  0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
end_pc = 0x0000ffff9eba6e48

setting cur_pc (for fall-through) to 0x0000ffff9eba6e48
forward_eflags_analysis: movz   $0x0002 lsl $0x00 -> %w1
    instr 0 => 0
forward_eflags_analysis: ldaxr  (%x19)[4byte] -> %w0
    instr 0 => 0
forward_eflags_analysis: stxr   %w1 -> (%x19)[4byte] %w2
    instr 0 => 0
Converting exclusive load @0x0000ffff9eba6e3c to regular
Using optimized same-block ldex-stex mangling
Converting exclusive store @0x0000ffff9eba6e40 to compare-and-swap
bb ilist after mangling:
TAG  0x0000ffff9eba6e38
 +0    L3 @0x0000ffff9eba6e38  52800041   movz   $0x0002 lsl $0x00 -> %w1
 +4    L4 @0x0000ffff9eba6e3c  88dffe60   ldar   (%x19)[4byte] -> %w0
 +8    m4 @0x0000ffff9eba6e3c  88dffe60   <label>
 +8    m4 @0x0000ffff9eba6e40  885ffe62   ldaxr  (%x19)[4byte] -> %w2
 +12   m4 @0x0000ffff9eba6e40  cb206042   sub    %x2 %x0 uxtx $0x0000000000000000 -> %x2
 +16   m4 @0x0000ffff9eba6e40  b5000002   cbnz   @0x0000fff91e6ab418[8byte] %x2
 +20   L3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +24   m4 @0x0000ffff9eba6e40  14000000   b      @0x0000fff91e6aaca0[8byte]
 +28   m4 @0x0000ffff9eba6e40  14000000   <label>
 +28   m4 @0x0000ffff9eba6e40  d5033f5f   clrex  $0x000000000000000f
 +32   m3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +36   m4 @0x0000ffff9eba6e40  d5033f5f   <label>
 +36   L3 @0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
 +40   L4 @0x0000ffff9eba6e44  14000000   b      $0x0000ffff9eba6e48
END 0x0000ffff9eba6e38

done building bb instrlist *********************

Exit due to proactive reset

d_r_dispatch: target = 0x0000ffff9eba6e40

build_basic_block_fragment !!!!!!!!!!!!!!!!!!

interp: start_pc = 0x0000ffff9eba6e40
check_thread_vm_area: pc = 0x0000ffff9eba6e40
check_thread_vm_area: check_stop = 0x0000ffff9ebcf158
  0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
  0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
end_pc = 0x0000ffff9eba6e48

Converting exclusive store @0x0000ffff9eba6e40 to compare-and-swap
bb ilist after mangling:
TAG  0x0000ffff9eba6e40
 +0    m4 @0x0000000000000000  f9000380   str    %x0 -> (%x28)[8byte]
 +4    m4 @0x0000000000000000  f9405780   ldr    +0xa8(%x28)[8byte] -> %x0
 +8    m4 @0x0000000000000000  cb206262   sub    %x19 %x0 uxtx $0x0000000000000000 -> %x2
 +12   m4 @0x0000000000000000  b5000002   cbnz   @0x0000fff91e6ab920[8byte] %x2
 +16   m4 @0x0000000000000000  f9406380   ldr    +0xc0(%x28)[8byte] -> %x0
 +20   m4 @0x0000000000000000  d1001002   sub    %x0 $0x0000000000000004 lsl $0x0000000000000000 -> %x2
 +24   m4 @0x0000000000000000  b5000002   cbnz   @0x0000fff91e6ab920[8byte] %x2
 +28   m4 @0x0000000000000000  f9405b80   ldr    +0xb0(%x28)[8byte] -> %x0
 +32   m4 @0x0000000000000000  885ffe62   ldaxr  (%x19)[4byte] -> %w2
 +36   m4 @0x0000000000000000  cb206042   sub    %x2 %x0 uxtx $0x0000000000000000 -> %x2
 +40   m4 @0x0000000000000000  b5000002   cbnz   @0x0000fff91e6ab920[8byte] %x2
 +44   L3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +48   m4 @0x0000000000000000  14000000   b      @0x0000fff91e6ab7a0[8byte]
 +52   m4 @0x0000000000000000  14000000   <label>
 +52   m4 @0x0000000000000000  d5033f5f   clrex  $0x000000000000000f
 +56   m3 @0x0000ffff9eba6e40  88027e61   stxr   %w1 -> (%x19)[4byte] %w2
 +60   m4 @0x0000000000000000  d5033f5f   <label>
 +60   m4 @0x0000000000000000  f9400380   ldr    (%x28)[8byte] -> %x0
 +64   L3 @0x0000ffff9eba6e44  35ffffc2   cbnz   $0x0000ffff9eba6e3c %w2
 +68   L4 @0x0000000000000000  14000000   b      $0x0000ffff9eba6e48
END 0x0000ffff9eba6e40

linking new fragment F204780(0x0000ffff9eba6e40)
  linking incoming links for F204780(0x0000ffff9eba6e40)
  linking outgoing links for F204780(0x0000ffff9eba6e40)
    linking F204780(0x0000ffff9eba6e40).0x0000ffff1bcac8d4 -> F136578(0x0000ffff9eba6e3c)=0x0000ffff1b755c34
    add incoming F204780(0x0000ffff9eba6e40).0x0000ffff1bcac8d4 -> F136578(0x0000ffff9eba6e3c)
    linking F204780(0x0000ffff9eba6e40).0x0000ffff1bcac8d8 -> F26394(0x0000ffff9eba6e48)=0x0000ffff1af014fc
    add incoming F204780(0x0000ffff9eba6e40).0x0000ffff1bcac8d8 -> F26394(0x0000ffff9eba6e48)

Entry into F204780(0x0000ffff9eba6e40).0x0000ffff1bcac894 (shared)
fcache_enter = 0x0000ffff1abf50c0, target = 0x0000ffff1bcac890
Exit from F1801(0x0000ffff9eba66a0).0x0000ffff1ac4875c (shared) (cannot link F1801->F108706) (cannot link shared to private)

kuhanov commented 2 years ago

I ran the same workload under gdb with breakpoints in the pthread_mutex_lock monitor region.

The breakpoints were hit 1000001 times; all stores had loads.

Num     Type           Disp Enb Address            What
2       breakpoint     keep y   0x0000ffffbf67df94 <pthread_mutex_lock+52>
        breakpoint already hit 1000001 times
3       breakpoint     keep y   0x0000ffffbf67dfa0 <pthread_mutex_lock+64>
        breakpoint already hit 1000001 times

derekbruening commented 2 years ago

Not reproducing in debug reminds me of some tests that are hanging in release on AArch64 but never hang in debug: #4928, e.g. We were going to try to figure that out soon; maybe we can get lucky and it will be the same underlying problem as here.

derekbruening commented 2 years ago

For your logs in https://github.com/DynamoRIO/dynamorio/issues/3733#issuecomment-1041661068 the explanation is this line:

Exit due to proactive reset

So DR suspended a thread in between the ldaxr and stxr and redirected it to start executing at a new block that tail-duplicates the original. So dynamically there was a ldaxr before the stxr; DR just made a split block for the suspend-and-relocate.

kuhanov commented 2 years ago

Not reproducing in debug reminds me of some tests that are hanging in release on AArch64 but never hang in debug: #4928, e.g. We were going to try to figure that out soon; maybe we can get lucky and it will be the same underlying problem as here.

Hi, @derekbruening. Do you mean this patch: https://github.com/DynamoRIO/dynamorio/pull/5367/commits/3c846dafa2cf5a7a0d7cf4c7fa88a6e097c7016e?

derekbruening commented 2 years ago

Not reproducing in debug reminds me of some tests that are hanging in release on AArch64 but never hang in debug: #4928, e.g. We were going to try to figure that out soon; maybe we can get lucky and it will be the same underlying problem as here.

Hi, @derekbruening. this patch 3c846da?

Yes PR #5367 fixes one hang we found that reproduced in release build but not debug (just b/c of timing). There are more though: drcachesim online (#4928) and there are some code-inspection issues #2502. Still, it is worth trying with the PR #5367 patch that was just merged to see if that helps these Java apps.

kuhanov commented 2 years ago

Removed all my workarounds (prohibiting splitting inside the monitor region and so on) and applied this patch. Had one hang in 2000 runs; previously it was about 2-3 hangs per 100 runs. Kirill

derekbruening commented 2 years ago

Sounds like progress. There's also PR #5370 and PR #5375.

kuhanov commented 2 years ago

Sounds like progress. There's also PR #5370 and PR #5375.

These patches didn't help further; the hang frequency is the same.

kuhanov commented 2 years ago

So DR suspended a thread in between the ldaxr and stxr and redirected it to start executing at a new block that tail-duplicates the original. So dynamically there was a ldaxr before the stxr; DR just made a split block for the suspend-and-relocate.

Hi. One question still bothers me here. If that were the case and we just suspended the thread between the load and the store, we should see the same counters in the rstats statistics, but they differ in many runs:

 Load-exclusive instrs converted to CAS :              56721
 Store-exclusive instrs converted to CAS :             56686

Kirill

kuhanov commented 2 years ago

Hi @derekbruening. We have a SIGSEGV crash case on AArch64 again.

Java report:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffff93f12c4c, pid=630238, tid=0x0000fff1010621e0
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal)
# Java VM: OpenJDK 64-Bit Server VM (25.71-b00 mixed mode linux-aarch64 )
# Problematic frame:
# V  [libjvm.so+0x562c4c]  PhaseChaitin::build_ifg_physical(ResourceArea*)+0x42c
#
# Failed to write core dump..
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Crash context:

Registers:
R0=0x0000fff08c1b10b0
R1=0x0000fff08c1a2f40
R2=0xffffffff945d8290
R3=0x0000000000000007
R4=0x0000000000000002
R5=0x0000000000000000
R6=0x0000ffff946add20
R7=0x0000000000000000
R8=0x0000fff1010628e0
R9=0x0000000000000000
R10=0x00000000ffffffff
R11=0x0000000000000000
R12=0x0000000000000000
R13=0x0000000000000000
R14=0x0000000000000000
R15=0x0000000000000000
R16=0x0000000000000000
R17=0x0000000000000000
R18=0x0000000000000000
R19=0x0000fff08c1b0f90
R20=0x00000000000f423f
R21=0x0000fff08c03efb0
R22=0x0000fff10105efb0
R23=0x0000000000000018
R24=0x0000000000000068
R25=0x0000000000000000
R26=0x0000fff10105efb0
R27=0x0000ffff946add20
R28=0x0000000000000001
R29=0x0000fff10105e9d0
R30=0x0000ffff93f12b8c

The same SIGSEGV appears in the DynamoRIO logs:

computing memory target for 0x0000ffff115a5e8c causing SIGSEGV, kernel claims it is 0x0000ffeedd903980
compute_memory_target: falling back to racy protection checks
opnd_compute_address for: (%x1,%x2,lsl #2)
        base => 0x0000fff08c1a2f40
        index,scale => 0x0000ffeedd903980
        disp => 0x0000ffeedd903980
For SIGSEGV at cache pc 0x0000ffff115a5e8c, computed target read 0x0000ffeedd903980
        faulting instr: ldr    (%x1,%x2,lsl #2)[4byte] -> %w25
** Received SIGSEGV at cache pc 0x0000ffff115a5e8c in thread 630480

The register context is the same as Java reported:

$10 = {uc_flags = 0x0, uc_link = 0x0, uc_stack = {ss_sp = 0x0, ss_flags = 0x2, ss_size = 0x0}, uc_sigmask = {__val = {0x4, 0xabababababababab <repeats 15 times>}},
  uc_mcontext = {fault_address = 0xffeedd903980, regs = {0xfff08c1b10b0, 0xfff08c1a2f40, 0xffffffff945d8290, 0x7, 0x2, 0x0, 0xffff946add20, 0x0, 0xfff1010628e0, 0x0,
      0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfff08c1b0f90, 0xf423f, 0xfff08c03efb0, 0xfff10105efb0, 0x18, 0x68, 0x0, 0xfff10105efb0, 0xffff946add20, 0x1,
      0xfff10105e9d0, 0xffff93f12b8c}, sp = 0xfff10105e9d0, pc = 0xffff93f12c4c, pstate = 0x80000000, __reserved = {0x1, 0x80, 0x50, 0x46, 0x10, 0x2, 0x0, 0x0, 0x10, 0x0,
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x6e, 0xe9, 0xea, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x68, 0x44, 0x55, 0x1c, 0xe6, 0x93, 0x12, 0x40,
      0x0 <repeats 31 times>, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x88, 0xf2, 0x1a, 0x8c, 0xf0, 0xff, 0x0, 0x0, 0x80, 0x4d, 0x1, 0x8c, 0xf0, 0xff, 0x0, 0x0, 0x10,
      0x4d, 0x1, 0x8c, 0xf0, 0xff, 0x0, 0x0, 0xa8, 0x4d, 0x29, 0x8c, 0xf0, 0xff, 0x0, 0x0, 0xb8, 0x39, 0x5, 0x80, 0xf0, 0xff, 0x0, 0x0, 0x38, 0x37, 0x5, 0x80, 0xf0, 0xff,
      0x0, 0x0, 0x2, 0x8, 0x20, 0x80, 0x2, 0x8, 0x20, 0x80, 0x2, 0x8, 0x20, 0x80, 0x2, 0x8, 0x20, 0x80, 0x0, 0x0, 0x0, 0x40, 0x6e, 0xe9, 0xea, 0x3f,
      0x0 <repeats 120 times>, 0x1, 0x4, 0x10, 0x40, 0x1, 0x4, 0x10, 0x40, 0x1, 0x4, 0x10, 0x40, 0x1, 0x4, 0x10, 0x40, 0x10, 0x0, 0xaa, 0xaa, 0x41, 0x0, 0x0, 0x10, 0x1,
      0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x1, 0x0, 0x0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x3, 0x0 <repeats 14 times>...}}}

The basic block and the bb after mangling:

d_r_dispatch: target = 0x0000ffff93f12c3c

interp: start_pc = 0x0000ffff93f12c3c
check_thread_vm_area: pc = 0x0000ffff93f12c3c
check_thread_vm_area: check_stop = 0x0000ffff946d6408
  0x0000ffff93f12c3c  f9400660   ldr    +0x08(%x19)[8byte] -> %x0
  0x0000ffff93f12c40  f94096c1   ldr    +0x0128(%x22)[8byte] -> %x1
  0x0000ffff93f12c44  f87c5800   ldr    (%x0,%w28,uxtw #3)[8byte] -> %x0
  0x0000ffff93f12c48  b9802802   ldrsw  +0x28(%x0)[4byte] -> %x2      
  0x0000ffff93f12c4c  b8627839   ldr    (%x1,%x2,lsl #2)[4byte] -> %w25 <<<============ CRASH
  0x0000ffff93f12c50  34ffff19   cbz    $0x0000ffff93f12c30 %w25
end_pc = 0x0000ffff93f12c54

skip save stolen reg app value for: ldr    (%x0,%w28,uxtw #3)[8byte] -> %x0
bb ilist after mangling:
TAG  0x0000ffff93f12c3c
 +0    L3 @0x0000fff9116d9538  f9400660   ldr    +0x08(%x19)[8byte] -> %x0
 +4    L3 @0x0000fff91150c4d8  f94096c1   ldr    +0x0128(%x22)[8byte] -> %x1
 +8    m4 @0x0000fff9116d9128  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +12   m4 @0x0000fff9116d9438  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +16   m4 @0x0000fff91150cdc0  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +20   L3 @0x0000fff91150c290  f87c5800   ldr    (%x0,%w28,uxtw #3)[8byte] -> %x0
 +24   m4 @0x0000fff9116d6458  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
 +28   m4 @0x0000fff9116d6060  f9400781   ldr    +0x08(%x28)[8byte] -> %x1
 +32   L3 @0x0000fff9116d9028  b9802802   ldrsw  +0x28(%x0)[4byte] -> %x2  
 +36   L3 @0x0000fff91150c930  b8627839   ldr    (%x1,%x2,lsl #2)[4byte] -> %w25 <<<============ CRASH
 +40   L3 @0x0000fff91150cf08  34ffff19   cbz    $0x0000ffff93f12c30 %w25
 +44   L4 @0x0000fff9116d91f0  14000000   b      $0x0000ffff93f12c54
END 0x0000ffff93f12c3c

Look at the crashing instruction: +36 L3 @0x0000fff91150c930 b8627839 ldr (%x1,%x2,lsl #2)[4byte] -> %w25 <<<============ CRASH. We get fault_address=0xffeedd903980 if we use the register context x2=0xffffffff945d8290, x1=0xfff08c1a2f40:

(gdb) p /x  0xffffffff945d8290<<2
$11 = 0xfffffffe51760a40
(gdb) p /x  (0xfff08c1a2f40+0xfffffffe51760a40)
$12 = 0xffeedd903980 

BUT let's look at the previous instruction, 0x0000ffff93f12c48 b9802802 ldrsw +0x28(%x0)[4byte] -> %x2, with x0=0xfff08c1b10b0:

(gdb) x /gx (0xfff08c1b10b0+0x28)
0xfff08c1b10d8: 0x0000ffff945d8290

So the ldrsw instruction must set x2=0x945d8290; it should not be x2=0xffffffff945d8290. The crashing instruction is fine if x2=0x945d8290:

(gdb) p /x  0x945d8290<<2
$16 = 0x51760a40
(gdb) p /x  (0xfff08c1a2f40+0x51760a40)
$17 = 0xfff0dd903980
(gdb) x /gx (0xfff08c1a2f40+0x51760a40)
0xfff0dd903980: 0x0000000000000000

Does DynamoRIO do some internal work here? What could be wrong? I could not figure out why the x2 register is incorrect. Thanks, Kirill

kuhanov commented 2 years ago

So, ldrsw instruction must set x2=0x945d8290. it should not be x2=0xffffffff945d8290

Oh, ldrsw is signed, so x2 can legitimately be 0xffffffff945d8290. Continuing to investigate what could be wrong here.

kuhanov commented 2 years ago

Hi @derekbruening. Could you help me understand the crash? We got a thread-synchronization (suspend) signal on a thread:

main_signal_handler: thread=1588266, sig=12, xsp=0x0000fff923c94da0, retaddr=0x000000000000000c
siginfo: sig = 12, pid = 1587929, status = 0, errno = 0, si_code = -6
        x0     = 0x0000000000000000
        x1     = 0x0000fff923c6e000
        x2     = 0x000000000000000c
        x3     = 0x0000000000000030
        x4     = 0x000000000000005c
        x5     = 0x0000000000003c05
        x6     = 0x0000fff09015a9b8
        x7     = 0xfefeff6f6071735e
        x8     = 0x7f7f7f7f7f7f7f7f
        x9     = 0x0000000000000000
        x10    = 0x0101010101010101
        x11    = 0x0000000000000028
        x12    = 0x0000a701409d1276
        x13    = 0x0000000000000040
        x14    = 0x000000000000003f
        x15    = 0x0000000000000000
        x16    = 0x0000ffffa651dc00
        x17    = 0x0000ffffa6bc4080
        x18    = 0x0000000000000000
        x19    = 0x0000000000000030
        x20    = 0x0000fff090484208
        x21    = 0x0000fff0901b94f8
        x22    = 0x0000ffffa6600340
        x23    = 0x0000000000000001
        x24    = 0x0000000000000021
        x25    = 0x0000fff09045a8f8
        x26    = 0x0000000000000021
        x27    = 0x0000000000000108
        x28    = 0x0000fff923c6e000
        x29    = 0x0000fff106572880
        x30    = 0x0000ffffa635068c
        sp     = 0x0000fff106572880
        pc     = 0x0000ffff238c68c8
        pstate = 0x0000000020000000

The pc is 0x0000ffff238c68c8.

The code cache for the bb looks like:

(gdb) x /16i (0x0000ffff238c68c8-48)
   0xffff238c6898:      ldr     x0, [x25, #8]
   0xffff238c689c:      str     x0, [x28]
   0xffff238c68a0:      mov     x0, x28
   0xffff238c68a4:      ldr     x28, [x28, #48]
   0xffff238c68a8:      lsl     x27, x28, #3
   0xffff238c68ac:      mov     x28, x0
   0xffff238c68b0:      ldr     x0, [x28]
   0xffff238c68b4:      str     x1, [x28, #8]
   0xffff238c68b8:      mov     x1, x28
   0xffff238c68bc:      ldr     x28, [x28, #48]
   0xffff238c68c0:      ldr     x0, [x0, x28, lsl #3]
   0xffff238c68c4:      mov     x28, x1
==>   0xffff238c68c8:      ldr     x1, [x28, #8]   <==
   0xffff238c68cc:      cmp     x20, x0
   0xffff238c68d0:      b.eq    0xffff238c6de8  // b.none
   0xffff238c68d4:      b       0xffff238c6a68

The clean bb and the bb after mangling:

interp: start_pc = 0x0000ffffa6350424
check_thread_vm_area: pc = 0x0000ffffa6350424
check_thread_vm_area: check_stop = 0x0000ffffa6b02888
  0x0000ffffa6350424  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
  0x0000ffffa6350428  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
  0x0000ffffa635042c  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
  0x0000ffffa6350430  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
  0x0000ffffa6350434  54000340   b.eq   $0x0000ffffa635049c
end_pc = 0x0000ffffa6350438
bb ilist after mangling:
TAG  0x0000ffffa6350424
 +0    L3 @0x0000fff923eafda0  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
 +4    m4 @0x0000fff923eb1110  f9000380   str    %x0 -> (%x28)[8byte]
 +8    m4 @0x0000fff923eb1df0  aa1c03e0   orr    %xzr %x28 lsl $0x0000000000000000 -> %x0
 +12   m4 @0x0000fff923eb1358  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +16   L3 @0x0000fff923eb0430  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
 +20   m4 @0x0000fff923eb1090  aa0003fc   orr    %xzr %x0 lsl $0x0000000000000000 -> %x28
 +24   m4 @0x0000fff923eae950  f9400380   ldr    (%x28)[8byte] -> %x0
 +28   m4 @0x0000fff923eaf438  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +32   m4 @0x0000fff923eae9d0  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +36   m4 @0x0000fff923eaeb18  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +40   L3 @0x0000fff923eaf0a8  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
 +44   m4 @0x0000fff923eafd20  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
==> +48   m4 @0x0000fff923eb00e8  f9400781   ldr    +0x08(%x28)[8byte] -> %x1     <==
 +52   L3 @0x0000fff923eb12d8  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
 +56   L3 @0x0000fff923eb20b8  54000340   b.eq   $0x0000ffffa635049c
 +60   L4 @0x0000fff923eb1e70  14000000   b      $0x0000ffffa6350438
END 0x0000ffffa6350424

So pc 0x0000ffff238c68c8 is the mangling m4 instruction ldr +0x08(%x28)[8byte] -> %x1.

When the thread woke up, the dispatcher set target 0x0000ffffa635042c:

handle_suspend_signal: awake now
        main_signal_handler 12 returning now to 0x0000ffff22d11454
Exit due to proactive reset

d_r_dispatch: target = 0x0000ffffa635042c

Building the new bb:

interp: start_pc = 0x0000ffffa635042c
check_thread_vm_area: pc = 0x0000ffffa635042c
check_thread_vm_area: check_stop = 0x0000ffffa6b02888
==>  0x0000ffffa635042c  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0 <==
  0x0000ffffa6350430  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
  0x0000ffffa6350434  54000340   b.eq   $0x0000ffffa635049c
end_pc = 0x0000ffffa6350438
bb ilist after mangling:
TAG  0x0000ffffa635042c
 +0    m4 @0x0000fff923eb1110  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +4    m4 @0x0000fff923eb12d8  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +8    m4 @0x0000fff923eb1df0  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
==> +12   L3 @0x0000fff923eaf0a8  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0 <==
 +16   m4 @0x0000fff923eb1358  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
*** +20   m4 @0x0000fff923eb0430  f9400781   ldr    +0x08(%x28)[8byte] -> %x1 ***
 +24   L3 @0x0000fff923eafd20  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
 +28   L3 @0x0000fff923eb00e8  54000340   b.eq   $0x0000ffffa635049c
 +32   L4 @0x0000fff923eafda0  14000000   b      $0x0000ffffa6350438
END 0x0000ffffa635042c

It looks like we went back to the first original instruction, ldr (%x0,%x28,lsl #3)[8byte] -> %x0, before our mangling instruction ldr +0x08(%x28)[8byte] -> %x1, but the register context was probably not restored and the x0 register has an incorrect value.

Crash signal context:

main_signal_handler: thread=1588266, sig=11, xsp=0x0000fff923c94da0, retaddr=0x000000000000000b
siginfo: sig = 11, pid = 264, status = 0, errno = 0, si_code = 1
          x0     = 0x0000000000000000
          x1     = 0x0000fff923c6e000
          x2     = 0x000000000000000c
          x3     = 0x0000000000000030
          x4     = 0x000000000000005c
          x5     = 0x0000000000003c05
          x6     = 0x0000fff09015a9b8
          x7     = 0xfefeff6f6071735e
          x8     = 0x7f7f7f7f7f7f7f7f
          x9     = 0x0000000000000000
          x10    = 0x0101010101010101
          x11    = 0x0000000000000028
          x12    = 0x0000a701409d1276
          x13    = 0x0000000000000040
          x14    = 0x000000000000003f
          x15    = 0x0000000000000000
          x16    = 0x0000ffffa651dc00
          x17    = 0x0000ffffa6bc4080
          x18    = 0x0000000000000000
          x19    = 0x0000000000000030
          x20    = 0x0000fff090484208
          x21    = 0x0000fff0901b94f8
          x22    = 0x0000ffffa6600340
          x23    = 0x0000000000000001
          x24    = 0x0000000000000021
          x25    = 0x0000fff09045a8f8
          x26    = 0x0000000000000021
          x27    = 0x0000000000000108
          x28    = 0x0000000000000021
          x29    = 0x0000fff106572880
          x30    = 0x0000ffffa635068c
          sp     = 0x0000fff106572880
          pc     = 0x0000ffff2417046c
          pstate = 0x0000000020000000
computing memory target for 0x0000ffff2417046c causing SIGSEGV, kernel claims it is 0x0000000000000108
compute_memory_target: falling back to racy protection checks
opnd_compute_address for: (%x0,%x28,lsl #3)
          base => 0x0000000000000000
          index,scale => 0x0000000000000108
          disp => 0x0000000000000108
For SIGSEGV at cache pc 0x0000ffff2417046c, computed target read 0x0000000000000108
          faulting instr: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
** Received SIGSEGV at cache pc 0x0000ffff2417046c in thread 1588266
record_pending_signal(11) from cache pc 0x0000ffff2417046c
          not certain can delay so handling now
          action is not SIG_IGN

(gdb) x /9i (0x0000ffff2417046c-12)
   0xffff24170460:      str     x1, [x28, #8]
   0xffff24170464:      mov     x1, x28
   0xffff24170468:      ldr     x28, [x28, #48]
   0xffff2417046c:      ldr     x0, [x0, x28, lsl #3]
   0xffff24170470:      mov     x28, x1
   0xffff24170474:      ldr     x1, [x28, #8]
   0xffff24170478:      cmp     x20, x0
   0xffff2417047c:      b.eq    0xffff24170484  // b.none
   0xffff24170480:      b       0xffff238c6a68

Do I understand correctly that it is possible we could not restore the context? Or am I wrong here? Thanks, Kirill

derekbruening commented 2 years ago

You would expect this to be marked as a mangling epilogue. Translation in a mangling epilogue is supposed to target the next instruction and "emulate" the rest of the epilogue, as it is sometimes impossible to undo the app instr and thus returning the being-mangled instr PC for restart is not feasible. This makes it look like that is not done correctly for stolen register mangling on AArch64. I would suggest filing a separate issue to focus on this.

kuhanov commented 2 years ago

I would suggest filing a separate issue to focus on this.

OK: #5426

kuhanov commented 2 years ago

It would be great to have some workaround here; this crash reproduces too often across our pool of JVM workloads. Kirill

kuhanov commented 2 years ago

Hi, @derekbruening. One more question here. Between handle_suspend_signal: suspended now and handle_suspend_signal: awake now, DynamoRIO calls the recreate_bb_ilist procedure twice and recreates the original signaled basic block. Why does it build it? It looks like it doesn't use it after the signal, because it then builds a truncated bb starting from the last original app instruction. Thanks, Kirill

Log example:

main_signal_handler: thread=1588266, sig=12, xsp=0x0000fff923c94da0, retaddr=0x000000000000000c
siginfo: sig = 12, pid = 1587929, status = 0, errno = 0, si_code = -6
        x0     = 0x0000000000000000
        x1     = 0x0000fff923c6e000
        x2     = 0x000000000000000c
        x3     = 0x0000000000000030
        x4     = 0x000000000000005c
        x5     = 0x0000000000003c05
        x6     = 0x0000fff09015a9b8
        x7     = 0xfefeff6f6071735e
        x8     = 0x7f7f7f7f7f7f7f7f
        x9     = 0x0000000000000000
        x10    = 0x0101010101010101
        x11    = 0x0000000000000028
        x12    = 0x0000a701409d1276
        x13    = 0x0000000000000040
        x14    = 0x000000000000003f
        x15    = 0x0000000000000000
        x16    = 0x0000ffffa651dc00
        x17    = 0x0000ffffa6bc4080
        x18    = 0x0000000000000000
        x19    = 0x0000000000000030
        x20    = 0x0000fff090484208
        x21    = 0x0000fff0901b94f8
        x22    = 0x0000ffffa6600340
        x23    = 0x0000000000000001
        x24    = 0x0000000000000021
        x25    = 0x0000fff09045a8f8
        x26    = 0x0000000000000021
        x27    = 0x0000000000000108
        x28    = 0x0000fff923c6e000
        x29    = 0x0000fff106572880
        x30    = 0x0000ffffa635068c
        sp     = 0x0000fff106572880
        pc     = 0x0000ffff238c68c8
        pstate = 0x0000000020000000
dcontext next tag = 0x0000ffff240d3d8c
handle_suspend_signal: suspended now

building bb instrlist now *********************

interp: start_pc = 0x0000ffffa6350424
check_thread_vm_area: pc = 0x0000ffffa6350424
check_thread_vm_area: check_stop = 0x0000ffffa6b02888
  0x0000ffffa6350424  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
  0x0000ffffa6350428  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
  0x0000ffffa635042c  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
  0x0000ffffa6350430  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
  0x0000ffffa6350434  54000340   b.eq   $0x0000ffffa635049c
end_pc = 0x0000ffffa6350438

setting cur_pc (for fall-through) to 0x0000ffffa6350438
forward_eflags_analysis: ldr    +0x08(%x25)[8byte] -> %x0
        instr 0 => 0
forward_eflags_analysis: ubfm   %x28 $0x3d $0x3c -> %x27
        instr 0 => 0
forward_eflags_analysis: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
        instr 0 => 0
forward_eflags_analysis: subs   %x20 %x0 lsl $0x00 -> %xzr
        instr 3c0 => 0
skip save stolen reg app value for: ubfm   %x28 $0x3d $0x3c -> %x27
skip save stolen reg app value for: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
bb ilist after mangling:
TAG  0x0000ffffa6350424
 +0    L3 @0x0000fff923eafda0  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
 +4    m4 @0x0000fff923eb1110  f9000380   str    %x0 -> (%x28)[8byte]
 +8    m4 @0x0000fff923eb1df0  aa1c03e0   orr    %xzr %x28 lsl $0x0000000000000000 -> %x0
 +12   m4 @0x0000fff923eb1358  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +16   L3 @0x0000fff923eb0430  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
 +20   m4 @0x0000fff923eb1090  aa0003fc   orr    %xzr %x0 lsl $0x0000000000000000 -> %x28
 +24   m4 @0x0000fff923eae950  f9400380   ldr    (%x28)[8byte] -> %x0
 +28   m4 @0x0000fff923eaf438  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +32   m4 @0x0000fff923eae9d0  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +36   m4 @0x0000fff923eaeb18  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +40   L3 @0x0000fff923eaf0a8  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
 +44   m4 @0x0000fff923eafd20  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
 +48   m4 @0x0000fff923eb00e8  f9400781   ldr    +0x08(%x28)[8byte] -> %x1
 +52   L3 @0x0000fff923eb12d8  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
 +56   L3 @0x0000fff923eb20b8  54000340   b.eq   $0x0000ffffa635049c
 +60   L4 @0x0000fff923eb1e70  14000000   b      $0x0000ffffa6350438
END 0x0000ffffa6350424

done building bb instrlist *********************

building bb instrlist now *********************

interp: start_pc = 0x0000ffffa6350424
check_thread_vm_area: pc = 0x0000ffffa6350424
check_thread_vm_area: check_stop = 0x0000ffffa6b02888
  0x0000ffffa6350424  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
  0x0000ffffa6350428  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
  0x0000ffffa635042c  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
  0x0000ffffa6350430  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
  0x0000ffffa6350434  54000340   b.eq   $0x0000ffffa635049c
end_pc = 0x0000ffffa6350438

setting cur_pc (for fall-through) to 0x0000ffffa6350438
forward_eflags_analysis: ldr    +0x08(%x25)[8byte] -> %x0
        instr 0 => 0
forward_eflags_analysis: ubfm   %x28 $0x3d $0x3c -> %x27
        instr 0 => 0
forward_eflags_analysis: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
        instr 0 => 0
forward_eflags_analysis: subs   %x20 %x0 lsl $0x00 -> %xzr
        instr 3c0 => 0
skip save stolen reg app value for: ubfm   %x28 $0x3d $0x3c -> %x27
skip save stolen reg app value for: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
bb ilist after mangling:
TAG  0x0000ffffa6350424
 +0    L3 @0x0000fff923eb1e70  f9400720   ldr    +0x08(%x25)[8byte] -> %x0
 +4    m4 @0x0000fff923eaeb18  f9000380   str    %x0 -> (%x28)[8byte]
 +8    m4 @0x0000fff923eae9d0  aa1c03e0   orr    %xzr %x28 lsl $0x0000000000000000 -> %x0
 +12   m4 @0x0000fff923eaf438  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +16   L3 @0x0000fff923eb20b8  d37df39b   ubfm   %x28 $0x3d $0x3c -> %x27
 +20   m4 @0x0000fff923eae950  aa0003fc   orr    %xzr %x0 lsl $0x0000000000000000 -> %x28
 +24   m4 @0x0000fff923eb1090  f9400380   ldr    (%x28)[8byte] -> %x0
 +28   m4 @0x0000fff923eb0430  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +32   m4 @0x0000fff923eb1358  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +36   m4 @0x0000fff923eb1df0  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +40   L3 @0x0000fff923eb12d8  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
 +44   m4 @0x0000fff923eb1110  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
 +48   m4 @0x0000fff923eafda0  f9400781   ldr    +0x08(%x28)[8byte] -> %x1
 +52   L3 @0x0000fff923eb00e8  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
 +56   L3 @0x0000fff923eafd20  54000340   b.eq   $0x0000ffffa635049c
 +60   L4 @0x0000fff923eaf0a8  14000000   b      $0x0000ffffa6350438
END 0x0000ffffa6350424

done building bb instrlist *********************

handle_suspend_signal: awake now
        main_signal_handler 12 returning now to 0x0000ffff22d11454

Exit due to proactive reset

d_r_dispatch: target = 0x0000ffffa635042c

interp: start_pc = 0x0000ffffa635042c
check_thread_vm_area: pc = 0x0000ffffa635042c
check_thread_vm_area: check_stop = 0x0000ffffa6b02888
  0x0000ffffa635042c  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
  0x0000ffffa6350430  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
  0x0000ffffa6350434  54000340   b.eq   $0x0000ffffa635049c
end_pc = 0x0000ffffa6350438

skip save stolen reg app value for: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
bb ilist after mangling:
TAG  0x0000ffffa635042c
 +0    m4 @0x0000fff923eb1110  f9000781   str    %x1 -> +0x08(%x28)[8byte]
 +4    m4 @0x0000fff923eb12d8  aa1c03e1   orr    %xzr %x28 lsl $0x0000000000000000 -> %x1
 +8    m4 @0x0000fff923eb1df0  f9401b9c   ldr    +0x30(%x28)[8byte] -> %x28
 +12   L3 @0x0000fff923eaf0a8  f87c7800   ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
 +16   m4 @0x0000fff923eb1358  aa0103fc   orr    %xzr %x1 lsl $0x0000000000000000 -> %x28
 +20   m4 @0x0000fff923eb0430  f9400781   ldr    +0x08(%x28)[8byte] -> %x1
 +24   L3 @0x0000fff923eafd20  eb00029f   subs   %x20 %x0 lsl $0x00 -> %xzr
 +28   L3 @0x0000fff923eb00e8  54000340   b.eq   $0x0000ffffa635049c
 +32   L4 @0x0000fff923eafda0  14000000   b      $0x0000ffffa6350438
END 0x0000ffffa635042c

linking new fragment F559721(0x0000ffffa635042c)
  linking incoming links for F559721(0x0000ffffa635042c)
  linking outgoing links for F559721(0x0000ffffa635042c)
    linking F559721(0x0000ffffa635042c).0x0000ffff2417047c -> F127689(0x0000ffffa635049c)=0x0000ffff238c6de8
    add incoming F559721(0x0000ffffa635042c).0x0000ffff2417047c -> F127689(0x0000ffffa635049c)
    linking F559721(0x0000ffffa635042c).0x0000ffff24170480 -> F127682(0x0000ffffa6350438)=0x0000ffff238c6a68
    add incoming F559721(0x0000ffffa635042c).0x0000ffff24170480 -> F127682(0x0000ffffa6350438)
priv_mcontext_t @0x0000fff92338d880
        r0  = 0x0000000000000000
        r1  = 0x000000000000000b
        r2  = 0x000000000000000c
        r3  = 0x0000000000000030
        r4  = 0x000000000000005c
        r5  = 0x0000000000003c05
        r6  = 0x0000fff09015a9b8
        r7  = 0xfefeff6f6071735e
        r8  = 0x7f7f7f7f7f7f7f7f
        r9  = 0x0000000000000000
        r10 = 0x0101010101010101
        r11 = 0x0000000000000028
        r12 = 0x0000a701409d1276
        r13 = 0x0000000000000040
        r14 = 0x000000000000003f
        r15 = 0x0000000000000000
        r16 = 0x0000ffffa651dc00
        r17 = 0x0000ffffa6bc4080
        r18 = 0x0000000000000000
        r19 = 0x0000000000000030
        r20 = 0x0000fff090484208
        r21 = 0x0000fff0901b94f8
        r22 = 0x0000ffffa6600340
        r23 = 0x0000000000000001
        r24 = 0x0000000000000021
        r25 = 0x0000fff09045a8f8
        r26 = 0x0000000000000021
        r27 = 0x0000000000000108
        r28 = 0x0000000000000021
        r29 = 0x0000fff106572880
        r30 = 0x0000ffffa635068c
        r31 = 0x0000fff106572880
        q0  = 0xabababab abababab abababab abababab
        q1  = 0x901b8f28 0000fff0 901b8f28 0000fff0
        q2  = 0x9045a8f8 0000fff0 9045a8f8 0000fff0
        q3  = 0x00000000 00000000 00000000 00000000
        q4  = 0x00000000 00000000 00000000 00000000
        q5  = 0x94683a38 0000fff0 94684d60 0000fff0
        q6  = 0x00000000 00000000 00000000 00000000
        q7  = 0x40100401 40100401 40100401 40100401
        q8  = 0x00000000 00000000 00000000 00000000
        q9  = 0x00000000 00000000 00000000 00000000
        q10 = 0x00000000 00000000 00000000 00000000
        q11 = 0x00000000 00000000 00000000 00000000
        q12 = 0x00000000 00000000 00000000 00000000
        q13 = 0x00000000 00000000 00000000 00000000
        q14 = 0x00000000 00000000 00000000 00000000
        q15 = 0x00000000 00000000 00000000 00000000
        q16 = 0x01005555 00005040 01005555 00005040
        q17 = 0x10000000 aa800010 00001000 a00a8000
        q18 = 0x00100000 00000000 80000000 80200802
        q19 = 0x00000300 00000000 00000000 00000000
        q20 = 0x11111111 01111111 00000000 00000000
        q21 = 0x00000000 10000000 00000000 00000000
        q22 = 0x00000000 0c000000 00000000 00000000
        q23 = 0x00000000 03000000 00000000 00000000
        q24 = 0x00000000 00c00000 00000000 00000000
        q25 = 0x00000000 00300000 00000000 00000000
        q26 = 0x00000000 000c0000 00000000 00000000
        q27 = 0x0c000000 00000000 00000000 00000000
        q28 = 0x30000000 00000000 00000000 00000000
        q29 = 0x0000000c 00000000 00000000 00000000
        q30 = 0x03000000 00000000 00000000 00000000
        q31 = 0x55555555 00015555 00000000 00000000
        eflags = 0x0000000020000000
        pc     = 0x0000ffff240d3d8c
Entry into F559721(0x0000ffffa635042c).0x0000ffff24170460 (shared)
fcache_enter = 0x0000ffff22d10b80, target = 0x0000ffff2417045c

main_signal_handler: thread=1588266, sig=11, xsp=0x0000fff923c94da0, retaddr=0x000000000000000b
siginfo: sig = 11, pid = 264, status = 0, errno = 0, si_code = 1
        x0     = 0x0000000000000000
        x1     = 0x0000fff923c6e000
        x2     = 0x000000000000000c
        x3     = 0x0000000000000030
        x4     = 0x000000000000005c
        x5     = 0x0000000000003c05
        x6     = 0x0000fff09015a9b8
        x7     = 0xfefeff6f6071735e
        x8     = 0x7f7f7f7f7f7f7f7f
        x9     = 0x0000000000000000
        x10    = 0x0101010101010101
        x11    = 0x0000000000000028
        x12    = 0x0000a701409d1276
        x13    = 0x0000000000000040
        x14    = 0x000000000000003f
        x15    = 0x0000000000000000
        x16    = 0x0000ffffa651dc00
        x17    = 0x0000ffffa6bc4080
        x18    = 0x0000000000000000
        x19    = 0x0000000000000030
        x20    = 0x0000fff090484208
        x21    = 0x0000fff0901b94f8
        x22    = 0x0000ffffa6600340
        x23    = 0x0000000000000001
        x24    = 0x0000000000000021
        x25    = 0x0000fff09045a8f8
        x26    = 0x0000000000000021
        x27    = 0x0000000000000108
        x28    = 0x0000000000000021
        x29    = 0x0000fff106572880
        x30    = 0x0000ffffa635068c
        sp     = 0x0000fff106572880
        pc     = 0x0000ffff2417046c
        pstate = 0x0000000020000000
dcontext next tag = 0x0000ffff2417045c
computing memory target for 0x0000ffff2417046c causing SIGSEGV, kernel claims it is 0x0000000000000108
compute_memory_target: falling back to racy protection checks
opnd_compute_address for: (%x0,%x28,lsl #3)
        base => 0x0000000000000000
        index,scale => 0x0000000000000108
        disp => 0x0000000000000108
For SIGSEGV at cache pc 0x0000ffff2417046c, computed target read 0x0000000000000108
        faulting instr: ldr    (%x0,%x28,lsl #3)[8byte] -> %x0
** Received SIGSEGV at cache pc 0x0000ffff2417046c in thread 1588266
record_pending_signal(11) from cache pc 0x0000ffff2417046c
        not certain can delay so handling now
        action is not SIG_IGN
translate context, thread 1588266 at pc_recreatable spot translating
kuhanov commented 2 years ago

Hi, @derekbruening. We are now ready and want to contribute changes that unblock usage of DynamoRIO for JVM workloads. The internal company approval process was not so easy for us. :) In order to do that we need a public branch in the official DynamoRIO repository, e.g. i3733-bug-fixes. We will use the branch to deliver our commits and then send pull requests from it. Could you please help with that and create such a branch for i3733 bug fixes? Thx, Kirill

derekbruening commented 2 years ago

That is great news. I've sent you an invite for commit privileges so you can create your own branches. Normally we create a new temporary branch for each PR.

prasun3 commented 1 year ago

@kuhanov I was curious if you are still planning to contribute your changes.

kuhanov commented 1 year ago

Hi. In general we switched to the drcachesim collector. It is more stable and provides offline data for analysis. The overhead is also much lower than that of our online collectors. One point we will probably try to improve is speeding up the conversion of raw data to a trace (the drraw2trace tool). Currently it takes a lot of time. Thx, Kirill
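For reference, the offline drmemtrace flow mentioned here is a two-step process: collect raw data, then post-process it into a trace (the slow conversion step). A sketch, with the application and paths as placeholders:

```shell
# 1. Collect an offline trace; raw data lands in drmemtrace.<app>.<pid>.<id>.dir.
bin64/drrun -t drcachesim -offline -- /path/to/java -jar SPECjvm2008.jar scimark.sparse.small

# 2. Post-process the raw data and run an analysis on the resulting trace
#    (the raw-to-trace conversion happens as part of this step).
bin64/drrun -t drcachesim -indir drmemtrace.java.*.dir
```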

derekbruening commented 1 year ago

But weren't all the issues you hit and the fixes you were going to contribute relating to the core of DR and so would be present in the drcachesim drmemtrace tracer too?

kuhanov commented 1 year ago

> But weren't all the issues you hit and the fixes you were going to contribute relating to the core of DR and so would be present in the drcachesim drmemtrace tracer too?

ok. I looked in our branch and investigated what we have for core and ext in our local branch

ok. I looked in our local branch and investigated what we have for core and ext. There are 3 types of patches:

- features for enabling instruction mix: we added categories for grouping instructions
- fixes
- workarounds: these are not production solutions, but they unblocked us for collecting data for Java (we had limited resources to investigate that deeper)

I suppose we could share these patches; maybe they could be added to the DynamoRIO project backlog.

Thanks, Kirill

derekbruening commented 1 year ago

Please share any bug fixes: otherwise someone else may hit the same problem and spend essentially wasted time re-debugging and re-fixing what is already sitting fixed in a private branch somewhere, which is not a good situation. We ourselves may start running Java in the future and would not want to have to re-discover and re-fix all these things.

kuhanov commented 1 year ago

ok. I'll prepare review requests so we have a way to link to them. Or is there a better way to share our patches? Kirill

derekbruening commented 1 year ago

> ok. I'll prepare review requests so we have a way to link to them. Or is there a better way to share our patches? Kirill

Thank you. I think a PR is good even for the ones labeled workarounds where you're not sure if it's the proper long-term approach.

kuhanov commented 1 year ago

> ok. I'll prepare review requests so we have a way to link to them. Or is there a better way to share our patches? Kirill

> Thank you. I think a PR is good even for the ones labeled workarounds where you're not sure if it's the proper long-term approach.

https://github.com/DynamoRIO/dynamorio/tree/i3733-jvm-bug-workarounds

derekbruening commented 1 year ago

Thanks. At a quick glance we have 8 changes:

https://github.com/DynamoRIO/dynamorio/tree/i3733-jvm-bug-workarounds