DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform
Other
2.61k stars 554 forks source link

Invariant failure "Syscall marker not preceded by timestamp + cpuid" on win32 #6599

Closed derekbruening closed 5 months ago

derekbruening commented 7 months ago

The win32 long (post-merge) test is now failing every time. I grabbed 3 cases and they all have the same failure:

$ tail -18 logs*/vs2019-32/6_Run\ Suite.txt | egrep -v 'ignore|Process'
==> logs/vs2019-32/6_Run Suite.txt <==
2024-01-26T04:44:07.8320478Z ====> FAILURE in debug-internal-32 <====
2024-01-26T04:44:07.8321455Z debug-internal-32: 556 tests passed, **** 15 tests failed, but ignoring 12 for i#2145: ****
2024-01-26T04:44:07.8324403Z    code_api|tool.drcacheoff.windows-invar 
2024-01-26T04:44:07.8328496Z    code_api|tool.drcacheoff.invariant_checker 
2024-01-26T04:44:07.8329258Z    code_api|tool.drcacheoff.getretaddr_record_replace_retaddr 

==> logs2/vs2019-32/6_Run Suite.txt <==
2024-01-26T18:56:04.5217253Z ====> FAILURE in debug-internal-32 <====
2024-01-26T18:56:04.5218497Z debug-internal-32: 556 tests passed, **** 15 tests failed, but ignoring 12 for i#2145: ****
2024-01-26T18:56:04.5221822Z    code_api|tool.drcacheoff.windows-invar 
2024-01-26T18:56:04.5226724Z    code_api|tool.drcacheoff.invariant_checker 
2024-01-26T18:56:04.5227588Z    code_api|tool.drcacheoff.getretaddr_record_replace_retaddr 

==> logs3/vs2019-32/6_Run Suite.txt <==
2024-01-26T06:24:10.3661901Z ====> FAILURE in debug-internal-32 <====
2024-01-26T06:24:10.3662896Z debug-internal-32: 556 tests passed, **** 15 tests failed, but ignoring 12 for i#2145: ****
2024-01-26T06:24:10.3666076Z    code_api|tool.drcacheoff.windows-invar 
2024-01-26T06:24:10.3670281Z    code_api|tool.drcacheoff.invariant_checker 
2024-01-26T06:24:10.3671114Z    code_api|tool.drcacheoff.getretaddr_record_replace_retaddr 

All 3 tests are hitting this invariant failure:

logs3/vs2019-32/6_Run Suite.txt-2024-01-26T06:14:29.9054339Z 277:   *** postcmd failed (3): Trace invariant failure in T6756 at ref # 152741 (1
logs3/vs2019-32/6_Run Suite.txt-2024-01-26T06:14:29.9134544Z 277:   instrs since timestamp 2104499617): Syscall marker not preceded by
logs3/vs2019-32/6_Run Suite.txt-2024-01-26T06:14:29.9151299Z 277:   timestamp + cpuid

This may be related to AMD vsyscall like #6417.

derekbruening commented 6 months ago
291: CMake Error at D:/derek/dr/git/src/suite/tests/runmulti.cmake:114 (message):
291:   *** postcmd failed (3): Trace invariant failure in T13340 at ref # 295 (1
291:   instrs since timestamp 724193316): Syscall marker not preceded by timestamp
291:   + cpuid

=>

         288         162:       13340 ifetch       5 byte(s) @ 0x774c4a45 ba 30 8a 3d 01       mov    $0x013d8a30, %edx
         289         162:       13340 <marker: timestamp 724193316>
         290         162:       13340 <marker: tid 13340 on core 22>
         291         162:       13340 <marker: timestamp 724193316>
         292         162:       13340 <marker: tid 13340 on core 22>
         293         163:       13340 ifetch       2 byte(s) @ 0x774c4a4a ff d2                call   %edx (target 0x774c4a4c)
         294         163:       13340 write        4 byte(s) @ 0x004ffa90 by PC 0x774c4a4a
         295         163:       13340 <marker: system call 491>
         296         164:       13340 ifetch       3 byte(s) @ 0x774c4a4c c2 04 00             ret    $0x0004 (target 0x7542b5be)
         297         164:       13340 read         4 byte(s) @ 0x004ffa94 by PC 0x774c4a4c
         298         165:       13340 ifetch       2 byte(s) @ 0x7542b5be 33 c9                xor    %ecx, %ecx

The ordering is all weird. Xref #5949 but that's an offline issue.

xdje42 commented 6 months ago

Was this on a windows machine you have access to? [seems so but didn't want to assume] And what did you type (and/or otherwise do) to obtain the above failure and corresponding trace snippet?

derekbruening commented 6 months ago

That is from the view tool on the test trace in the invariant thread and ref# from the test failure as cited.

xdje42 commented 6 months ago

Data point: The tool.record_filter test is also failing the same way on win32: "Syscall marker not preceded by timestamp + cpuid" This happens in post-merge test runs - tool.record_filter is not run in pre-merge test runs.

derekbruening commented 5 months ago

I believe what is happening is raw2trace is delaying the call* == "syscall" across the 2nd timestamp;cpuid pair.

To avoid that we really want to solve #5949 so we can actually identify this call* as a syscall in standalone decoding.

For now let's just relax this invariant on WOW64.