DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android
Other
2.46k stars 263 forks source link

CRASH (unit_tests new alloc memset to zero on unwritable page) #1743

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

Hit on Chromium bots: https://code.google.com/p/chromium/issues/detail?id=514921

Crash info:

[----------] 1 test from TaskManagerTest
[ RUN      ] TaskManagerTest.Resources
<Application E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\out\Release\unit_tests.exe (2604).  Dr. Memory internal crash at PC 0x7384b5e1.  Please report this at http://drmemory.org/issues.  Program aborted.
0xc0000005 0x00000000 0x7384b5e1 0x7384b5e1 0x00000001 0x54130000
Base: 0x66510000
Registers: eax=0x00000004 ebx=0x00000000 ecx=0x54130000 edx=0x00000060
    esi=0x0001ffe8 edi=0x54110040 esp=0x236aef10 ebp=0x236aef5c
    eflags=0x000
1.8.16561-105909314-(May  6 2015 08:37:57) win61
-no_dynamic_options -disasm_mask 8 -logdir 'C:\Users\chrome-bot\AppData\LocalLow\vg_logs_zz8m5k\dynamorio' -client_lib 'E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll;0;-suppress `E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\tools\valgr
0x236aef5c 0x7382f8fb
0x236aefd8 0x665d914b
0x003ae854 0x003ae9e4
0x23669ac0 0xfffddfca
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll=0x73800000
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release/dbghelp.dll=0x663c0000
C:\Windows/system32/msvcrt.dll=0x06d20000
C:\Windows/system32/KERNEL32.dll=0x06e40000
C:\Windows/system32/KERNELBASE.dll=0x06920000>
~~Dr.M~~ WARNING: application exited with abnormal code 0xffffffff

Also crashed on TaskManagerTest.RefreshCalled:

[----------] 1 test from TaskManagerTest
[ RUN      ] TaskManagerTest.RefreshCalled
<Application E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\out\Release\unit_tests.exe (3424).  Dr. Memory internal crash at PC 0x06dfb5e1.  Please report this at http://drmemory.org/issues.  Program aborted.
0xc0000005 0x00000000 0x06dfb5e1 0x06dfb5e1 0x00000001 0x543e0000
Base: 0x66830000
Registers: eax=0x00000004 ebx=0x00000000 ecx=0x543e0000 edx=0x00000060
    esi=0x0001ffe8 edi=0x543c0040 esp=0x1830ef10 ebp=0x1830ef5c
    eflags=0x000
1.8.16561-105909314-(May  6 2015 08:37:57) win61
-no_dynamic_options -disasm_mask 8 -logdir 'C:\Users\chrome-bot\AppData\LocalLow\vg_logs_wnt0dv\dynamorio' -client_lib 'E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll;0;-suppress `E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\tools\valgr
0x1830ef5c 0x06ddf8fb
0x1830efd8 0x668f914b
0x0030e9d4 0x0030eb64
0x182c7740 0xfffddfca
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release\drmemorylib.dll=0x06db0000
E:\b\build\slave\chromium-dbg-win-drmemory-full-3\build\src\third_party\drmemory\unpacked\bin\release/dbghelp.dll=0x666e0000
C:\Windows/system32/msvcrt.dll=0x06c70000
C:\Windows/system32/KERNEL32.dll=0x06f90000
C:\Windows/system32/KERNELBASE.dll=0x06790000>
~~Dr.M~~ WARNING: application exited with abnormal code 0xffffffff

These tests themselves are not new and have not been changed recently.

This crash is non-deterministic: went away and came back on bot #3, where builds 7377-73780 are green (and the other 2 bots that shard unit_tests don't have the crash then).

This bot has been purple a lot. I actually see this crash further back: builds 7330 and 7332 on bot #3 from July 23. The crashes could go back even further than that.

Logging in to the bot and running just this test or all 3 TaskManagerTest.* using the same args as the scripts: the tests run just fine with no crash.


Is it symbol cache corruption (https://github.com/DynamoRIO/drmemory/issues/1465) which can cause weird crashes? On the bot in the AppData/LocalLow/drmemory.symcache directory: $ grep 157db * msvcrt.dll.txt:_CrtDbgReport,0x157db msvcrt.dll.txt:_CrtDbgReportW,0x157db msvcrt.dll.txt:_CrtDbgReportV,0x157db msvcrt.dll.txt:_CrtDbgReportWV,0x157db msvcrt.dll.txt:_CrtSetDbgFlag,0x157db msvcrt.dll.txt:_crtDbgFlag,0x157db

So doesn't look like it. I made a copy of the original symcache dir on the bot (bug_514921/) and cleared out the old one just in case.


Symbolizing the crash call stack:

$ bin/symquery.exe -e bin/release/drmemorylib.dll -f -a 0x4b5e1 0x2f8fb
_VEC_memzero+0x60
??:0
replace_realloc_common+0x29b
e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c:2008+0x0
$ bin/symquery.exe -e dynamorio/lib32/release/dynamorio.dll -f -a 0xc914b
dr_call_on_clean_stack+0x60
E:\b\build\slave\win-builder\build\build_drmemory-release-32\dynamorio\core\x86.asm_core.s:1531+0x0

drmemorylib!_VEC_memzero+0x60:
7384b5e1 660f7f01        movdqa  xmmword ptr [ecx],xmm0
7384b5e5 8d4910          lea     ecx,[ecx+10h]
7384b5e8 48              dec     eax
7384b5e9 75f6            jne     drmemorylib!_VEC_memzero+0x60 (7384b5e1)

0:000> U 0x7382f8fb-5
drmemorylib!replace_realloc_common+0x296 [e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c @ 2007]:
7382f8f6 e825f6ffff      call    drmemorylib!replace_alloc_common (7382ef20)

drmemorylib!replace_alloc_common+0x274 [e:\b\build\slave\win-builder\drmemory\common\alloc_replace.c @ 1698]:
 1698 7382f194 8b55f4          mov     edx,dword ptr [ebp-0Ch]
 1698 7382f197 56              push    esi
 1698 7382f198 6a00            push    0
 1698 7382f19a 52              push    edx
 1698 7382f19b e880b60100      call    drmemorylib!memset (7384a820)
 1698 7382f1a0 83c40c          add     esp,0Ch

memset calls _VEC_memzero if size!=0, val=0, size<0x80, and sse2 is avail:

0:000> Uf drmemorylib!memset
drmemorylib!memset [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 59]:
   59 7384a820 8b54240c        mov     edx,dword ptr [esp+0Ch]
   69 7384a824 8b4c2404        mov     ecx,dword ptr [esp+4]
   71 7384a828 85d2            test    edx,edx
   72 7384a82a 7469            je      drmemorylib!memset+0x75 (7384a895)
drmemorylib!memset+0xc [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 74]:
   74 7384a82c 33c0            xor     eax,eax
   75 7384a82e 8a442408        mov     al,byte ptr [esp+8]
   78 7384a832 84c0            test    al,al
   79 7384a834 7516            jne     drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x16 [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 80]:
   80 7384a836 81fa80000000    cmp     edx,80h
   81 7384a83c 720e            jb      drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x1e [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 82]:
   82 7384a83e 833d94ed9c7300  cmp     dword ptr [drmemorylib!__sse2_available (739ced94)],0
   83 7384a845 7405            je      drmemorylib!memset+0x2c (7384a84c)
drmemorylib!memset+0x27 [f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\memset.asm @ 85]:
   85 7384a847 e9350d0000      jmp     drmemorylib!_VEC_memzero (7384b581)

Passes ptr in ecx, size in edx.

So it's this line:

    if (TEST(ALLOC_ZERO, flags))
        memset(res, 0, request_size);

Failed to write to a new page (0x00000001 0x54130000). Still has eax=4 * 16 = 64 bytes (+ more if non-16-aligned) left to zero. Original size edx=0x60? but that's <0x80. edx could be modified before crash if orig ptr not aligned to 16.

Very strange: if the allocator really messes up this badly and has an unwritable page in the middle of a new alloc, wouldn't we see a lot more problems? Is there really some crazy free list corner case that's this rare? We haven't updated DrMem in a while so there was some change in Cr that changed its alloc pattern to suddenly trigger this weird bug?


I'm still trying to reproduce locally: running individual tests in a loop had no success so I am now running this same set of unit_tests subtests sharded in a loop. No repro so far.

derekbruening commented 9 years ago

Locally, running the whole shard 10x in a loop => no repro. Running individual tests 28x => no repro. On the bot, running individual tests 20x => no repro.

Lowering priority to medium b/c we can't reproduce it.

derekbruening commented 9 years ago

Could DR's cache cons have made this page readonly? Xref #1354