DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android
Other
2.45k stars 262 forks source link

APP CRASH in sql_unittests on win7-cr bot due to incorrect symcache entries #1427

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

From bruen...@google.com on January 28, 2014 21:29:05

On our win{7,8}-cr bots on the DrMemory waterfall we have some failures.

I looked into one on win7-cr in sql_unittests:

$ export RUNNING_ON_VALGRIND=1 $ export DRMEMORY_COMMAND=/cygdrive/e/b/build/slave/win7-cr/build/unpacked/bin/drmemory.exe $ ../../win7-cr-builder/build/src/tools/valgrind/chrome_tests.sh -t sql --tool drmemory ... [ RUN ] SQLConnectionTest.Execute Backtrace: (No symbol) [0x01A7E069] SHPathPrepareForWriteW [0x75DF5C81+1902] Ordinal201 [0x75DA9461+2849] Ordinal201 [0x75DAAB99+8793] IsUserAnAdmin [0x75DF4DE5+2408] SHFileOperationW [0x75DF97EF+231] SHFileOperationW [0x75DF9717+15] base::DeleteFileW 0x007429B5+437 ... Dr.M WARNING: application exited with abnormal code 0xc000001d

$ unpacked/bin/drmemory.exe -version Dr. Memory version 1.6.1686 -- build 42

Yet running it without going through the script works fine:

$ unpacked/bin/drmemory.exe -light ../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests Dr.M Dr. Memory version 1.6.1686 Dr.M Running "../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests" Valgrind detected, switching to single process mode. Pass --test-launcher-debug-launcher to valgrind the launcher itself. Note: Google Test filter = SQLConnectionTest.Execute [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SQLConnectionTest [ RUN ] SQLConnectionTest.Execute [ OK ] SQLConnectionTest.Execute (1029 ms) [----------] 1 test from SQLConnectionTest (1029 ms total)

[----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (1061 ms total) [ PASSED ] 1 test. Dr.M Dr.M NO ERRORS FOUND: Dr.M 0 unique, 0 total unaddressable access(es) Dr.M 0 unique, 0 total invalid heap argument(s) Dr.M 0 unique, 0 total GDI usage error(s) Dr.M 0 unique, 0 total warning(s) Dr.M ERRORS IGNORED: Dr.M Details: E:\b\build\slave\win7-cr\build\unpacked\drmemory\logs\DrMemory-sql_unittests.exe.3180.000\results.txt

I can repro with:

$ unpacked/bin/drmemory.exe -symcache_dir C:/Users//AppData/LocalLow/drmemory.symcache -batch -no_count_leaks -no_leak_scan -light -no_callstack_use_top_fp_selectively -- ../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests Dr.M Dr. Memory version 1.6.1686 Dr.M Running "../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests" Note: Google Test filter = SQLConnectionTest.Execute [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SQLConnectionTest [ RUN ] SQLConnectionTest.Execute Backtrace: (No symbol) [0x01C2DF56] SHPathPrepareForWriteW [0x759C5C81+1902] Ordinal201 [0x75979461+2849] Ordinal201 [0x7597AB99+8793] IsUserAnAdmin [0x759C4DE5+2408] SHFileOperationW [0x759C97EF+231] SHFileOperationW [0x759C9717+15] base::DeleteFileW 0x007B29B5+437 ...

-no_use_symcache => works fine

So it's a problem with a symcache file.

I made a copy: $ rsync -av drmemory.symcache drmemory.BUGCOPY.symcache

Binary search => it's SHELL32.dll.txt

$ cat SHELL32.dll.txt Dr. Memory symbol cache version 11 408,12873728,2139568001799028737,2139568001799028737,12906023,1339216345,12886016 0 operator delete,0x6b0da std::_DebugHeapDelete<>,0x0 Concurrency::details::Security::EncodePointer,0x0 operator delete[] nothrow,0x0 operator delete[],0x6b0e0 operator delete nothrow,0x0 operator new nothrow,0x0 operator new,0x6b296 operator new[] nothrow,0x0 operator new[],0x6b090

$ ls -l ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt SHELL32.dll.txt -rwx------ 1 408 Aug 22 18:26 ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt -rwxr-xr-x 1 392 Jan 28 18:09 SHELL32.dll.txt $ diff ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt SHELL32.dll.txt 2c2 < 408,12873728,2139568001799028737,2139568001799028737,12906023,1339216345,12886016


   392,12873728,2139568001799028737,2139568001799028737,12906023,1339216345,12886016

4c4 < operator delete,0x6b0da


operator delete,0x0 8c8 < operator delete[],0x6b0e0


operator delete[],0x0 11c11 < operator new,0x6b296


operator new,0x0 13c13 < operator new[],0x6b090


operator new[],0x0

So the bad one has entries for operators while the newly generated one does not.

module load event: "SHELL32.dll" 0x75970000-0x765ba000 modid: 40 C:\Windows\SysWOW64\shell32.dll module SHELL32.dll imports from msvc* so not searching inside it module SHELL32.dll imports from msvc* so not searching inside it WARNING: unable to load symbols for C:\Windows\SysWOW64\shell32.dll sym lookup of shell32!operator new* in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 marking operator new as processed since regex didn't match marking operator new[] as processed since regex didn't match marking operator new nothrow as processed since regex didn't match marking operator new[] nothrow as processed since regex didn't match sym lookup of shell32!operator delete* in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 marking operator delete as processed since regex didn't match marking operator delete[] as processed since regex didn't match marking operator delete nothrow as processed since regex didn't match marking operator delete[] nothrow as processed since regex didn't match sym lookup of shell32!std::_DebugHeapDelete<*> in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 sym lookup of shell32!std::_DebugHeapDelete<> in C:\Windows\SysWOW64\shell32.dll => 5 0x00000000 sym lookup of shell32!Concurrency::details::Security::EncodePointer in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 post-module-load @0:00:01.903 in thread 2092

With symbols (on another machine, w/ that shell32.dll):

0:000> lm start end module name 73800000 7444a000 shell32 (pdb symbols) c:\src\symbols\shell32.pdb\1489BCD8E80C441E91308A093631DE5E2\shell32.pdb 0:000> x shell32!operator new 738ac806 shell32!operator new[](no parameter info) 738aa97d shell32!operator new () 0:000> x shell32!operator delete 738ac788 shell32!operator delete[](no parameter info) 738ac374 shell32!operator delete () 738adf73 shell32!operator delete ()

So those values are incorrect.

Did we ever have a pdb on that bot? Why would it have found those operator symbols at all?

Original issue: http://code.google.com/p/drmemory/issues/detail?id=1427

derekbruening commented 9 years ago

From bruen...@google.com on February 01, 2014 08:44:46

xref issue #1313 , issue #1428 , issue #1430

derekbruening commented 9 years ago

From bruen...@google.com on July 07, 2014 13:23:53

We have more crashes on the win7-cr bot due to symcache corruption. I'm documenting it in this case even though it's not sql_.

The Buildbot has detected a new failure on builder win7-cr while building DrMemory. Full details are available at: http://build.chromium.org/p/client.drmemory/builders/win7-cr/builds/8846 These crashes are happening every time on this bot:

base: [ RUN ] SafeSPrintfTest.Padding Valgrind detected, switching to single process mode. Pass --test-launcher-debug-launcher to valgrind the launcher itself. Dr.M WARNING: application exited with abnormal code 0xc0000005

net: [ RUN ] NetworkChangeNotifierWinTest.NetChangeWinBasic <Application E:\b\build\slave\win7-cr-builder\build\src\out\Debug\net_unittests.exe (2396). Client exception at PC 0x05d88288. Program aborted. 0xc0000005 0x00000000 0x05d88288 0x05d88288 0x00000000 0x00000003

unit: [ RUN ] HistoryURLProviderTest.EmptyVisits Dr.M WARNING: application exited with abnormal code 0xc0000005

We have symcache error reports from r1970 :

Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry _malloc_dbg (offset @0x000077b2) is not accessible! Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry operator new (offset @0x00006c42) is not accessible! Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry operator delete (offset @0x00006c48) is not accessible! Dr.M SYMCACHE ERROR: module KERNEL32.dll (base @0x75670000) with entry _malloc_dbg (offset @0x00269ac8) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry _malloc_dbg (offset @0x00032852) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new (offset @0x00031ad0) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new[](offset @0x000319b0) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator delete (offset @0x00031ad6) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator delete[](offset @0x00031e82) is not accessible! Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new nothrow (offset @0x00031e90) is not accessible! Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry _malloc_dbg (offset @0x00006ff8) is not accessible! Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry operator new (offset @0x00006316) is not accessible! Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry operator delete (offset @0x0000631c) is not accessible! Dr.M SYMCACHE ERROR: module PSAPI.DLL (base @0x76320000) with entry _malloc_dbg (offset @0x00006ff8) is not accessible! Dr.M SYMCACHE ERROR: module NSI.dll (base @0x77ca0000) with entry _malloc_dbg (offset @0x000175da) is not accessible! r1970 should lead to bogus symcache entries being replaced as it will fall back to a drsyms lookup. We can still have crashes from readable but still wrong entries, but shouldn't the error entries go away from one run to the next?

derekbruening commented 9 years ago

From bruen...@google.com on July 17, 2014 12:04:49

Xref more diagnostic steps taken for symcache corruption in issue #1465 *\ TODO now wait for win7-cr bot to hit one of the diagnostics

based on where the printout is (at exit?) try and deduce further: or go and ask for ldmp on dr_abort and wait again

hmm, win7-cr is failing on unit_tests light: http://build.chromium.org/p/client.drmemory/builders/win7-cr/builds/9096/steps/Chromium%20%27unit%27%20tests/logs/stdio [ RUN ] DefaultAppsTest.Install [2428:3732:0715/142633:8503411:FATAL:first_run_internal_win.cc(155)] Check failed: migrated. : The parameter is incorrect. (0x57) Backtrace:

MFYI bot works though: http://build.chromium.org/p/chromium.memory.fyi/builders/Windows%20Unit%20%28DrMemory%29/builds/17773/steps/memory%20test%3A%20unit/logs/stdio [----------] 1 test from DefaultAppsTest [ RUN ] DefaultAppsTest.Install [ OK ] DefaultAppsTest.Install (94 ms) [----------] 1 test from DefaultAppsTest (94 ms total)

locally that test works for me, isolated or in full unit_ light run I blew the symcache away gain after build #9105