Open derekbruening opened 9 years ago
From bruen...@google.com on February 01, 2014 08:44:46
xref issue #1313 , issue #1428 , issue #1430
From bruen...@google.com on July 07, 2014 13:23:53
We have more crashes on the win7-cr bot due to symcache corruption. I'm documenting it in this case even though it's not sql_.
The Buildbot has detected a new failure on builder win7-cr while building DrMemory. Full details are available at: http://build.chromium.org/p/client.drmemory/builders/win7-cr/builds/8846 These crashes are happening every time on this bot:
base:
[ RUN ] SafeSPrintfTest.Padding
Valgrind detected, switching to single process mode.
Pass --test-launcher-debug-launcher to valgrind the launcher itself.
Dr.M WARNING: application exited with abnormal code 0xc0000005
net: [ RUN ] NetworkChangeNotifierWinTest.NetChangeWinBasic <Application E:\b\build\slave\win7-cr-builder\build\src\out\Debug\net_unittests.exe (2396). Client exception at PC 0x05d88288. Program aborted. 0xc0000005 0x00000000 0x05d88288 0x05d88288 0x00000000 0x00000003
unit:
[ RUN ] HistoryURLProviderTest.EmptyVisits
Dr.M WARNING: application exited with abnormal code 0xc0000005
We have symcache error reports from r1970 :
Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry _malloc_dbg (offset @0x000077b2) is not accessible!
Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry operator new (offset @0x00006c42) is not accessible!
Dr.M SYMCACHE ERROR: module GDI32.dll (base @0x754a0000) with entry operator delete (offset @0x00006c48) is not accessible!
Dr.M SYMCACHE ERROR: module KERNEL32.dll (base @0x75670000) with entry _malloc_dbg (offset @0x00269ac8) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry _malloc_dbg (offset @0x00032852) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new (offset @0x00031ad0) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new[](offset @0x000319b0) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator delete (offset @0x00031ad6) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator delete[](offset @0x00031e82) is not accessible!
Dr.M SYMCACHE ERROR: module IMM32.dll (base @0x75810000) with entry operator new nothrow (offset @0x00031e90) is not accessible!
Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry _malloc_dbg (offset @0x00006ff8) is not accessible!
Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry operator new (offset @0x00006316) is not accessible!
Dr.M SYMCACHE ERROR: module USER32.dll (base @0x75e40000) with entry operator delete (offset @0x0000631c) is not accessible!
Dr.M SYMCACHE ERROR: module PSAPI.DLL (base @0x76320000) with entry _malloc_dbg (offset @0x00006ff8) is not accessible!
Dr.M SYMCACHE ERROR: module NSI.dll (base @0x77ca0000) with entry _malloc_dbg (offset @0x000175da) is not accessible! r1970 should lead to bogus symcache entries being replaced as it will fall back to a drsyms lookup. We can still have crashes from readable but still wrong entries, but shouldn't the error entries go away from one run to the next?
From bruen...@google.com on July 17, 2014 12:04:49
Xref more diagnostic steps taken for symcache corruption in issue #1465 *\ TODO now wait for win7-cr bot to hit one of the diagnostics
based on where the printout is (at exit?) try and deduce further: or go and ask for ldmp on dr_abort and wait again
hmm, win7-cr is failing on unit_tests light: http://build.chromium.org/p/client.drmemory/builders/win7-cr/builds/9096/steps/Chromium%20%27unit%27%20tests/logs/stdio [ RUN ] DefaultAppsTest.Install [2428:3732:0715/142633:8503411:FATAL:first_run_internal_win.cc(155)] Check failed: migrated. : The parameter is incorrect. (0x57) Backtrace:
MFYI bot works though: http://build.chromium.org/p/chromium.memory.fyi/builders/Windows%20Unit%20%28DrMemory%29/builds/17773/steps/memory%20test%3A%20unit/logs/stdio [----------] 1 test from DefaultAppsTest [ RUN ] DefaultAppsTest.Install [ OK ] DefaultAppsTest.Install (94 ms) [----------] 1 test from DefaultAppsTest (94 ms total)
locally that test works for me, isolated or in full unit_ light run I blew the symcache away gain after build #9105
From bruen...@google.com on January 28, 2014 21:29:05
On our win{7,8}-cr bots on the DrMemory waterfall we have some failures.
I looked into one on win7-cr in sql_unittests:
$ export RUNNING_ON_VALGRIND=1 $ export DRMEMORY_COMMAND=/cygdrive/e/b/build/slave/win7-cr/build/unpacked/bin/drmemory.exe $ ../../win7-cr-builder/build/src/tools/valgrind/chrome_tests.sh -t sql --tool drmemory ... [ RUN ] SQLConnectionTest.Execute Backtrace: (No symbol) [0x01A7E069] SHPathPrepareForWriteW [0x75DF5C81+1902] Ordinal201 [0x75DA9461+2849] Ordinal201 [0x75DAAB99+8793] IsUserAnAdmin [0x75DF4DE5+2408] SHFileOperationW [0x75DF97EF+231] SHFileOperationW [0x75DF9717+15] base::DeleteFileW 0x007429B5+437 ...
Dr.MWARNING: application exited with abnormal code 0xc000001d$ unpacked/bin/drmemory.exe -version Dr. Memory version 1.6.1686 -- build 42
Yet running it without going through the script works fine:
$ unpacked/bin/drmemory.exe -light ../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests
Dr.MDr. Memory version 1.6.1686Dr.MRunning "../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests" Valgrind detected, switching to single process mode. Pass --test-launcher-debug-launcher to valgrind the launcher itself. Note: Google Test filter = SQLConnectionTest.Execute [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SQLConnectionTest [ RUN ] SQLConnectionTest.Execute [ OK ] SQLConnectionTest.Execute (1029 ms) [----------] 1 test from SQLConnectionTest (1029 ms total)[----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (1061 ms total) [ PASSED ] 1 test.
Dr.MDr.MNO ERRORS FOUND:Dr.M0 unique, 0 total unaddressable access(es)Dr.M0 unique, 0 total invalid heap argument(s)Dr.M0 unique, 0 total GDI usage error(s)Dr.M0 unique, 0 total warning(s)Dr.MERRORS IGNORED:Dr.MDetails: E:\b\build\slave\win7-cr\build\unpacked\drmemory\logs\DrMemory-sql_unittests.exe.3180.000\results.txtI can repro with:
$ unpacked/bin/drmemory.exe -symcache_dir C:/Users//AppData/LocalLow/drmemory.symcache -batch -no_count_leaks -no_leak_scan -light -no_callstack_use_top_fp_selectively -- ../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests
Dr.MDr. Memory version 1.6.1686Dr.MRunning "../../win7-cr-builder/build/src/out/Debug/sql_unittests.exe --gtest_filter=SQLConnectionTest.Execute --single-process-tests" Note: Google Test filter = SQLConnectionTest.Execute [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SQLConnectionTest [ RUN ] SQLConnectionTest.Execute Backtrace: (No symbol) [0x01C2DF56] SHPathPrepareForWriteW [0x759C5C81+1902] Ordinal201 [0x75979461+2849] Ordinal201 [0x7597AB99+8793] IsUserAnAdmin [0x759C4DE5+2408] SHFileOperationW [0x759C97EF+231] SHFileOperationW [0x759C9717+15] base::DeleteFileW 0x007B29B5+437 ...-no_use_symcache => works fine
So it's a problem with a symcache file.
I made a copy: $ rsync -av drmemory.symcache drmemory.BUGCOPY.symcache
Binary search => it's SHELL32.dll.txt
$ cat SHELL32.dll.txt Dr. Memory symbol cache version 11 408,12873728,2139568001799028737,2139568001799028737,12906023,1339216345,12886016 0 operator delete,0x6b0da std::_DebugHeapDelete<>,0x0 Concurrency::details::Security::EncodePointer,0x0 operator delete[] nothrow,0x0 operator delete[],0x6b0e0 operator delete nothrow,0x0 operator new nothrow,0x0 operator new,0x6b296 operator new[] nothrow,0x0 operator new[],0x6b090
$ ls -l ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt SHELL32.dll.txt -rwx------ 1 408 Aug 22 18:26 ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt
-rwxr-xr-x 1 392 Jan 28 18:09 SHELL32.dll.txt
$ diff ../drmemory.BUGCOPY.symcache/drmemory.symcache/SHELL32.dll.txt SHELL32.dll.txt
2c2
< 408,12873728,2139568001799028737,2139568001799028737,12906023,1339216345,12886016
So the bad one has entries for operators while the newly generated one does not.
module load event: "SHELL32.dll" 0x75970000-0x765ba000 modid: 40 C:\Windows\SysWOW64\shell32.dll module SHELL32.dll imports from msvc* so not searching inside it module SHELL32.dll imports from msvc* so not searching inside it WARNING: unable to load symbols for C:\Windows\SysWOW64\shell32.dll sym lookup of shell32!operator new* in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 marking operator new as processed since regex didn't match marking operator new[] as processed since regex didn't match marking operator new nothrow as processed since regex didn't match marking operator new[] nothrow as processed since regex didn't match sym lookup of shell32!operator delete* in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 marking operator delete as processed since regex didn't match marking operator delete[] as processed since regex didn't match marking operator delete nothrow as processed since regex didn't match marking operator delete[] nothrow as processed since regex didn't match sym lookup of shell32!std::_DebugHeapDelete<*> in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 sym lookup of shell32!std::_DebugHeapDelete<> in C:\Windows\SysWOW64\shell32.dll => 5 0x00000000 sym lookup of shell32!Concurrency::details::Security::EncodePointer in C:\Windows\SysWOW64\shell32.dll => 0 0x00000000 post-module-load @0:00:01.903 in thread 2092
With symbols (on another machine, w/ that shell32.dll):
0:000> lm start end module name 73800000 7444a000 shell32 (pdb symbols) c:\src\symbols\shell32.pdb\1489BCD8E80C441E91308A093631DE5E2\shell32.pdb 0:000> x shell32!operator new 738ac806 shell32!operator new[](no parameter info) 738aa97d shell32!operator new ()
0:000> x shell32!operator delete
738ac788 shell32!operator delete[](no parameter info)
738ac374 shell32!operator delete ()
738adf73 shell32!operator delete ()
So those values are incorrect.
Did we ever have a pdb on that bot? Why would it have found those operator symbols at all?
Original issue: http://code.google.com/p/drmemory/issues/detail?id=1427