Dead-lock in PatchAllModules on windows

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Dynamically linked, single-threaded QT-Application (QT lib on windows
being multi threaded).
2. Application starts up and waits for user input. On user input it
freezes. The behaviour is not 100% reproducible, in particular, when
debugging it may go away.

What is the expected output? What do you see instead?

No freeze.

What version of the product are you using? On what operating system?

1.2, win xp

Please provide any additional information below.

In the end, the debugger shows that
 * one thread is waiting for the patch_all_modules_lock and is locked
 * the thread holding the lock is locked in
GetProcAddress-->kernel32.dll-->ntdll.dll of Patch in patch_functions.cc

Before, when the second thread had acquired the lock, the first thread
seemed to be somewhere below some OLE related QT function.

Commenting out the lock, everything seems to works, but this is just for
your information, not a suggestion. :-)

Original issue reported on code.google.com by weidenri...@gmx.de on 26 May 2009 at 1:10

GoogleCodeExporter commented 9 years ago

Ah, interesting.

patch_all_modules_lock is held by -- no surprise -- PatchAllModules, which is 
called
at program startup and also called during LoadLibraryEx (we patch LoadLibraryEx 
to
call PatchAllModules in order to patch the new module).  If LoadLibraryEx 
somehow
recursively called another LoadLibraryEx, or in fact just a call to 
PatchAllModules
somehow resulted in a call to LoadLibraryEx, that would explain the deadlock 
you're
seeing.

But that would be 100% reproducible, and your problem isn't.  In fact, it looks 
like
your problem is two threads are calling LoadLibraryEx at the same time.  That
shouldn't cause deadlock though.

Can you report the stack-traces for the two threads during deadlock?  You say 
the
thread holding the lock is locked in GetProcAddress -- what is there before 
that? 
And there's nothing after the call into ntdll.dll?

The fact it freezes on user input is intersting.  Maybe windows needs to load a 
new
DLL then?

I'm suspicious that the thread holding the lock is frozen in a call to
GetProcAddress.  I wonder if that is requiring you to load a new library, which
results in deadlock.  I'd expect you to see that in the stacktrace, though.

I can rework this code to lessen the likelihood of deadlock, but I can't 
promise to
eliminate it until I understand what's going on.

Original comment by csilv...@gmail.com on 26 May 2009 at 5:28

Added labels: Priority-Medium, Type-Defect
Removed labels: ****

GoogleCodeExporter commented 9 years ago

I attach stack traces of the two threads, four at the point of the deadlock and 
two
at the point of the locking.

It appears to me that
 1. Thread 0 does some OLE registration, this causes the loading of a library in a
different thread "4".
 2. Thread 4 starts loading the library and patching.
 3. Thread 0 continues and runs into a LoadLibraryW. Conjecture: This locks some
internal ntdll.dll ressource.
 4. Thread 4 calls GetProcAddress. Conjecture: This waits for the same ressource in
ntdll.dll
 5. Thread 0 has loaded the library and wants to start patching. --> Deadlock

Possibly, the threads request two ressources (tcmalloc's and ntdll's) in 
different order.

What is implausible in this explanation: When Thread 0 reaches patching, it 
still
holds the ntdll resource, while when Thread 4 reaches patching it does not, or 
at
least does not prevent Thread 0 from taking it.

Ideas for solutions (I don't know how to implement them or if they are valid):
 1. request and keep ntdll's resource before/instaed tcmalloc's
 2. make patching thread safe such that no locking is necessary
 3. make the GetProcAdress-Part thread save and release tcmalloc's lock before 

Is this helpful?

Do you have any idea, how to fix it from tcmalloc's side? 

Or how to fix it from my side? Trying to pre-loading all DLLs that could be 
possibly
loaded?  Making a lock in qt that serialises the OLE and Loadlib actions?

Original comment by weidenri...@gmx.de on 27 May 2009 at 3:10

Added labels: ****
Removed labels: ****

Attachments:

stacktrace.txt

GoogleCodeExporter commented 9 years ago

Just noticed that the last sentence "Making a lock in qt that serialises the 
OLE and
Loadlib actions?" is nonsense: The OLE action initiate the other threads and 
exists.
Locking cannot help here.

Original comment by weidenri...@gmx.de on 27 May 2009 at 3:23

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Whew!  I finally got a free moment to look at this.

This is tricky.  The basic problem, it seems to me, is that thread 0 is calling
LoadLibrary recursively:
---
    libtcmalloc_minimal.dll!`anonymous namespace'::PatchAllModules()  Zeile 561 + 0xd   C++
    libtcmalloc_minimal.dll!`anonymous
namespace'::WindowsInfo::Perftools_LoadLibraryExW(const unsigned short *
lpFileName=0x7ffdfc00, void * hFile=0x00000000, unsigned long dwFlags=0)  Zeile 
875 C++
    kernel32.dll!LoadLibraryExA()  + 0x1f   
    kernel32.dll!LoadLibraryA()  + 0x2d 
    usp10.dll!ScriptApplyDigitSubstitution()  + 0x959d  
    usp10.dll!ScriptApplyDigitSubstitution()  + 0x9772  
    usp10.dll!UspFreeMem()  + 0x1ca2f   
    ntdll.dll!LdrInitializeThunk()  + 0x24  
    ntdll.dll!LdrFindResourceDirectory_U()  + 0x28d 
    ntdll.dll!RtlValidateUnicodeString()  + 0x507   
    ntdll.dll!LdrLoadDll()  + 0x110 
    kernel32.dll!LoadLibraryExW()  + 0xc8   
    libtcmalloc_minimal.dll!`anonymous
namespace'::WindowsInfo::Perftools_LoadLibraryExW(const unsigned short *
lpFileName=0x00d7e920, void * hFile=0x00000000, unsigned long dwFlags=0)  Zeile 
873
+ 0x12  C++
    kernel32.dll!LoadLibraryW()  + 0x11 
    qt-mt338.dll!QLibraryPrivate::loadLibrary()  Zeile 66 + 0x1a    C++
---

This seems to be because usp10 is (like us) putting itself in the middle of
LoadLibraryExW.  In this case, it wants to load another library in the middle 
of the
existing LoadLibraryExW call.  If LoadLibraryExW requires the os loader lock, 
which a
cursory web search shows it does (cf
http://blogs.msdn.com/cbrumme/archive/2003/08/20/51504.aspx), than this 
recursive
call would deadlock.

However, the recursive call itself isn't deadlocking.  So maybe nt has special 
code
to handle this, or the os loader lock is a recursive lock?  In any case,
GetProcAddress also requires the loader lock, and that's where the lock 
inversion
comes in.

I'm not totally sure I understand it, which is too bad because I don't know the 
best
way to fix it.  I guess the best thing to do is to try to pull the 
GetProcAddress
call up, and do it outside the patch_all_functions lock.  This is annoying, but
should be doable.  I don't know if it will solve the problem or not, though. :-(
And I hate to contort the code this way unless I really have to.

As a temporary workaround, getting rid of patch_all_modules_lock is probably 
fine. 
You're not loading any modules containing libc, probably, so all of this
LoadLibraryExW stuff is a noop in practice anyway.

Original comment by csilv...@gmail.com on 29 May 2009 at 7:51

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Ok, I've reorganized patch_functions.cc in a way that could possibly fix this
program.  Can you try applying the patch I've attached here, and see how it 
works? 
(It should apply cleanly, I hope.)  I've barely tested it, but it seems to at 
least
not blow up.

Original comment by csilv...@gmail.com on 29 May 2009 at 10:02

Changed state: Started
Added labels: ****
Removed labels: ****

Attachments:

patch.patch

GoogleCodeExporter commented 9 years ago

The patch was applied with some "fuzz", but solved the problem. Do you have 
plans for
a next release of tcmalloc?

A note about your interpretation: Isn't LoadLibraryW (the unicode version) just
calling usp10 to do the unicode->ascii conversion. And this calls some 
LoadLibraryA
again (possibly for internal purposes). So, an explanation for the 
implausibility
mentioned above ("When Thread 0 reaches patching, it still
holds the ntdll resource, while when Thread 4 reaches patching it does not.") 
could
be: Thread 0 reaches patching from LoadLibraryA, but the LoadLibraryW is 
holding the
lock. 

Anyway, thanks for the fix!

Original comment by weidenri...@gmx.de on 2 Jun 2009 at 8:42

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Great, I'm glad the patch works!  I'm looking to get a new release out this 
month
sometime, but I don't know more details than that.  This change will be in it.

Original comment by csilv...@gmail.com on 2 Jun 2009 at 5:42

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

This should be fixed in perftools 1.3, just released.

Original comment by csilv...@gmail.com on 10 Jun 2009 at 2:06

Changed state: Fixed
Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Unfortunately, I got a deadlock in the same situation again. :-( 
It's with v1.4, but I cannot say that it has to do with the version. Currently, 
I
cannot reproduce it in a way that I get a stack trace, but I keep trying.

Original comment by weidenri...@gmx.de on 6 Oct 2009 at 12:29

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Drat.  I'll reopen the bug.  If you can get a stacktrace, let me know, and I'll 
see
if I can figure out what's going on.

Original comment by csilv...@gmail.com on 13 Oct 2009 at 9:25

Changed state: Accepted
Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Finally I got a stack trace, it was the same situation, but probably a different
problem. So, please close this issue, I will open a new one.

Original comment by weidenri...@gmx.de on 11 Nov 2009 at 6:21

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Closing this issue, as requested. :-)  Only 4 months late...

Original comment by csilv...@gmail.com on 10 Mar 2010 at 6:07

Changed state: Fixed
Added labels: ****
Removed labels: ****

blueszhangsh / gperftools

Dead-lock in PatchAllModules on windows #134