Unloading/loading can cause crashes and deadlocks

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Dynamically linked, single-threaded QT-Application (QT lib on windows
being multi threaded).
2. Application starts up, in an Windows Get Filename Dialog, the
application exists. Sometimes, all windows close, but the application is
still listed in the task mananger (deadlock).

What is the expected output? What do you see instead?

No crash.

What version of the product are you using? On what operating system?

1.4, win xp

Please provide any additional information below.

I think, I found the bug: 
If a module_libcs is unloaded, the is_valid of the is set to false, but
origstub_fn_ stays dirty. The next load with necessaty patching calls
"PreamblePatcher::Patch(...  &origstub_fn_[i]));" with dirty origstub_fn.
Here, the patcher returns an parameter error.

Now, exit(1) is called. This leads to some freeing of libraries, which
calls PatchAllModules and this tries to get the patch_all_modules_lock
again, and than there is a deadlock.

I attach a patch that solves this.
A slightly better patch would have been to clear the origstub_fn_[x] when
setting is_valid to false, but with the special construction of the
module_libcs, this is not so easy.

Minor Sidenodes:
 * Wouldn't it be better to fail when too many modules containing libc are
included (end of PatchOneModuleLocked)?
 * I find it advantageous to have ASSERTS/CHECKS dumping file/line and have
them call a special function where I can put a break point.

Original issue reported on code.google.com by weidenri...@gmx.de on 10 Dec 2009 at 11:11

Attachments:

patch.txt

GoogleCodeExporter commented 9 years ago

I actually like the approach you took in your patch better than the alternative 
you 
talked about.  Patch() requires that its third value be a pointer to NULL, so 
we 
should be doing that before the call to Patch in any case.

} Wouldn't it be better to fail when too many modules containing libc are
} included (end of PatchOneModuleLocked)?

Well, our thought is that it would be better to use the standard windows malloc 
in 
that case, rather than kill the entire application.  But it's true, without 
knowing a 
specific use-case that triggers this error condition, it's hard to know the 
right way 
to treat it.

} I find it advantageous to have ASSERTS/CHECKS dumping file/line and have
} them call a special function where I can put a break point.

In unix-world, breaking on exit() serves this purpose.  Does it not work for 
windows?

Original comment by csilv...@gmail.com on 11 Dec 2009 at 6:42

Changed state: Started
Added labels: Priority-Medium, Type-Defect

GoogleCodeExporter commented 9 years ago

> I actually like the approach you took in your patch better than the 
alternative you 
> talked about.  Patch() requires that its third value be a pointer to NULL, so 
we 
> should be doing that before the call to Patch in any case.

I just thought that the other variant would check more internal consistency.

> > } Wouldn't it be better to fail when too many modules containing libc are
> > } included (end of PatchOneModuleLocked)?
> Well, our thought is that it would be better to use the standard windows 
malloc in 
>that case, rather than kill the entire application.  But it's true, without 
knowing > a 
> specific use-case that triggers this error condition, it's hard to know the 
right >
way 
> to treat it.

I think with this idea, printf makes perfectly sense. Perhaps adding a comment 
would
be good.

> } I find it advantageous to have ASSERTS/CHECKS dumping file/line and have
> } them call a special function where I can put a break point.
> In unix-world, breaking on exit() serves this purpose.  Does it not work for 
windows?

You are right.

Two other related things:
1)

There is a memory leak, because orgistub_fn is newed, but not deleted, right?

In the unpatching code I found:
  // Stub is now useless so delete it.
  // [csilvers: Commented out for perftools because it causes big problems
  //  when we're unpatching malloc.  We just let this live on as a  leak.]
  //delete original_function_stub;

But for the unloaded library, wouldn't it be no problem?
It probably makes no big difference, but as in that Windows Get Filename Dialog 
some
librarys (acrobat) are loaded and unloaded many times, it may be worth thinking 
of an
implementation for an delete + set to 0 at unloading.

2) In the patching I found the comment:

  // TODO(V7:joi) Siggi and I just had a discussion and decided that both
  // patching and unpatching are actually unsafe.  We also discussed a
  // method of making it safe, which is to freeze all other threads in the
  // process, check their thread context to see if their eip is currently
  // inside the block of instructions we need to copy to the stub, and if so
  // wait a bit and try again, then unfreeze all threads once we've patched.
  // Not implementing this for now since we're only using SideStep for unit
  // testing, but if we ever use it for production code this is what we
  // should do.

I don't know, if this comment is still valid. Do you have any idea on that? 
Perhaps I
should take the effort and use the same approach as Chorme: static linking and
patched libc...

Original comment by weidenri...@gmx.de on 11 Dec 2009 at 12:07

GoogleCodeExporter commented 9 years ago

} But for the unloaded library, wouldn't it be no problem?

You mean, the one that's about to be written over?  Good point.  I can delete 
before
setting to NULL.  I'll give that a go.

}   // TODO(V7:joi) Siggi and I just had a discussion and decided that both
}   // patching and unpatching are actually unsafe.  We also discussed a

I think you're right, and this comment is obsolete.

That said, static linking and patched libc will probably end up being more 
reliable
than this runtime patching.  At least, that was chromium's experience.  I'm glad
you're working to make the runtime patching more robust, though!  It's 
definitely
easier to use.

Original comment by csilv...@google.com on 11 Dec 2009 at 8:28

GoogleCodeExporter commented 9 years ago

I tried,
      if ( origstub_fn_[i] )
      {
             delete reinterpret_cast<void*>(origstub_fn_[i]);
         origstub_fn_[i]=0;
      }
and it seemed to work.
But my preference would be: This delete should be done at the point, where the
module-unloading is detected, or not done at all. (decoupling of side-effects.)

Original comment by weidenri...@gmx.de on 17 Dec 2009 at 3:03

GoogleCodeExporter commented 9 years ago

The right place to do this is in Unpatch().  We never actually unpatch anywhere
though (that code is commented out as too dangerous), so it's just as good to 
do it
here as anywhere.  Though maybe nowhere is best.

That said, I realized a bug in the code.  Can you try changing the delete to
   delete[] reinterpret_cast<unsigned char*>(origstub_fn_[i]);
and make sure the code still works properly for you?

Original comment by csilv...@gmail.com on 18 Dec 2009 at 6:39

GoogleCodeExporter commented 9 years ago

Ah, I just noticed, that this is unanswered: I tried "delete[]" and the code did
still work properly. But given the fact that the wrong "delete" did not produce
failures, this is not a very relevant information.

Original comment by weidenri...@gmx.de on 19 Jan 2010 at 12:07

GoogleCodeExporter commented 9 years ago

Actually, it's great to know that fixing it correctly is still safe. :-)  I'll 
make 
that in the next release, which I'm (somewhat optimistically) hoping will be 
tomorrow.

Original comment by csilv...@gmail.com on 19 Jan 2010 at 7:15

GoogleCodeExporter commented 9 years ago

This should be fixed in perftools 1.5, just released.

Original comment by csilv...@gmail.com on 20 Jan 2010 at 11:07

Changed state: Fixed

caohaiwd / gperftools

Unloading/loading can cause crashes and deadlocks #198