Balzanka / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

FinalizableReferenceQueue keeps ClassLoaders around. #92

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
If the class loader which loaded the google-collections jar is let go
(consider redeploying a war to Apache Geronimo) the class loader will not
be able to be garbage collected.  This is caused by the thread started by
FinalizableReferenceQueue never ending.  In other words the garbage
collection (for the class loader) can only happen when the thread stops. 
The solution is to stop the thread when there are no more objects on the
queue.  This will happen eventually if the rest of the app behaves
correctly by not placing state outside of its class loader's environment. 
Attached is a version of the class which I have used to this effect.

This is also a problem in the OSGi space where every jar gets its own class
loader and they can be refreshed easily at runtime to do rolling
upgrades/bug fixes.

Original issue reported on code.google.com by nairb...@gmail.com on 28 Jul 2008 at 5:39

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks very much!  Bob has agreed to fix this issue.  He's decided not to use 
your
patch, and will fix it a slightly different way, but thanks for going to the 
extra
effort anyway.

Original comment by kevin...@gmail.com on 28 Jul 2008 at 7:00

GoogleCodeExporter commented 9 years ago
A fix is a fix is a fix... :)  There is always more then one way - I just took 
the
fastest and least complicated way I could think of.

Thanks.

Original comment by nairb...@gmail.com on 28 Jul 2008 at 7:43

GoogleCodeExporter commented 9 years ago
FYI: I just tested the new version of FinalizableReferenceQueue with Guice on 
an OSGi
container, and I can confirm it now allows the classloader to be garbage 
collected :)

Original comment by mccu...@gmail.com on 26 Nov 2008 at 5:24

GoogleCodeExporter commented 9 years ago
This isn't solved for me (under Glassfish) but I'm not 100% sure I am 
interpreting
the results correctly. See
http://groups.google.com/group/google-guice/msg/ff53f95f422edabf

Original comment by gili.tza...@gmail.com on 27 Nov 2008 at 2:31

GoogleCodeExporter commented 9 years ago
I'd need to know more about the way you're loading Guice into the classloader
hierarchy to comment, and also if you're seeing any warnings or exceptions 
relating
to the Finalizer thread. What I can say is that it does unload OK under OSGi, 
which
suggests that something else is keeping the classloader alive in your case.

Note that the Finalizer thread won't shutdown until the Guice classloader is 
eligible
for GC, so it will appear as a GC root until that happens - you may think it 
looks
like a leak, but actually it will clear itself once other references to the 
Guice
classloader are cleared. So I'd dig deeper into the heapdump for other causes.

You may also need to force finalization and GC a few times using the System API.

Original comment by mccu...@gmail.com on 27 Nov 2008 at 3:01

GoogleCodeExporter commented 9 years ago
I did some digging and found the problem - when running under JEE it seems 
there are
several additional references from the Finalizer thread to the container 
classloader
via access control contexts (presumably because of additional protection 
domains).

Here's the complete list of references I found and some possible solutions:

1. Finalizer -> Thread.inheritedAccessControlContext -> ... -> Web CL

   Set in Thread constructor to 'AccessController.getContext()' ie. the
   context of the calling thread, this may refer to the Web classloader.

   Solution: wrap creation of Finalizer thread in doPrivileged block

2. URLClassLoader -> URLClassLoader.acc -> ... -> Web CL

   As above URLClassLoader constructor sets 'acc' field to current context.

   Solution: doPrivileged block doesn't help here, the only way I found to
   avoid this reference was to use reflection to null the 'acc' field which
   is yucky, anyone know a better option? Perhaps we could grab defineClass
   from the system classloader (again with reflection) and use that to load
   the Finalizer into the system classloader, but this is also yucky :(

3. Finalizer -> Thread.inheritableThreadLocals -> ... -> Web CL

   Each thread inherits an inheritableThreadLocals map from its parent, and
   this could include a reference back to the container classloader (depends
   on the app) - again no easy fix for this one, had to resort to reflection
   to null out the field :(

4. Finalizer -> Thread.contextClassLoader -> Web CL

   Yet another setting inherited from the calling thread, but at least here
   there's a public API 'setContextClassLoader()' we can use to clear it :)

With these references removed I can completely unload Guice from GlassFish v2.

I've attached a patch in case people are interested - but because of the hacky
use of reflection to clear certain private fields, I'm not too happy with it.

I'm wondering instead whether it would be better to provide a public API people
could use to shutdown the thread, which would then make a lot of this code moot.

Anyway, Happy Thanksgiving!

Original comment by mccu...@gmail.com on 28 Nov 2008 at 3:14

Attachments:

GoogleCodeExporter commented 9 years ago
Nice detective work, mcculls!

Original comment by crazybob...@gmail.com on 30 Nov 2008 at 2:38

GoogleCodeExporter commented 9 years ago
FYI here are the safe fixes that I think should be used in google-collections, 
such
as the use of doPrivileged() when creating the classloader and the Finalizer 
thread.
The TCCL should also be cleared, just in case.

The two other references (possible inherited thread-local and access control 
context)
are best fixed in the web container - certainly the thread-local issue is 
fixable in
GlassFish. However, still not sure about the remaining access control context 
issue.

Original comment by mccu...@gmail.com on 2 Dec 2008 at 9:55

Attachments:

GoogleCodeExporter commented 9 years ago
We'll address all of this in time for 1.0. I'm in favor of the reflection hacks 
over
an explicit shutdown hook.

Original comment by crazybob...@gmail.com on 17 Mar 2009 at 9:48

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 18 Mar 2009 at 2:21

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:45

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:46

GoogleCodeExporter commented 9 years ago
Seems this won't be in 1.0 after all.

Original comment by kevin...@gmail.com on 16 Oct 2009 at 9:01

GoogleCodeExporter commented 9 years ago
Could you please reconsider this? Resource leak like this leads to frequent
PermGenErrors and thus to app server restarts, causing developers to need to 
wait for
several minutes on each restart. Additionally, there are other redeployment 
issues at
least under Windows because of file-locking (jar files can't be deleted or 
moved).
Unfortunately there aren't any (usable) workarounds, so it would be really 
great if
this could be included in 1.0.

Original comment by syva...@yahoo.com on 26 Oct 2009 at 2:06

GoogleCodeExporter commented 9 years ago
As for workarounds: I've experimented with a ServletContextListener that finds 
any
Finalizer Threads and clears all the fields that Stuart mentioned.
http://pastie.org/686765 . I've tested for Guice Finalizer under Jetty and 
Tomcat and
it works.

For Servlet environments afaik ServletContextListener is the only place you're 
not
discouraged or explicitly forbidden to spawn threads. Therefore I am using a 
custom
Finalizer that creates just one Finalizer in SCL and makes sure it is disposed 
at the
end.

Original comment by alen_vre...@yahoo.com on 6 Nov 2009 at 6:13

GoogleCodeExporter commented 9 years ago
Will this cause a resource leak merely by including the jar on the classpath?  
Or 
will it only cause a problem if FinalizableReferenceQueue or classes using it 
(e.g. 
MapMaker) is used/referenced?  If the latter is the case, one could presumably 
work 
around the issue by making sure that the class isn't referenced.

Some sort of fix or workaround seems like a fairly critical thing to have, 
since I 
imagine many (most?) people who would want to use the google collections 
library are 
running under JEE.  The PermGen issue won't necessarily occur immediately as 
they 
may not redeploy right away, so users may not even know why PermGen errors 
suddenly 
started happening when they redeploy their war files.

Original comment by pseudony...@gmail.com on 12 Nov 2009 at 12:26

GoogleCodeExporter commented 9 years ago
Any news on this issue?

Original comment by terciofi...@gmail.com on 11 Apr 2010 at 9:49

GoogleCodeExporter commented 9 years ago
No. Bob was going to do it but... then he didn't.  Bob?

Original comment by kevinb@google.com on 12 Apr 2010 at 11:43

GoogleCodeExporter commented 9 years ago
Today I received a PermGen error from my beloved Tomcat, for wich the 
JreMemoryLeakPreventionListener logged the error message "SEVERE: A web 
application 
appears to have started a thread named [com.google.inject.internal.Finalizer] 
but has 
failed to stop it. This is very likely to create a memory leak.".

After some research, I figured this has to do with the issue addressed here, so 
I'm 
wondering, is it about to be fixed, or is there some API to close the thread 
started by 
Guice?

Original comment by pawjans...@gmail.com on 29 May 2010 at 9:48

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 Jul 2010 at 3:53

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 Jul 2010 at 3:56

GoogleCodeExporter commented 9 years ago
Issue 382 has been merged into this issue.

Original comment by kevinb@google.com on 8 Sep 2010 at 7:00

GoogleCodeExporter commented 9 years ago
No news about this issue? Does anybody have a workaround?

Original comment by terciofi...@gmail.com on 9 Sep 2010 at 8:14

GoogleCodeExporter commented 9 years ago
I talked to Bob about it today.  We're considering reading a system property -- 
shocking, I know -- that would tell MapMaker not to start any background thread 
(but perform cleanup in the user thread).  Using that would make this problem 
go away for you.

Original comment by kevinb@google.com on 10 Sep 2010 at 6:36

GoogleCodeExporter commented 9 years ago
I might be missing something, but couldn't you just add a shutdown method?

The problem is most severe in app servers and often you can't add system 
properties there, so the workaround wouldn't be very widely applicable.

You could just document the shutdown method as unsupported and not part of 
public API. Perhaps even advice that it's existence should be checked by 
reflection (and thus also called by reflection).

Original comment by syva...@yahoo.com on 10 Sep 2010 at 6:57

GoogleCodeExporter commented 9 years ago
The suggested solution of using a System property is unfortunately useless for 
a vast number of deployment scenarios and does nothing to fix the problem. +1 
for just not creating a Finalizer thread in a library and instead piggybacking 
expiry on user-level operations every <hopefully customizable n> operations 
instead.

Here's another idea that builds on this. How about requiring a user-supplied 
Executor in which to run instead, as additional argument to the expiration() 
builder method or the public FinalizerQueue class? This decouples enabled 
expiry from having to create a physical thread at all, and allows proper 
lifecycle control by the caller, instead of forcing the library to fly blind 
and make guesses about what to do. The caller then is not only responsible, but 
also now actually in a position (!) to do environment-specific lifecycle 
control: quiesced waiting, forced shutdown, Bundle.stop(), 
ServletContextListener, whatever.

While we're at it, the same Executor could also be used to invoke the listener, 
if added by the builder.

One new caveat would be that a user can now share an Executor between multiple 
CCHMs. If the Executor had really only one or any other fixed number of 
threads, then too many continuously running-and-blocking-without-timeout 
Finalizer.run() in their current form might start clogging the Executor. If 
reference cleaning were batched per CCHM, the run() loop could terminate after 
doing whatever is necessary.

As far as I can tell this solves the Finalizer thread lifecycle problem (caused 
really by the nondeterministic behaviour of its ref queue polling and therefore 
non-termination) and also the thread-per-CCHM overhead (n CCHMs use n Finalizer 
threads for no really good reason).

Original comment by holger.h...@googlemail.com on 10 Sep 2010 at 9:08

GoogleCodeExporter commented 9 years ago
+1 for a shutdown method, this is much better as in my deployment scenario I 
already have a shutdown for my stuffs and I think that the ContextDestroyed 
could do this job.

Original comment by terciofi...@gmail.com on 10 Sep 2010 at 1:00

GoogleCodeExporter commented 9 years ago
I'm sorry, if it's true that our intended solution is "useless" and "does 
nothing to fix the problem" would it be too much trouble to explain why that is?

Original comment by kevinb@google.com on 10 Sep 2010 at 1:48

GoogleCodeExporter commented 9 years ago
Well, I don't think that this solution is useless nor does nothing to fix the 
problem, I just think that isn't the best way to solve it, just that.

Maybe people upon can explain their reasons about that... 

Original comment by terciofi...@gmail.com on 10 Sep 2010 at 2:13

GoogleCodeExporter commented 9 years ago
Piggybacking on access is not useless (which is what I wrote), System 
properties are - for the reasons that other people have written as well. You 
don't know whether the System properties are accessible at all (security), up 
to date, or have just changed underneath you on the fly. Different applications 
or even different client classes in the same VM can have different opinions 
which setting is appropriate for them.

Original comment by holger.h...@googlemail.com on 10 Sep 2010 at 2:19

GoogleCodeExporter commented 9 years ago
Comment #30 made a good point. A system property provides less flexibility if 
two or more (web) apps are running on the same VM (server). One app might rely 
on the background processing wheres another is ok with running in the user 
thread.

Original comment by eclipseguru@gmail.com on 10 Sep 2010 at 5:07

GoogleCodeExporter commented 9 years ago
Issue 517 has been merged into this issue.

Original comment by kevinb@google.com on 12 Jan 2011 at 8:36

GoogleCodeExporter commented 9 years ago
Hi everyone, I know it's been a long time getting this resolved, and I'm 
committed to dealing with this over the next few months.

One question though: looking at FinalizableReferenceQueue leads me to believe 
that this whole problem can be avoided by loading Guava in the system class 
loader. Is this an option for you?

Original comment by fry@google.com on 13 Jan 2011 at 7:39

GoogleCodeExporter commented 9 years ago

Original comment by yrfselrahc@gmail.com on 13 Jan 2011 at 8:11

GoogleCodeExporter commented 9 years ago
Not really, each webapp should run in its own ClassLoader. You might end up 
with two different webapps depending on different versions of Guice.

I find the history of this bug quite silly. We could have added a shutdown 
method back in 2008 and been done with it. If you magically came up with a 
cleaner solution at a later time you could have deprecated and removed this 
method. Seeing as this method will be invoked at most once in every 
application, removing it (when the time comes) should cause minimal disturbance.

Original comment by cowwoc...@gmail.com on 13 Jan 2011 at 8:23

GoogleCodeExporter commented 9 years ago
Agreed that Guava should be loaded at the webapp level and not the system class 
loader.

I'm guessing that the scale that Google operates, you may have one process or 
web application per system.  Smaller deployments will may have many web 
applications loaded in a single application server, e.g. Tomcat, so here you 
would want each one to have its own copy of Guava.  Or even Google Collections, 
if they are very old and haven't been worked on to be updated to use Guava.  
You wouldn't want to be constrained in one web application which Guava version 
to use in another web application.

Original comment by blair-ol...@orcaware.com on 16 Jan 2011 at 7:28

GoogleCodeExporter commented 9 years ago
FWIW I've put together a potential patch for the related Guice Finalizer issue:

  http://code.google.com/p/google-guice/issues/detail?id=288#c30

It uses an Executor approach instead of a shutdown method because that provides 
the container with much more flexibility in how it schedules the background 
work. It also avoids the potential pitfalls of a shutdown method - namely who's 
responsible for calling it, what happens if more work comes in after the 
shutdown begins, what happens if the shutdown call blocks, etc. (i.e. same 
reasons why Thread.stop() is deprecated).

The patch currently uses a system property to configure the executor - while 
this is not ideal, it does allow people to experiment with the feature without 
(yet) committing to an API (as this is an implementation detail for Guice it's 
harder to come up with something as neat as a builder method on MapMaker...).

Experimenting with the various options I think the Executor approach is 
definitely the way to go - how this is configured is another question. Using 
something like a system property might be a reasonable compromise if people are 
loathe to commit to a particular API at the moment.

Original comment by mccu...@gmail.com on 17 Jan 2011 at 2:04

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
We don't use Guice, but Spring for dependency injection, throughout our 
webapps, so as a little background for us, how would the patch you're working 
on impact Guava?  I saw a mention that Guice includes a copy of portions of 
Guava?

Since we use Spring, I wouldn't mind a public shutdown method that I can use to 
properly clean everything up in Guava.

Original comment by blair-ol...@orcaware.com on 19 Jan 2011 at 12:53

GoogleCodeExporter commented 9 years ago
As Holger mentioned back in comment #26, Guava could add selection of the 
Executor to the MapMaker API which would be better than using System.properties 
- the basic implementation behind the scenes would be similar to the proposed 
Guice patch and would work regardless of whether you're using Guice, Spring, or 
new.

Executors are better than a single static shutdown method for the same reason 
that Thread.stop is deprecated.

Original comment by mccu...@gmail.com on 19 Jan 2011 at 9:38

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 27 Jan 2011 at 1:33

GoogleCodeExporter commented 9 years ago

Original comment by yrfselrahc@gmail.com on 27 Jan 2011 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 22 Mar 2011 at 6:27

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 23 Mar 2011 at 1:49

GoogleCodeExporter commented 9 years ago
@fry does this mean there isn't a solution in sight?

Original comment by tj.rothw...@gmail.com on 23 Mar 2011 at 2:53

GoogleCodeExporter commented 9 years ago
It means that I'm tired of changing the milestone. :-)

In practice I have a working solution, but it breaks many internal tests so it 
will take a while to work through the carnage. I have high hopes for release 10.

Original comment by fry@google.com on 23 Mar 2011 at 10:38

GoogleCodeExporter commented 9 years ago
May I ask that you please share the approach (or the code) with us so that we 
can complain in advance, not just after it has been committed? :-)

Original comment by holger.h...@googlemail.com on 23 Mar 2011 at 10:54

GoogleCodeExporter commented 9 years ago
Sure, the idea is to have a ReferenceQueue inside of each MapMaker (instead of 
globally), and to drain it on write operations (or on occasional reads in the 
absence of writes).

The internal testing issue is that we have tests which assume that simply 
allowing time to pass will result in cleanup, which will no longer be the case.

Does this approach also sound problematic in your use cases?

Original comment by fry@google.com on 23 Mar 2011 at 10:58

GoogleCodeExporter commented 9 years ago
Sounds good to me :)

Original comment by cowwoc...@gmail.com on 23 Mar 2011 at 12:09

GoogleCodeExporter commented 9 years ago
Hitching a ride on other operations (just like in lockfree algorithms) should 
work as long as the Map is occasionally used, and getting rid of the threads & 
shared queues should fix all the stale/cross classloader issues like web app 
undepoy leaking. Very nice! Like it.

Original comment by holger.h...@googlemail.com on 23 Mar 2011 at 12:22