eclipse-m2e / m2e-core

Eclipse Public License 2.0
113 stars 115 forks source link

Build plugin execution leaking classes into the Eclipse JVM #1214

Closed odrotbohm closed 1 year ago

odrotbohm commented 1 year ago

m2e executes build plugins both during project updates and also during incremental compilation of sources of a project. A sample project available here shows how the execution of e.g. the Bytebuddy Maven Plugin causes ~800 classes being added to the Eclipse JVM for each full project update and a low to mid two-digit value for each compilation.

I spoke to @raphw (author of the Bytebuddy Maven Plugin) and he couldn't think of any obvious problem within the plugin itself. I was wondering if this symptom could be related to how m2e handles the instances of the plugins. It seemed to reuse the instances between incremental compilation, but I am totally just observing this from the outside.

I'd be willing to help to diagnose this further if you have any pointers which code is actually creating and maintaining the build plugin instances, which object instances to look out for in a heap dump etc.

laeubi commented 1 year ago

@odrotbohm have you checked (e.g. with JProvfler or VisualVM) that the classes are retained (e.g. after a GC?) if yes, generally the next step is to look at the GC_ root that is preventing the classes from being garbage collected.

Beside this it could help if the maven plugin participates in the Incremental build e.g. with BuildContext: https://www.eclipse.org/m2e/documentation/m2e-making-maven-plugins-compat.html#buildcontext

odrotbohm commented 1 year ago

I'm exploring the application using YourKit but have only seen the forest for the trees so far. Yes, the classes are retained after GCs. So, the number of loaded classes grows unbounded and ultimately results in significant off-heap memory consumption.

Yes, the plugin has already supported incremental builds (I actually contributed a draft of that) for a while. The individual compilations don't have a drastic effect on the number of loaded classes. However, a full project update sometimes adds up to a 1000 of them. 😕

laeubi commented 1 year ago

I think then you should look at the classloader of the retained class and check what instance holds this so it can't be garbage collected.

Yourkit seem to have also an option to show GC roots: https://www.yourkit.com/docs/dotnet-profiler/2022.9/help/gc_roots.jsp.

odrotbohm commented 1 year ago

The classloader of those classes is a ClassRealm and the GC Roots path for one of them looks as follows:

Screenshot 2023-01-21 at 16 57 13

Unfortunately, I am not really able to read anything out of this. The path passes the PlexusContainerManager but that doesn't seem too unusual, actually.

laeubi commented 1 year ago

I don't know what bytebuddy plugin does, but loads it classes through the classloader of the plugin? If yes, it probably better would use an own URLClassloader that is disposed after the work is done. Neverless I would have assumed that m2e detatches the classrealm of the plugin, but need to further investigate on this.

Both have pro and con of course, e.g. when classes are garbage collected they might need to be loaded on each call again...

raphw commented 1 year ago

That is pretty much what it does. It's a rather small mojo and all class loading is happening here: https://github.com/raphw/byte-buddy/blob/master/byte-buddy-maven-plugin/src/main/java/net/bytebuddy/build/maven/ClassLoaderResolver.java#L135

The class loaders are later closed. I would not understand how this code is leaking classes.

laeubi commented 1 year ago

return new URLClassLoader(urls.toArray(new URL[0]), ByteBuddy.class.getClassLoader()); is using the ByteBuddy classes classloader as the parent and thus all classes are first loaded (if reachable) from this classloader.

raphw commented 1 year ago

That's the plugins class loader, and that's intended as it's needed to run the plugin. Isn't the plugin class loader managed by Eclipse? The Url class loader is closed and discarded at the end of the plugin execution.

laeubi commented 1 year ago

Well you asked why these classes are leaking into eclipse and that's the reason :-) So either it is required to load all the classes from the maven-classpath(!) then it will also happen in a maven run, or you actually don't want all that classes, then you should change the classloader strategy.

So from the javadoc the method behaves wrong, it claims to construct a classlaoder of a maven coordinate but in fact it creates one that loads all classes reachable from the maven-plugin AND additional from the maven coordinate.

So if you need that, but not really ever need to use the classes directly, you should just not pass a parent here, as you are leaking classes from your maven plugin into that loader. If you need this (why?) then you it depends on the way it is used what might be a better strategy.

At least m2e do not really "manages" this, but I'll check if we probably can release some reference, but this will just help with the symptoms and then byte-buddy would load all these classes on each incremental run as well (what might slow things down considerably), so then you probably want to cache this on the build context ;-)

raphw commented 1 year ago

Now I am confused. Why would that cause 800 classes to leak on every execution?

The Byte Buddy Mojo needs to provide the Byte Buddy dependency to its plugins. If the class loader is acting as a parent, the new children can request Byte Buddy classes from the parent, but will load the plugin classes into the new, later discarded loader. If there's a leak, it would be the Maven plugin class loader. If the Maven plugin loader is reused, it should not cause s loading of additional classes later.

odrotbohm commented 1 year ago

I am far from an expert on this and feel free to ignore me on this but isn't the problem we're seeing the opposite of what @laeubi is describing? Wouldn't we want the loaded ByteBuddy classes to be reused in a parent classloader? Instead we seem to see completely new ClassRealm instances created that load the classes again.

laeubi commented 1 year ago

Now I am confused. Why would that cause 800 classes to leak on every execution?

I don#t know as I don't know anything about the byte-buddy plugin, but each class you load from the plugin-realm will not be garbage collected when you close your URLClassloader, that will only free the open files, and possibly allows any no longer referenced class from being garbage collected.

If the class loader is acting as a parent, the new children can request Byte Buddy classes from the parent, but will load the plugin classes into the new, later discarded loader.

No it does not, please read about the default classloader delegation model here:

The ClassLoader class uses a delegation model to search for classes and resources. Each instance of ClassLoader has an associated parent class loader. When requested to find a class or resource, a ClassLoader instance will delegate the search for the class or resource to its parent class loader before attempting to find the class or resource itself. The virtual machine's built-in class loader, called the "bootstrap class loader", does not itself have a parent but may serve as the parent of a ClassLoader instance.

So any class sis first searched by the parent by default, and that parent might cache the class as it will never be unrenferenced when you close the URL one.

Instead we seem to see completely new ClassRealm instances created that load the classes again.

Where exactly do you see this new realm created? Each plugin has an own realm that has a parent of the project, and as said these might be cached.

But one can create a new ClassRealm (with self-first strategy) and only add dedicated packages from the plugin realm.

odrotbohm commented 1 year ago

Where exactly do you see this new realm created?

In the YourKit screenshot posted above, which, admittedly, wasn't really obvious.

Almost every of the ClassRealm instances (in particular the ones having loaded exactly 1239 classes) is a new one for a Maven ByteBuddy plugin invocation and I see a new one added, every time I update the sample project in Eclipse. In my understanding, as long those are referenced kept around they cannot be GCed, and, consequently, the classes they refer to cannot be unloaded, right?

So I was naively thinking that m2e should either reuse one particular instance or not hold on to the ones not used anymore?

odrotbohm commented 1 year ago

If I read the GC root path correctly, it looks like the main mojo class (WithoutRuntimeDependencies) seems to be loaded by a different ClassRealm. Can you point me to the code that does that?

laeubi commented 1 year ago

Its a bit hard to tell from a screenshot, but maybe we can narrow this is a bit down:

  1. Most important one is are the 1239 classes expected to be loaded, e.g. are these classes of your project to be processed? And are you exspect them to be loaded on each incremental run again?
  2. Are there multiple instances of the same class instance stored in different class realms?
  3. Each class relam should have a name and a parent, so it should be possible to put a (conditional) breakpoint and see where they are created and where they are stored.

As far as I know m2e not directly "manages" the realms but calls maven for this, so it is possible that we manually have to "unlink" thinks after an mojo execution, but first we should make sure what happens when and if its expected or not.

odrotbohm commented 1 year ago

The classes contained in each ClassRealm (with those 1239 classes) are the classes of the mojo, executed for the same project in an Eclipse workspace. The realms contain exactly the same classes. Running "Update project" creates another ClassRealm instance for that project with, again, the same classes.

To a noob like me, this looks like instead of reusing the ClassRealm to run the Bytebuddy Maven Plugin, a new one is created. So, either this shouldn't happen in the first place, or the old ClassRealm should be able to be discarded, which for some reason it is not. To summarize:

  1. I'd expect these classes to be loaded for the first build of a project, as that will have to run the plugin declared in the POM. Whether subsequent runs of the build plugin should reuse that ClassRealm or create a fresh one is something that m2e has to decide. If it decides not to reuse the existing one, I'd expect the old one to be discarded and not eat up non-heap memory.
  2. Each of the relevant realms contains the exact same set of classes. In only found a name in there which is …ClassReal plus an instance identifier, no parent. If I understood this correctly, the ClassRealm instances are created from M2EClassWorlds, but AFAIU, those are only created when a new Plexus container is created. 🤔

See the inspection of two of those "identical" ClassRealm instances below. Note, how they point to the same Eclipse project, but are different instances for the same build plugin.

Screenshot 2023-01-22 at 17 19 58 Screenshot 2023-01-22 at 17 20 45

I've uploaded the heap dump here, in case you are eager to explore it yourself. The project to look out for is acme-commerce. I am not really familiar with how to debug m2e inside Eclipse. Any pointers how to do that?

laeubi commented 1 year ago

I've uploaded the heap dump here, in case you are eager to explore it yourself.

I'll try to take a look at it the next days...

I am not really familiar with how to debug m2e inside Eclipse. Any pointers how to do that?

You need a dev environment: https://github.com/eclipse-m2e/m2e-core/blob/master/CONTRIBUTING.md#-developer-resources

then you can start an eclipse instance in debug mode from it and just import your project you like to take a deeper look.

odrotbohm commented 1 year ago

One thing I noticed, the cleanup code in PlexusContainerManager will only dispose containers, if the .mvn in a project is not a directory. Is that intended? The project I have here has a .mvn folder in its root (for the Maven wrapper).

odrotbohm commented 1 year ago

I'll try to take a look at it the next days...

You need a dev environment…

Thanks for that and the pointers, I'll see what I can do.

laeubi commented 1 year ago

One thing I noticed, the cleanup code in PlexusContainerManager will only dispose containers, if the .mvn in a project is not a directory. Is that intended? The project I have here has a .mvn folder in its root (for the Maven wrapper).

The container should effectively never be disposed.

odrotbohm commented 1 year ago

I got to debug the build of the project and I see a ClassRealm instance created for each of the declared build plugins. For the build execution, MavenExecutionContext calls BuildPluginManager.executeMojo(…). That at some point ends up in LifecycleManager.manage(Class) which keeps the handed type in a Map (lifecycles). As that type is a different Class instance, as it in turn gets loaded from a different ClassRealm every time, this Map grows unbounded and maintains strong references to the outdated ClassRealms that then cannot be GCed.

To me, it seems like that, if we want to reuse the underlying core Maven abstraction instances, we have to make sure, that we pass a ClassRealms into Maven via the MojoDescriptor contained in the MojoExecution, as DefaultBuildPluginManager.getPluginRealm(…) uses that. Currently, that realm is null, a new ClassRealm is created, and thus LifecycleManager.manage(…) sees a different Class object for every invocation.

odrotbohm commented 1 year ago

A couple more details: the DefaultBuildPluginManager used in MavenExecutionContext.executeMojo(…) contains a DefaultMavenPluginManager which in turn contains a EclipsePluginRealmCache. That one gets flushed for each call to MavenProjectCache.flushMavenCaches(…) (called for each project update). That cache emptied causes the cache miss that ultimately results in the lookup on LifecycleManager as the ClassRealm needs to be recreated.

laeubi commented 1 year ago

@odrotbohm thanks for the analysis, do you like to provide a PR for this?

If you can implement/fix the caching, it should use some kind of SoftReference approach so the GC can clean it out if is required, the flush at project update is fine as we can't know if the realm was changed due do changed project settings, but on incremental builds it should be fine to cache things.

odrotbohm commented 1 year ago

I'm not confident to do that (yet), as this could be solved in a couple of ways involving parties not involved in the discussion yet, and I don't know what their stance on the topic is. For example, I am totally unaware which of the instances I am seeing in the debug session are and can be cached, how expensive it is to recreate them etc.

I think the main culprit is that the PlexusContainerdoes not get reinitialized for a project updates. To test whether that would make a difference, I've added a call to PlexusContainerManager.dispose() from MavenProjectCaches.flushCaches(…) and now I see the class loading graph behave as expected: piling up new classes for project refreshes, but falling back to the original number of loaded classes after a manual GC.

Do you think that's an adequate approach? I'm sure the design of PlexusContainerManager and especially the way it handles its internal container instances has seen some thought.

laeubi commented 1 year ago

As said the container should stay as it is, but plugin (or project)-realms might be disposed if no longer needed.

odrotbohm commented 1 year ago

I don't think it can as otherwise the LifecycleManager instance stays around which keeps references to classes as described above. I also don't think disposing project ClassRealms will help. With the PluginRealmCache wiped on project update, this causes the new ClassRealms created and build plugin classes loaded with that. I think that wiping the cache is correct and to some degree even needed to reinitialize the build plugins on project updates, but the way LifecycleManager is designed (using classes as Map keys), it would need to be reset as well and – and I am totally not an expert on this – it felt cleaner to me to wipe the entire build DI container rather than trying to get it to reset an individual managed object's instance.

I'm sure there's a way to be more precise and only wipe the container instance for the project under update, but I didn't want to mess with the way the instances are managed, as that looked rather complex and – as I tried to indicate above – in place for a reason. But conceptually, I think it would be consistent to also re-initialize the plexus container to get a fresh start for the project under update, wouldn't it?

laeubi commented 1 year ago

LifecycleManager is designed (using classes as Map keys), it would need to be reset as well and – and I am totally not an expert on this – it felt cleaner to me to wipe the entire build DI container rather than trying to get it to reset an individual managed object's instance.

PlexusContainerManager.dispose() disposes all containers from all builds and creating a container is a really slow operation (can take many seconds if for example remotes must be checked for an update and so on), and a container can also manage many projects. The only valid events for whiping out one()! container would be:

  1. .mvn folder deleted
  2. extensions.xml changes (not implemented yet)

So if there is no way for LifecycleManager to "free" resources, we probabbly should have a EclipseLifecycleManager that allow us to do so.

odrotbohm commented 1 year ago

PlexusContainerManager.dispose() disposes all containers from all builds and creating a container is a really slow operation (can take many seconds if for example remotes must be checked for an update and so on), and a container can also manage many projects. The only valid events for whiping out one()! container would be:

I know, I know. I was just trying to find out whether the LifecycleManager instance was the only instance that keeps references to the ClassRealms preventing them from being collected. It's totally beyond my expertise how one would tweak a plexus container with a customized low-level Maven component and make sure it could get cleaned up properly on a per-project basis.

laeubi commented 1 year ago

@odrotbohm I now have taken a look at LifecycleManager and i think it is actually buggy.

As you said, the Map is filled with classes but there is no way to ever get an item out of this map. As it maintains strong keys, even freeing each and every reference to a class will retain it in the map.

Do you like to report a bugreport to plexus/sisu so it could probably fixed?

Even worse, LifecycleManager.unmanage(Object) wont remove the item from the map either so the bean is stopped but still retained. And as all this is even constructed internally in DefaultPlexusContainer m2e can also not just intercept here.... probably we can try some reflectionmagic but that's really nasty...

odrotbohm commented 1 year ago

Do you like to report a bugreport to plexus/sisu so it could probably fixed?

I've filed a ticket here.

odrotbohm commented 1 year ago

Thanks for the fix, @laeubi! Any chance someone could trigger a nightly build? I'd love to use this in my current dev installation. 🙇

laeubi commented 1 year ago

It should already be deployed: https://ci.eclipse.org/m2e/job/m2e/job/master/

You can install it from here: https://download.eclipse.org/technology/m2e/snapshots/latest/

odrotbohm commented 1 year ago

The update site shows a 2.1.3 build available from this morning. However, my installation already has a 2.2.0 (likely snapshot from Jan, 3rd) installed. Did you downgrade the version number for master at some point? 🤔

odrotbohm commented 1 year ago

It looks like it's been this commit. I've uninstalled the 2.2 version, reinstalled 2.1.2 and trying to upgrade to the 2.1.3 snapshots now results in this:

Your original request has been modified.
  "M2E - Maven Integration for Eclipse" is already installed, so an update will be performed instead.
Cannot complete the install because one or more required items could not be found.
  Software being installed: M2E - Maven Integration for Eclipse 2.1.3.20230125-0927 (org.eclipse.m2e.feature.feature.group 2.1.3.20230125-0927)
  Missing requirement: M2E Maven Integration for Eclipse Core 2.0.6.20230125-0927 (org.eclipse.m2e.core 2.0.6.20230125-0927) requires 'java.package; com.google.gson [2.10.0,3.0.0)' but it could not be found
  Cannot satisfy dependency:
    From: M2E - Maven Integration for Eclipse 2.1.3.20230125-0927 (org.eclipse.m2e.feature.feature.group 2.1.3.20230125-0927)
    To: org.eclipse.equinox.p2.iu; org.eclipse.m2e.core [2.0.6.20230125-0927,2.0.6.20230125-0927]

I unfortunately need the snapshots as 2.1.2 is pretty badly broken due to #1150. :/

HannesWell commented 1 year ago

The update site shows a 2.1.3 build available from this morning. However, my installation already has a 2.2.0 (likely snapshot from Jan, 3rd) installed. Did you downgrade the version number for master at some point? 🤔

It looks like it's been this commit.

That's correct. Just created #1218 to bump the version.

I've uninstalled the 2.2 version, reinstalled 2.1.2 and trying to upgrade to the 2.1.3 snapshots now results in this:

Your original request has been modified.
  "M2E - Maven Integration for Eclipse" is already installed, so an update will be performed instead.
Cannot complete the install because one or more required items could not be found.
  Software being installed: M2E - Maven Integration for Eclipse 2.1.3.20230125-0927 (org.eclipse.m2e.feature.feature.group 2.1.3.20230125-0927)
  Missing requirement: M2E Maven Integration for Eclipse Core 2.0.6.20230125-0927 (org.eclipse.m2e.core 2.0.6.20230125-0927) requires 'java.package; com.google.gson [2.10.0,3.0.0)' but it could not be found
  Cannot satisfy dependency:
    From: M2E - Maven Integration for Eclipse 2.1.3.20230125-0927 (org.eclipse.m2e.feature.feature.group 2.1.3.20230125-0927)
    To: org.eclipse.equinox.p2.iu; org.eclipse.m2e.core [2.0.6.20230125-0927,2.0.6.20230125-0927]

That is very like because we recently upgraded to use gson 2.10. Before we used the gson version provided by our deps but probably nobody else provide gson 2.10 now.

With #1218 gson 2.10 is also included into the m2e-repository.

odrotbohm commented 1 year ago

Thanks for the quick turnaround, Hannes. Awesome team! 🙇