bndtools / bnd

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
https://bndtools.org
Other
531 stars 304 forks source link

NPE in ActivelyClosingClassLoader if the bundle is not started #5570

Closed laeubi closed 1 year ago

laeubi commented 1 year ago

ActivelyClosingClassLoader access the bundle context here

https://github.com/bndtools/bnd/blob/8ab7b919cc0c4eee0a7d6f05b06327b5e81f74d7/biz.aQute.bndlib/src/aQute/bnd/osgi/ActivelyClosingClassLoader.java#L219-L223

As the bundle biz.aQute.bndlib is not marked as lazy-start it is possible that the bundle is in RESOLVED state and then BundleCOntext is null and a NPE is thrown.

I see the following options:

kriegfrj commented 1 year ago

Thanks Christoph!

Quick question: Is this a theoretical problem, or has it actually happened to you? If you've seen it manifest, do you have a stack trace you can add to this issue? The reason I ask is because if it's actually happened there may be other contributing factors we might need to look at to see if they need attention too.

ActivelyClosingClassLoader access the bundle context here

https://github.com/bndtools/bnd/blob/8ab7b919cc0c4eee0a7d6f05b06327b5e81f74d7/biz.aQute.bndlib/src/aQute/bnd/osgi/ActivelyClosingClassLoader.java#L219-L223

As the bundle biz.aQute.bndlib is not marked as lazy-start it is possible that the bundle is in RESOLVED state and then BundleCOntext is null and a NPE is thrown.

I admit I'm not a bundle lifecycle guru, but my understanding is that if this code is running, and it's not a "lazy start" bundle, then shouldn't the bundle have already been moved into the "ACTIVE" state before the code started executing?

I see the following options:

  • Mark the bundle as lazy-start
  • check for null and throw a meaningful exception
  • just doing nothing and do not try to load from the system bundle

I believe that there is a philosophical objection to lazy-start here. Admittedly my experience with it has been limited, but that limited experience is painful.

The third would change behaviour in a way that is not backward compatible. While I'm not against breaking backward compatibility if there is a good reason, I don't think that this is a sufficiently good reason in this case as there is a simple and viable alternative (option 2).

So I think that option 2 is the best. There is already a template to follow in terms of "throwing a meaningful exception" a couple lines above the code in question (where it checks for bundle == null). Although I would prefer adding extra information into the exception to make it even more meaningful - perhaps chaining the original CNFE and wrapping it in a CNFE that describes the error - eg:

throw new CNFE("BundleContext was null - check if bundle is resolved", nfe);

Are you able to put a PR together?

laeubi commented 1 year ago

Quick question: Is this a theoretical problem, or has it actually happened to you?

I encounters this in the context of

an example stacktrace looks like this (shortened it a bit here):

ERROR: Exception: java.lang.ClassNotFoundException: aQute.bnd.repository.maven.provider.MavenBndRepository not found, parent: org.eclipse.osgi.internal.loader.EquinoxClassLoader@48153e50[biz.aQute.bndlib:6.3.1.202206071316(id=1233)] urls:[] exception:java.lang.NullPointerException: Cannot invoke "org.osgi.framework.BundleContext.getBundle(long)" because the return value of "org.osgi.framework.Bundle.getBundleContext()" is null
    ....
    at aQute.bnd.osgi.ActivelyClosingClassLoader.loadClass(ActivelyClosingClassLoader.java:234)
    at aQute.bnd.osgi.Processor$CL.loadClass(Processor.java:1452)
    at aQute.bnd.osgi.PluginsContainer.loadPlugin(PluginsContainer.java:584)
    at aQute.bnd.osgi.PluginsContainer.loadPlugins(PluginsContainer.java:468)
    at aQute.bnd.osgi.PluginsContainer.init(PluginsContainer.java:176)
    at aQute.bnd.osgi.Processor.lambda$newPluginsContainer$0(Processor.java:472)
    at aQute.bnd.memoize.MemoizingSupplier.lambda$new$1(MemoizingSupplier.java:31)
    at aQute.bnd.memoize.MemoizingSupplier.get(MemoizingSupplier.java:51)
    at aQute.bnd.memoize.MemoizingSupplier.lambda$new$1(MemoizingSupplier.java:31)
    at aQute.bnd.memoize.MemoizingSupplier.get(MemoizingSupplier.java:51)
    at aQute.bnd.osgi.Processor.getPlugins(Processor.java:462)
    at aQute.bnd.osgi.Processor.getPlugins(Processor.java:439)
    at aQute.bnd.build.Workspace.signal(Workspace.java:604)
    at aQute.bnd.build.Workspace.signal(Workspace.java:620)
    at aQute.bnd.osgi.Processor.error(Processor.java:292)
    at aQute.bnd.osgi.Processor.exception(Processor.java:341)
    at aQute.bnd.osgi.PluginsContainer.loadPlugin(PluginsContainer.java:603)
    at aQute.bnd.osgi.PluginsContainer.loadPlugins(PluginsContainer.java:468)
    at aQute.bnd.osgi.PluginsContainer.init(PluginsContainer.java:176)
    at aQute.bnd.osgi.Processor.lambda$newPluginsContainer$0(Processor.java:472)
    at aQute.bnd.memoize.MemoizingSupplier.lambda$new$1(MemoizingSupplier.java:31)
    at aQute.bnd.memoize.MemoizingSupplier.get(MemoizingSupplier.java:51)
    at aQute.bnd.memoize.MemoizingSupplier.lambda$new$1(MemoizingSupplier.java:31)
    at aQute.bnd.memoize.MemoizingSupplier.get(MemoizingSupplier.java:51)
    at aQute.bnd.osgi.Processor.getPlugins(Processor.java:462)
    at aQute.bnd.osgi.PluginsContainer$PluginsSpliterator.parent(PluginsContainer.java:273)
    at aQute.bnd.osgi.PluginsContainer$PluginsSpliterator.forEachRemaining(PluginsContainer.java:315)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at aQute.bnd.osgi.PluginsContainer.getPlugins(PluginsContainer.java:345)
    at aQute.bnd.osgi.PluginsContainer.postInit(PluginsContainer.java:189)
    at aQute.bnd.osgi.Processor.lambda$newPluginsContainer$1(Processor.java:481)
    at aQute.bnd.memoize.MemoizingSupplier.lambda$new$1(MemoizingSupplier.java:32)
    at aQute.bnd.memoize.MemoizingSupplier.get(MemoizingSupplier.java:51)
    at aQute.bnd.osgi.Processor.getPlugins(Processor.java:462)
    at aQute.bnd.osgi.Processor.getPlugin(Processor.java:451)
    at aQute.bnd.osgi.Processor.doIncludes(Processor.java:823)
    at aQute.bnd.osgi.Processor.setProperties(Processor.java:750)
    at aQute.bnd.osgi.Processor.setProperties(Processor.java:746)
    at aQute.bnd.osgi.Processor.setProperties(Processor.java:975)
    at aQute.bnd.osgi.Processor.setProperties(Processor.java:964)
    at aQute.bnd.build.Project.<init>(Project.java:195)
    at aQute.bnd.build.Project.<init>(Project.java:202)
    at aQute.bnd.build.ProjectTracker.update(ProjectTracker.java:117)
    at aQute.bnd.build.ProjectTracker.getAllProjects(ProjectTracker.java:56)
    at aQute.bnd.build.Workspace.open(Workspace.java:370)
    at aQute.bnd.build.Workspace.getWorkspace(Workspace.java:301)
    at aQute.bnd.build.Workspace.getWorkspace(Workspace.java:245)
    at aQute.bnd.build.Workspace.getProject(Workspace.java:224)
    at <user code using the BND lib>
Caused by: java.lang.NullPointerException: Cannot invoke "org.osgi.framework.BundleContext.getBundle(long)" because the return value of "org.osgi.framework.Bundle.getBundleContext()" is null
    at aQute.bnd.osgi.ActivelyClosingClassLoader.loadClass(ActivelyClosingClassLoader.java:220)
    ... 71 more

I admit I'm not a bundle lifecycle guru, but my understanding is that if this code is running, and it's not a "lazy start" bundle, then shouldn't the bundle have already been moved into the "ACTIVE" state before the code started executing?

If the bundle is not marked as lazy start then code can be loaded (and of course executed) while it is in RESOLVED state, what usually is happening for API or library bundles that neither have an activator nor any components. That's what lazy activation does, it activates the bunlde as soon as any class is loaded from it, while without it depends on the configuration.

I believe that there is a philosophical objection to lazy-start here. Admittedly my experience with it has been limited, but that limited experience is painful.

Can you explain what exact "pains" you mean? Nerveless it is just one alternative as the code seem to rely on activation of the bundle here so thats the main point for lazy activation and actually I'm not aware of any issues with that (beside the philosophical part)

The third would change behaviour in a way that is not backward compatible.

As it do not work in such case and just gives an error, I don't think its really a concern of compatibility, the code will just fail with a different message...

throw new CNFE("BundleContext was null - check if bundle is resolved", nfe);

The bundle is resolved, it is just not started and therefore has a classloader but not a bundlecontext.

pkriens commented 1 year ago

I am not sure this is very important? This is caused by not having an important dependency on the classpath or in a bundle. (Your report does not indicate where you run this code, in an OSGi fw or class path based?). The line also fails when the code is run outside OSGi so lazy start is not an option.

An NP check would be nice. In those small cases, it is easier for me to just propose a PR with a 1 line check.

pkriens commented 1 year ago

1) A PR would be appreciate to make sure the NPE does not occur. 2) I do not think this needs further actions?

laeubi commented 1 year ago
  1. A PR would be appreciate to make sure the NPE does not occur.

    1. I do not think this needs further actions?

Don't you think that (2) is contradicting to whats written in (1) ?

timothyjward commented 1 year ago

Don't you think that (2) is contradicting to whats written in (1) ?

I’m not sure that it does. A fix to avoid failing with an NPE would be appreciated for a nicer message, but no change in behaviour is needed, i.e. keep throwing CNFE.

laeubi commented 1 year ago

But thats not what the close message says and that is not what was reported here, I reported a NPE and the issue was closed with a comment that no further action is required.

So is a PR not an "action" but...?!?

laeubi commented 1 year ago

PR is here:

timothyjward commented 1 year ago

But thats not what the close message says and that is not what was reported here, I reported a NPE and the issue was closed with a comment that no further action is required.

That’s only the case if you ignore point 1. The first point says that a simple PR to avoid the NPE (the reported issue) is fine.

So is a PR not an "action" but...?!?

This is only the case if you ignore a word. The second point says that after that PR no further action is required. The further is an important word indicating that while @pkriens agrees that NPE is best avoided, the actual end behaviour is acceptable and does not need to be changed.

laeubi commented 1 year ago

Maybe it would then better be invested some more words in the first place than closing an issue with such confusing an missunderstandable statement that must be explained afterward.

Saying PR is welcome and closing the issue is at best confusing (why should I create a PR for an issue that is closed as completed) if not leaving a feeling to the author that has investigated the issue and provided complete error details with the feeling that the project simply don't care at all for contributions.