bndtools / bnd

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
https://bndtools.org
Other
528 stars 305 forks source link

Inconsistent resolve results with bnd-export-maven-plugin 3.4.0-SNAPSHOT #1833

Closed azzazzel closed 6 years ago

azzazzel commented 7 years ago

I've been struggling with this for quite some time as it appeared to happen randomly. But I finally managed to come up with minimalistic project to reproduce the issue. The project is here https://github.com/azzazzel/demo.reslove.issue and it only has 2 maven modules:

Clone the project and run mvn clean install several times. Observe that every time -runbundles changes. It cycles between several combinations like

-runbundles: \
    demo.resolve.issue.java; version='[0.0.1,0.0.2)',\
    log4j; version='[1.2.17,1.2.18)',\
    slf4j.api; version='[1.7.14,1.7.15)',\
    slf4j.log4j12; version='[1.6.1,1.6.2)'
-runbundles: \
    demo.resolve.issue.java; version='[0.0.1,0.0.2)',\
    log4j; version='[1.2.17,1.2.18)',\
    org.apache.logging.log4j.api; version='[2.5.0,2.5.1)',\
    slf4j.api; version='[1.7.14,1.7.15)',\
    slf4j.log4j12; version='[1.6.1,1.6.2)'
-runbundles: \
    demo.resolve.issue.java; version='[0.0.1,0.0.2)',\
    log4j; version='[1.2.17,1.2.18)',\
    slf4j.api; version='[1.7.12,1.7.13)',\
    slf4j.log4j12; version='[1.6.1,1.6.2)'

but also some others. Occasionally it stick with the same resolution for several builds (sometimes even tens) but eventually it will change again.

With more complex builds sometime it will fail to resolve and then resolve after simply re-running! Using BndTools in combination with bnd-export-maven-plugin seams to make things worse but I have no proof of that.

rotty3000 commented 7 years ago

Thanks @azzazzel . I'll look into it as I've also been struggling with it. One thing I will check is if the last fix to the felix resolver made by @tjwatson resolves the issue (you see what I did there...) That fix is not yet in any release btw so we'll have to do some fiddling.

bjhargrave commented 7 years ago

We need to make sure the inputs to the resolver don't vary. Then the issue is an instability in the resolver algorithm.

rotty3000 commented 7 years ago

This is very likely https://issues.apache.org/jira/browse/FELIX-5450 since the project contains the log4j/slf4j logging fragments which is where we seem to always find this issue.

azzazzel commented 7 years ago

@bjhargrave I'm not sure what do you mean by "make sure the inputs to the resolver don't vary". Is the value of -runbundles: considered an input to the resolver? Because that is the ONLY thing that changes between executions?

bjhargrave commented 7 years ago

No. -runbundles is an output (the prior value is used to preserve order in the output).

Inputs to the resolver algorithm include -runrequires and other things about the env (e.g. -runfw, -runee.)

My concern was that bnd is providing consistent inputs to the resolver. If this is true, then the issue is instability in the resolver algorithm.

timothyjward commented 7 years ago

My concern was that bnd is providing consistent inputs to the resolver. If this is true, then the issue is instability in the resolver algorithm.

Another place that bnd could be screwing up is when inserting hosted capabilities. I'm still not totally sure that we do the right thing in that method.

rotty3000 commented 7 years ago

I've managed to fix the behaviour of the bnd-export plugin which was known to already be misbehaving besides the unstable resolves. This does make it significantly more stable output (because there were two different change calculations happening). I'll make an issue separately for that. Meanwhile, it seems that even with an updated felix resolver (1.10.2-SNAPSHOT) it's still not perfectly stable.

rotty3000 commented 7 years ago

Once the other change to the bnd-export plugin gets in I will beg @tjwatson to have a look against your repo to see if we can figure out why the resolver produces different results.

rotty3000 commented 7 years ago

FYI here's an example (after my changes) where the output is still showing instability:

[INFO] -runbundles:\
    demo.resolve.issue.java;version='[0.0.1,0.0.2)',\
    log4j;version='[1.2.17,1.2.18)',\
    slf4j.log4j12;version='[1.6.1,1.6.2)',\
    slf4j.api;version='[1.7.12,1.7.13)'

... later with no input changes ...

[INFO] -runbundles:\
    demo.resolve.issue.java;version='[0.0.1,0.0.2)',\
    log4j;version='[1.2.17,1.2.18)',\
    slf4j.log4j12;version='[1.6.1,1.6.2)',\
    slf4j.api;version='[1.7.14,1.7.15)'

Note the change of slf4j.api versions.

azzazzel commented 7 years ago

Btw, I forgot to mention something that may be important. I'm unable to reproduce this with 3.3.0 version of the plugin. If the problem is in 3.3.0 it seams it behaves stable enough to not show up in the last 100 or so builds.

azzazzel commented 7 years ago

However when I set my pom to 3.3.0 I often get

[ERROR] Failed to execute goal biz.aQute.bnd:bnd-export-maven-plugin:3.3.0:export (default) on project bndrun: null: MojoExecutionException: NullPointerException -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal biz.aQute.bnd:bnd-export-maven-plugin:3.3.0:export (default) on project bndrun: null
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException
    at aQute.bnd.maven.export.plugin.ExportMojo.execute(ExportMojo.java:52)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    ... 20 more
Caused by: java.lang.NullPointerException
    at org.apache.felix.resolver.util.CopyOnWriteList.<init>(CopyOnWriteList.java:42)
    at org.apache.felix.resolver.util.ShadowList.<init>(ShadowList.java:31)
    at org.apache.felix.resolver.Candidates.prepare(Candidates.java:917)
    at org.apache.felix.resolver.ResolverImpl.resolve(ResolverImpl.java:232)
    at org.apache.felix.resolver.ResolverImpl.resolve(ResolverImpl.java:179)
    at biz.aQute.resolve.ResolveProcess.resolveRequired(ResolveProcess.java:72)
    at biz.aQute.resolve.ProjectResolver.resolve(ProjectResolver.java:96)
    at biz.aQute.resolve.ProjectResolver.getRunBundles(ProjectResolver.java:113)
    at aQute.bnd.maven.export.plugin.ExportMojo.resolve(ExportMojo.java:66)
    at aQute.bnd.maven.export.plugin.ExportMojo.execute(ExportMojo.java:46)
    ... 22 more

and I also get NPEs someties resolving from BndTools.

Not sure if it's related.

rotty3000 commented 7 years ago

The 3.3 is already known to have resolve issues which are fixed in master! We need to test against the latest HEAD.

tjwatson commented 7 years ago

Ray mentions a variation of results. My question is are both results valid? If so, that is no fault of the resolver. It is only required to find a valid result. There are some cases where random set traversal may introduce inconsistent results.

bjhargrave commented 7 years ago

There are some cases where random set traversal may introduce inconsistent results.

How is it random? Does the resolver use a randomizer to select paths to evaluate? I would expect the resolver to use a stable algorithm.

rotty3000 commented 7 years ago

In my tests all the results seem valid.

However, I agree with BJ that there should not be any randomness in the internal calculations made by the resolver. It would seem that there might be a map used somewhere in the resolver which doesn't produce a deterministic order, such as plain HashMap (or anything akin to it).

bjhargrave commented 7 years ago

It would seem that there might be a map used somewhere in the resolver which doesn't produce a deterministic order, such as plain HashMap (or anything akin to it).

But the hashing algorithm of the map is stable across runs of the same vm. I can see that different vms could change their hashing algorithm but that is not what we are talking about here.

rotty3000 commented 7 years ago

@tjwatson to expand on my answer, I've tested all the results I've obtained to make sure they are actually valid, and they do seem to work without issue.

rotty3000 commented 7 years ago

FYI, the other change to improve the stability of bnd-export-maven-plugin is merged. So please try updating to the latest. I can help anyone wanting to setup a test environment using @azzazzel's repo and the latest bnd/felix resolver.

tjwatson commented 7 years ago

I not sure if the hashing causes the issue here. I mention it because internally the resolver uses it own map impl which may not give consistent ordering. Regardless, the resolver it not required to give consistent results, just valid results.

rotty3000 commented 7 years ago

Sigh... this will result in the inability to generate stable exports.

Perhaps we have to approach it another way then. We may want to verify if the existing -runbundles still works for the given inputs before attempting a real resolve. I think someone requested this already somewhere else.

timothyjward commented 7 years ago

Sigh... this will result in the inability to generate stable exports.

In the case where run bundles is already populated should the "resolve" not be pre-loading those resources and validating that the combination resolves (i.e. excluding all other repo content)? The next step would be to make sure that all of the -runrequires are satisfied by the runbundles.

bjhargrave commented 7 years ago

Regardless, the resolver it not required to give consistent results, just valid results.

I think this is wrong. The resolver is not required to give the best possible result. It is just required to give valid results. This says nothing at all about stability of the algorithm. I think the resolver must be a stable algorithm. To the extent there are areas in the implementation that can be fixed to provide stability, we should do so.

bjhargrave commented 7 years ago

In the case where run bundles is already populated should the "resolve" not be pre-loading those resources and validating that the combination resolves

-runbundles is an output from the resolver, not an input. When the tool (e.g. bnd) calling the resolver processes the resolver results, it can use the existing -runbundles to preserve any existing ordering in the generated -runbundles.

Validating that -runbundles is a valid closure, is a separate task which has been proposed by https://github.com/bndtools/bnd/issues/1827.

timothyjward commented 7 years ago

-runbundles is an output from the resolver, not an input. When the tool (e.g. bnd) calling the resolver processes the resolver results, it can use the existing -runbundles to preserve any existing ordering in the generated -runbundles.

I understand this - my question is, why is this being re-calculated with every build when it has already been created (and potentially customised) by the user? Surely the more important thing for the plugin to do is to verify that the runbundles is a) transitively closed, and b) satisfies the run requirements...

rotty3000 commented 7 years ago

..furthermore, we have no idea if the repository configuration had changed at any time between the previous resolve and now, which might give a "better" result. By always asserting that the current runbundles is a closure alone might not be the desired outcome, so we'd force a deletion of the existing -runbundles.

I think @bjhargrave is right on both counts: 1) verify runbundles should be a separate, distinct step (not an automatic, implicit one) 2) Given a set of repos, and distinct input to the resolver, the resolver should give stable output. It's what I think people are already expecting as this issue clearly shows. Even as an experienced OSGi developer, I would have never guessed the output was simply "valid" and not "stable".

milendyankov commented 7 years ago

Regardless, the resolver it not required to give consistent results, just valid results.

In my actual usecase it does give invalid results sometimes. By invalid results I mean it sometimes fails to resolve even though nothing has changed. It may then resolve on the next run and than fail again later on. Seems quite random but there are a lot more bundles involved. It is pefectly possible that I'm doing something wrong there thus the minimalistic project. I was not able to reproduce the failing part with it though, only the inconsistency.

rotty3000 commented 7 years ago

But the failure you noted is already solved in 3.4 that's why you don't see it in the sample project which uses those plugins. However, if you do see a different failure than the one listed above, please add it.

milendyankov commented 7 years ago

Just to make sure we are on the same page. The NPA I mentioned few posts ago is indeed in 3.3 and does not happen in 3.4! However in 3.4 with more complex set of bundles it does fail sometimes as well. It's not NPE but simply "can not resolve". Most of the time it can not resolve packages from SLF4J from what I recall. I switched back to 3.3 some time ago so I don't remember exactly but I can try to switch again to 3.4 and check.

milendyankov commented 7 years ago

Oh and before everyone get totally confused, @azzazzel is my other (private) github account :) Sorry for posting from two different accounts.

milendyankov commented 7 years ago

I switched to 3.4.0-SNAPSHOT and I just run mvn clean install many times (no changes).

My input is

-runrequires: \
    osgi.identity;filter:='(osgi.identity=com.commsen.liferay.builtwith.rest)',\
    osgi.identity;filter:='(osgi.identity=com.eclipsesource.jaxrs.publisher)'

Sometimes it resolves to this:

-runbundles: \
    com.commsen.liferay.builtwith.api; version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.provider; version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.rest; version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.storage.wedeploy; version='[1.0.0,1.0.1)',\
    com.eclipsesource.jaxrs.jersey-all; version='[2.22.2,2.22.3)',\
    com.eclipsesource.jaxrs.publisher; version='[5.3.1,5.3.2)',\
    com.google.gson; version='[2.7.0,2.7.1)',\
    log4j; version='[1.2.17,1.2.18)',\
    org.apache.felix.configadmin; version='[1.8.8,1.8.9)',\
    org.apache.felix.http.jetty; version='[3.2.0,3.2.1)',\
    org.apache.felix.http.servlet-api; version='[1.1.2,1.1.3)',\
    org.apache.felix.scr; version='[2.0.2,2.0.3)',\
    org.eclipse.equinox.event; version='[1.3.100,1.3.101)',\
    org.eclipse.equinox.metatype; version='[1.4.100,1.4.101)',\
    org.jodd.bean; version='[3.6.7,3.6.8)',\
    org.jodd.core; version='[3.6.7,3.6.8)',\
    org.jodd.http; version='[3.6.7,3.6.8)',\
    org.jodd.json; version='[3.6.7,3.6.8)',\
    org.jodd.upload; version='[3.6.7,3.6.8)',\
    org.jsoup; version='[1.10.1,1.10.2)',\
    org.osgi.service.event; version='[1.3.1,1.3.2)',\
    org.osgi.service.metatype; version='[1.3.0,1.3.1)',\
    slf4j.api; version='[1.6.1,1.6.2)',\
    slf4j.api; version='[1.7.21,1.7.22)',\
    slf4j.log4j12; version='[1.6.1,1.6.2)'

which is correct and works! Other times it resolves to

-runbundles: \
    com.commsen.liferay.builtwith.api;version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.provider;version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.rest;version='[1.0.0,1.0.1)',\
    com.commsen.liferay.builtwith.storage.wedeploy;version='[1.0.0,1.0.1)',\
    com.eclipsesource.jaxrs.jersey-all;version='[2.22.2,2.22.3)',\
    com.eclipsesource.jaxrs.publisher;version='[5.3.1,5.3.2)',\
    com.google.gson;version='[2.7.0,2.7.1)',\
    log4j;version='[1.2.17,1.2.18)',\
    org.apache.felix.configadmin;version='[1.8.8,1.8.9)',\
    org.apache.felix.http.jetty;version='[3.2.0,3.2.1)',\
    org.apache.felix.http.servlet-api;version='[1.1.2,1.1.3)',\
    org.apache.felix.scr;version='[2.0.2,2.0.3)',\
    org.eclipse.equinox.event;version='[1.3.100,1.3.101)',\
    org.eclipse.equinox.metatype;version='[1.4.100,1.4.101)',\
    org.jodd.bean;version='[3.6.7,3.6.8)',\
    org.jodd.core;version='[3.6.7,3.6.8)',\
    org.jodd.http;version='[3.6.7,3.6.8)',\
    org.jodd.json;version='[3.6.7,3.6.8)',\
    org.jodd.upload;version='[3.6.7,3.6.8)',\
    org.jsoup;version='[1.10.1,1.10.2)',\
    org.osgi.service.event;version='[1.3.1,1.3.2)',\
    org.osgi.service.metatype;version='[1.3.0,1.3.1)',\
    slf4j.api;version='[1.7.21,1.7.22)'

Which fails at runtime with

Failed to start bundle slf4j.api-1.7.21, exception Could not resolve module: slf4j.api [23]
  Unresolved requirement: Import-Package: org.slf4j.impl; version="1.6.0"

And occasionally it resolves to

-runbundles: \
    slf4j.api;version='[1.6.4,1.6.5)',\
    slf4j.log4j12;version='[1.6.1,1.6.2)'

Yep, just the 2 bundles that's not a typo. Also sometimes it fails with

[ERROR] Failed to execute goal biz.aQute.bnd:bnd-export-maven-plugin:3.4.0-SNAPSHOT:export (default) on project runtime: null: MojoExecutionException: NullPointerException -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal biz.aQute.bnd:bnd-export-maven-plugin:3.4.0-SNAPSHOT:export (default) on project runtime: null
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException
    at aQute.bnd.maven.export.plugin.ExportMojo.execute(ExportMojo.java:66)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
    ... 20 more
Caused by: java.lang.NullPointerException
    at java.util.Objects.requireNonNull(Objects.java:203)
    at java.util.ArrayList.removeAll(ArrayList.java:689)
    at aQute.bnd.maven.export.plugin.ExportMojo.resolve(ExportMojo.java:131)
    at aQute.bnd.maven.export.plugin.ExportMojo.export(ExportMojo.java:85)
    at aQute.bnd.maven.export.plugin.ExportMojo.execute(ExportMojo.java:63)
    ... 22 more
rotty3000 commented 7 years ago

I think we should close this issue:

The Problem: The resolver returns different solutions with no changes to inputs.

This can result from a combination of two valid scenarios:

Workaround: Prune the resolve repositories so they contain fewer solutions.

Some commonly witnessed causes are:

njbartlett commented 7 years ago

Reopening and assigning to myself as I have also seen this problem in 3.4.0.REL. As additional information, I believe this also be triggered when a package is exported substitutably (i.e. exported and imported) by multiple bundles.

rotty3000 commented 6 years ago

I think that we can reasonably close this issue. There has been significant improvements to bnd as well as a couple updates to the felix resolver and where we used to see these errors, we don't see them anymore. However, I do believe there's still a certain amount of potential for the failure on a poorly curated repository.

If the issue comes up again using the latest bnd then we can reopen it at that time.

milendyankov commented 6 years ago

It seams in 3.5 bnd-export-maven-plugin with <resolve>true</resolve> does resolve but does not update -runbundles: anymore. Is that by design? If so is there any way to see what it resolves to (other than inspecting the jar file)?

rotty3000 commented 6 years ago

There's a separate bnd-resolver-maven-plugin which produces sticky -runbundles updates.

https://github.com/bndtools/bnd/blob/master/maven/bnd-resolver-maven-plugin/README.md

rotty3000 commented 6 years ago

otherwise, yes it's by design.