bndtools / bnd

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
https://bndtools.org

Resolver runs in circles till it eats all the memory #4172

Closed juergen-albert closed 2 years ago

juergen-albert commented 4 years ago

Taking the discussion from #4154 to a new Bug, so we can look at this independently. This may end up becoming a bug for the resolver, but we can do the analysis first.

The current assumption is that the resolver tears itself apart when (a) it has too many candidates for certain requirements, or (b) the repositories contain artifacts with "unclean" metadata (like Require-Bundle).

An example of a failing resolve is: https://gitlab.com/gecko.io/geckoEMF-Tooling/-/tree/failing_resolve

Here the resolve dies for https://gitlab.com/gecko.io/geckoEMF-Tooling/-/blob/failing_resolve/org.gecko.emf.osgi.codegen/codegen.bndrun. If we remove the runblacklist, the resolve works just fine.

juergen-albert commented 4 years ago

After comparing a few resolver logs I may have found some clues.

The resolver seems to run in circles on requirements for the packages org.osgi.util.function and org.osgi.util.promise, because they are included in and exported by a lot of bundles.

I've seen thousands of entries like this:

DEBUG: Candidate permutation failed due to a conflict between imports; will try another if possible. (Uses constraint violation. Unable to resolve resource org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820] because it is exposed to package 'org.osgi.util.promise' from resources org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141] and org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141] via two dependency chains.

Chain 1:
  org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820]
    import: (&(osgi.wiring.package=org.osgi.util.promise)(version>=1.0.0)(!(version>=2.0.0)))
     |
    export: osgi.wiring.package: org.osgi.util.promise
  org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141]

Chain 2:
  org.apache.felix.scr [org.apache.felix.scr version=2.1.16.v20200110-1820]
    import: (&(osgi.wiring.package=org.osgi.service.component.runtime)(version>=1.4.0)(!(version>=1.5.0)))
     |
    export: osgi.wiring.package=org.osgi.service.component.runtime; uses:=org.osgi.util.promise
  org.apache.felix.scr [org.apache.felix.scr version=2.1.14]
    import: (&(osgi.wiring.package=org.osgi.util.promise)(version>=1.0.0)(!(version>=2.0.0)))
     |
    export: osgi.wiring.package: org.osgi.util.promise
  org.eclipse.osgi.util [org.eclipse.osgi.util version=3.5.300.v20190708-1141])

juergen-albert commented 4 years ago

I got my issues resolved by systematically blacklisting the bundles that cause the errors above. Right now the log is a bit hidden, and it is quite cumbersome to find the offending bundles in the mass of log entries.

At the moment the wizard comes up when the resolve job has finished with any kind of result. I'd like to change this and get the wizard up while the job is running. With this, we can show such messages directly to the user. If I can get the information via the ResolutionCallback as well, I can even list the offending candidates with an option to blacklist them and rerun the resolve process before it gets out of hand.
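
Until something like this exists in the UI, the log triage described above could be approximated offline. The following is only a minimal sketch, assuming the debug log has been saved as text and that conflict entries look like the one quoted earlier in this thread; `UsesConflictTally` and its regular expression are hypothetical helpers, not part of bnd or the Felix resolver.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a bnd API): counts how often each resource is named
// as a conflicting provider in "Uses constraint violation" log entries.
class UsesConflictTally {

    // Assumed entry shape, taken from the log excerpt in this issue:
    // "... from resources <bsn> [<bsn> version=<v>] and <bsn> [<bsn> version=<v>] ..."
    private static final Pattern CONFLICT = Pattern.compile(
        "from resources (\\S+) \\[[^\\]]*version=([^\\]]+)\\]"
            + " and (\\S+) \\[[^\\]]*version=([^\\]]+)\\]");

    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            if (!line.contains("Uses constraint violation")) {
                continue; // chain details and other debug output are ignored
            }
            Matcher m = CONFLICT.matcher(line);
            if (m.find()) {
                String first = m.group(1) + ":" + m.group(2);
                String second = m.group(3) + ":" + m.group(4);
                counts.merge(first, 1, Integer::sum);
                if (!second.equals(first)) {
                    // Count a resource once per entry, even if it is named twice.
                    counts.merge(second, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "DEBUG: ... (Uses constraint violation. Unable to resolve resource x [x version=1.0.0]"
                + " because it is exposed to package 'p' from resources a [a version=1.0.0]"
                + " and b [b version=2.0.0] via two dependency chains.");
        tally(sample).forEach((resource, n) -> System.out.println(n + "  " + resource));
    }
}
```

Resources with the highest counts would be the first candidates to inspect. Note that an entry naming the same bundle symbolic name and version twice, as in the excerpt above, is counted once per entry here.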

juergen-albert commented 4 years ago

@bjhargrave could you assign the issue to me?

pkriens commented 4 years ago

I think the approach should be to sort the candidates we provide from the context to the resolver. Could we come up with some heuristic that puts the official bundle at the front and the alternatives at the back?

If you're not extremely careful, blacklisting quickly makes the resolver deteriorate into assembling a plain old -runbundles :-( Just way slower.
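
To make the ordering idea concrete, here is a minimal sketch of one possible heuristic. Everything in it is an assumption for illustration: `Candidate` is a simplified stand-in rather than an OSGi or bnd type, and "official" is guessed from the bundle symbolic name matching the package name, with bundles that export fewer packages preferred as a tie-breaker. It is not how bnd currently orders candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a candidate-ordering heuristic; not bnd code.
class CandidateOrdering {

    // Simplified stand-in for a resource that exports the wanted package.
    record Candidate(String bsn, int exportedPackageCount) {}

    // Assumption: a bundle "looks official" for a package when its symbolic name
    // equals the package name or is a dotted prefix of it.
    static boolean looksOfficial(String pkg, Candidate c) {
        return pkg.equals(c.bsn()) || pkg.startsWith(c.bsn() + ".");
    }

    // Returns a new list: official-looking bundles first, then bundles with
    // fewer exported packages; the sort is stable, so ties keep their order.
    static List<Candidate> order(String pkg, List<Candidate> candidates) {
        Comparator<Candidate> cmp = Comparator
            .comparing((Candidate c) -> !looksOfficial(pkg, c)) // false sorts before true
            .thenComparingInt(Candidate::exportedPackageCount);
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        // Export counts below are made up for the example.
        List<Candidate> candidates = List.of(
            new Candidate("org.eclipse.osgi.util", 12),
            new Candidate("org.apache.felix.scr", 5),
            new Candidate("org.osgi.util.promise", 1));
        order("org.osgi.util.promise", candidates).forEach(System.out::println);
    }
}
```

A name-based rule like this is crude, but it keeps the decision inside the ordering step instead of removing candidates, which is the drawback of blacklisting mentioned above.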

juergen-albert commented 4 years ago

@pkriens I would prefer an automated solution as well, but providing additional statistics that a human can read and understand would also help. With such information, we can either solve it by deny-listing the bundles or, even better, throw out the offending dependencies altogether. In my cases they were there by accident anyway.

juergen-albert commented 4 years ago

I plan to get my resident astrophysics PhD set up so she can analyse the issue. She is the right person to come up with heuristics for this.

kriegfrj commented 4 years ago

At the moment the wizard comes up when the resolve job has finished with any kind of result. I'd like to change this and get the wizard up while the job is running. With this, we can show such messages directly to the user. If I can get the information via the ResolutionCallback as well, I can even list the offending candidates with an option to blacklist them and rerun the resolve process before it gets out of hand.

:+1: I'd like to add to this wish list:

pkriens commented 4 years ago

If we make the window non-modal we could even start multiple resolutions.

kriegfrj commented 3 years ago

Just adding a couple more data points to this:

As @juergen-albert suggested, I think this issue occurs when you have multiple bundles exporting the same or nearly the same set of packages.

One clear case where I keep stumbling into this is when trying to resolve a runtime for iDempiere. There are at least two different pairs of culprits:

  1. The manually-resolved set of runbundles for iDempiere contains com.sun.jakarta.mail:1.6.3 and jakarta.mail.api:1.6.3. These bundles contain different sets of packages and are both seemingly required for iDempiere, but both also export the javax.mail.* packages. The resolver log indicates that a lot of time is being spent trying each of them as candidates to wire to other bundles that import these packages.
  2. The runbundles also contain org.apache.commons.logging and org.springframework.spring-jcl, which similarly have different sets of exported packages but with a significant intersection between the two.

The interesting thing is that although the resolver takes a long time to resolve the full set of runbundles (I've actually never seen it finish), when you start the framework with the manually-entered list of runbundles it starts normally & quickly.
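
Pairs like these could be spotted before resolving by checking which packages are exported by more than one bundle. The following is a minimal sketch, assuming the export lists have already been read from the bundle manifests into a map; `ExportOverlap` is a hypothetical helper and the package lists in `main` are made up for illustration, not the real manifests of these bundles.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not a bnd API): finds packages exported by more than one bundle.
class ExportOverlap {

    // Input: bundle name -> exported packages.
    // Output: package -> exporting bundles, only for packages with 2+ exporters.
    static Map<String, Set<String>> sharedPackages(Map<String, Set<String>> exportsByBundle) {
        Map<String, Set<String>> exporters = new TreeMap<>();
        exportsByBundle.forEach((bundle, pkgs) -> {
            for (String pkg : pkgs) {
                exporters.computeIfAbsent(pkg, k -> new TreeSet<>()).add(bundle);
            }
        });
        // Keep only packages that have more than one exporter.
        exporters.values().removeIf(bundles -> bundles.size() < 2);
        return exporters;
    }

    public static void main(String[] args) {
        // Illustrative export lists only.
        Map<String, Set<String>> exports = Map.of(
            "com.sun.jakarta.mail", Set.of("javax.mail", "javax.mail.internet", "com.sun.mail.smtp"),
            "jakarta.mail.api", Set.of("javax.mail", "javax.mail.internet"));
        sharedPackages(exports).forEach((pkg, bundles) -> System.out.println(pkg + " <- " + bundles));
    }
}
```

Each reported package is one for which the resolver has more than one candidate, so a long report for a pair of bundles would point at the kind of overlap described in the two cases above.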

juergen-albert commented 3 years ago

Maybe this is related: https://issues.apache.org/jira/browse/FELIX-6358

Short summary: The same bundle from two repositories (in the given case with different jar names) caused the resolver to run in circles. If the same thing happens whenever two identical bundles from different repositories are around, we could have another candidate.

BTW: Usually I try to solve this issue by removing one of the candidates from the repositories or, if this is not possible, by blacklisting one in the bndrun. In my case one came from p2 and one as a transitive dependency of a BOM, so blacklisting is the only option. I tried to put the GAV in the bnd.identity but that did not work. BSN and version are not an option, because then I would blacklist both bundles. Any idea?

bjhargrave commented 3 years ago

Short summary: The same bundle from two repositories (in the given case with different jar names) caused the resolver to run in circles. If the same thing happens whenever two identical bundles from different repositories are around, we could have another candidate.

Strangely, I am debugging a fix for this problem right now! The specific issue I am working on is the exact same file being visible from multiple repositories. That is, the Resource objects are equals. If the same bundle is in different files, then their Resource objects are not equals.

This Resource is equal to another Resource if both have the same content and come from the same location. Location may be defined as the bundle location if the resource is an installed bundle or the repository location if the resource is in a repository.
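
The distinction can be illustrated with a small sketch. It assumes a simplified stand-in type, `Res`, whose equality covers a content digest and a location; this is not the actual OSGi `Resource` implementation in bnd, and the digests and paths are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of "same content and same location" equality; not bnd code.
class ResourceIdentity {

    // Simplified stand-in for a Resource. A record's generated equals/hashCode
    // compares both components, mirroring the definition quoted above.
    record Res(String contentSha256, String location) {}

    // Collapses candidates that are equal while preserving their order.
    static List<Res> dedupe(List<Res> candidates) {
        return new ArrayList<>(new LinkedHashSet<>(candidates));
    }

    public static void main(String[] args) {
        // Invented digests and paths, for illustration only.
        Res viaIndex = new Res("abc123", "/home/user/.m2/repository/x/x-1.0.jar");
        Res viaFileSet = new Res("abc123", "/home/user/.m2/repository/x/x-1.0.jar");
        Res copyElsewhere = new Res("abc123", "/p2/plugins/x_1.0.jar");

        // The same file seen through two repositories collapses to one candidate;
        // the identical copy in a different file stays a separate candidate.
        System.out.println(dedupe(List.of(viaIndex, viaFileSet, copyElsewhere)));
    }
}
```

Under this definition only the first case (one file visible from several repositories) can be removed by an equality check. Identical bundles stored in different files remain distinct candidates and would need a different treatment.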

bjhargrave commented 3 years ago

See https://github.com/bndtools/bnd/pull/4409 for a resolve context fix for duplicate capabilities. This will help when the resources are equal, such as with an index of .m2 and a Maven ImplicitFileSetRepository.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. If you feel this is something you could contribute, please have a look at our Contributor Guide. Thank you for your contribution.

stale[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Bnd/Bndtools or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.