[bndtools] support cancel on resolve operation

juergen-albert commented 4 years ago

This is somewhat related to #3201. If we resolve something in the IDE, there is no way to stop the job. You may have experienced this yourself: if the solution space for the resolver is too vast, it simply eats up all the memory, and you can only restart Eclipse or wait until it runs into an OOM.

Eclipse Jobs support cancellation, and I have already taken a look at the issue. At the moment I don't see a way to tell the resolver to stop. The only idea I had was to use a ResolutionCallback that throws an exception when somebody triggers a cancel from the outside. This seems a bit dirty though, and I have no clue how regularly these callbacks are called when the resolver is in full swing.
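To make the callback idea concrete, here is a minimal sketch. It does not use the real bnd `ResolutionCallback` signature: `CandidateCallback`, `CancelCheckCallback` and the toy `resolve` loop are hypothetical stand-ins so that the example runs on its own. In Eclipse the cancel flag would presumably come from something like `IProgressMonitor.isCanceled()`; here it is just a `BooleanSupplier`.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a resolver callback; not the real bnd interface.
interface CandidateCallback {
    void processCandidates(String requirement, List<String> candidates);
}

// Aborts the resolve by throwing as soon as the cancel flag is set.
final class CancelCheckCallback implements CandidateCallback {
    private final BooleanSupplier canceled;

    CancelCheckCallback(BooleanSupplier canceled) {
        this.canceled = canceled;
    }

    @Override
    public void processCandidates(String requirement, List<String> candidates) {
        if (canceled.getAsBoolean()) {
            throw new CancellationException("resolve canceled at: " + requirement);
        }
    }
}

final class CancelDemo {
    // Toy "resolver" (assumption): invokes the callback once per requirement.
    // Returns how many requirements were processed before finishing or being canceled.
    static int resolve(List<String> requirements, CandidateCallback callback) {
        int processed = 0;
        try {
            for (String requirement : requirements) {
                callback.processCandidates(requirement, List.of());
                processed++;
            }
        } catch (CancellationException e) {
            // A cancel is not a resolve failure; just stop quietly.
        }
        return processed;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulates the user pressing cancel by the time the third callback fires.
        BooleanSupplier cancelOnThirdCall = () -> calls.incrementAndGet() >= 3;
        int done = resolve(List.of("a", "b", "c", "d"), new CancelCheckCallback(cancelOnThirdCall));
        System.out.println("processed before cancel: " + done);
    }
}
```

The cancel is only noticed when the resolver next invokes the callback, so how responsive this is depends entirely on how often that happens.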

Any Ideas?

pkriens commented 4 years ago

The callback is the only way to go ...

kriegfrj commented 4 years ago

It would be even nicer if the resolve operation could give some useful feedback while it is in progress - then you might get a clue early on whether it has gone down a rabbit warren or whether it might finish in a reasonable time. But the ability to cancel would be a good start.

rotty3000 commented 4 years ago

I feel that resolve problems these days mostly originate from p2 repositories. I rarely hear about resolve issues unless p2 is involved.

However, I don't believe the issue is with the indexing or even necessarily the magnitude. My gut tells me that there is some logical construct within the bundles, or some characteristic of the bundles generally found in p2 repositories, which puts much more stress on the resolver than elsewhere.

We've been resolving lately against some pretty massive repos and the cost seems so insignificant that it barely registers.

Perhaps it's related to Require-Bundle and maybe some stress this places on the class-space calculations the resolver has to compute?

kriegfrj commented 4 years ago

Interesting thoughts, @rotty3000 - I've just hit what looks like a resolver hang when working in the Bnd workspace, which doesn't have any p2 repos. I am, however, trying to resolve an Eclipse launch (which has lots of bundles with Require-Bundle), so maybe it is related to Require-Bundle? On the other hand, I've also had Eclipse launches resolve in a reasonable timeframe too, so perhaps there's something else going on.

Whatever is going on with these long resolutions, however, it will be difficult to trace unless we can get some kind of progress output from the resolver. I'm sure if we had that progress information then the root cause would become immediately obvious in many cases.

rotty3000 commented 4 years ago

FYI, I didn't mean the p2 repository implementations in bnd (which from the resolver's perspective are irrelevant, since all it sees are resources). Rather, I meant the characteristics of bundles one commonly finds in p2 repositories.

juergen-albert commented 4 years ago

I've implemented the cancel and while doing so, I've realized that the resolver writes a log. I've added an option to keep the log file after the dialog is closed. Right now I can't make heads or tails out of the log, but I hope to get some analysis running on failed attempts, so we can get a clue what the problem is.

@rotty3000 I support your suspicion. Bundles sourced from p2 repos are disproportionately often a cause of the issue. From my point of view, this may have two reasons: 1. bad metadata in bundles due to PDE and 2. badly orchestrated repos with multiple versions of the same bundle.

We previously had such issues mostly when we had too many bundles that provide similar packages/capabilities. In particular, one bundle in multiple versions results in situations where the resolve does not finish.

While we are at it, I saw another result that might be part of the issue. Simplified, it looks as follows:

Initial requirement: a bundle with a package requirement for an API and an implementing service.

The API and Impl bundles are available in 2 versions in the repos, and both versions of the API would be potential matches.

Result:

- The initial bundle
- The API bundle in the latest version
- The Impl bundle in the latest version and its dependencies
- The Impl bundle in the older version and its dependencies, but not the older API bundle

Thus it appears that the Resolver considers both API bundles and calculates their dependency tree. Then it throws away the older Version, but keeps all or a lot of the obsolete dependencies from it in the result.
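One way to check a result for such leftovers is to walk the wiring from the initial bundle and list everything that was resolved but is never reached. The sketch below assumes the wiring has been dumped into a plain map; the class `LeftoverCheck` and all bundle names are hypothetical and only mirror the simplified scenario above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LeftoverCheck {
    // Returns the resources in 'resolved' that cannot be reached from 'root'
    // by following the wires (resource -> resources it is wired to).
    static Set<String> unreachable(String root, Set<String> resolved, Map<String, List<String>> wires) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                for (String target : wires.getOrDefault(current, List.of())) {
                    todo.push(target);
                }
            }
        }
        Set<String> leftovers = new TreeSet<>(resolved);
        leftovers.removeAll(seen);
        return leftovers;
    }

    public static void main(String[] args) {
        // Hypothetical resolve result: the older Impl and its dependency are present,
        // but nothing reachable from the initial bundle is wired to them.
        Set<String> resolved = Set.of("app", "api-2.0", "impl-2.0", "impl2-dep", "impl-1.0", "impl1-dep");
        Map<String, List<String>> wires = Map.of(
            "app", List.of("api-2.0", "impl-2.0"),
            "impl-2.0", List.of("api-2.0", "impl2-dep"),
            "impl-1.0", List.of("api-2.0", "impl1-dep"));
        System.out.println("not reachable from app: " + unreachable("app", resolved, wires));
    }
}
```

If the list is non-empty, those resources are candidates for the obsolete dependencies described above, although a resource can also be in the result for reasons this simple graph walk does not model.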

rotty3000 commented 4 years ago

So, I was testing a theory (which also led me to find this bug: https://github.com/bndtools/bnd/issues/4158). After fixing the bug, I added our company's huge BOMs to the parent pom's dependency management section in the https://github.com/osgi/osgi-test repo:

            <dependency>
                <groupId>com.liferay.portal</groupId>
                <artifactId>release.dxp.bom</artifactId>
                <version>7.2.10.fp5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>com.liferay.portal</groupId>
                <artifactId>release.dxp.bom.third.party</artifactId>
                <version>7.2.10.fp5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>com.liferay.portal</groupId>
                <artifactId>release.dxp.bom.compile.only</artifactId>
                <version>7.2.10.fp5</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>org.objenesis</groupId>
                <artifactId>objenesis</artifactId>
                <version>2.6</version>
                <scope>test</scope>
            </dependency>

This combination causes the implicit repository to contain 1350 bundles.

Now, before adding this, the osgi-test build runs in:

```
[INFO] Total time:  42.098 s
```

After adding these, the build will:

- index two times
- resolve two times

The build still completes in:
```
[INFO] Total time:  01:00 min
```
That is roughly 18 seconds more than before. If you split that roughly in half for the two invocations, it is 9 seconds each, and if you split that again roughly 50% for indexing and 50% for resolving, that is only about 4.5 seconds to resolve, or thereabouts. So it really makes me wonder why it takes so long for those p2 repositories that, even if there are 10 times more bundles, you have to force quit the resolve.

kriegfrj commented 4 years ago

Ray, just curious - your repo is huge, but just following Juergen's theory for a sec - does it contain many bundles that export the same packages?

rotty3000 commented 4 years ago

No, and that could certainly be one of the characteristics causing pressure on the resolver. So the candidate causes (might even be a combination) to look into might be:

- multiple providers of the same capability (e.g. packages)
- Require-Bundle class-space calculation
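To check a repository for the first candidate cause, something along these lines could list the packages that have more than one provider. This is only a sketch under assumptions: it expects the exports to be already extracted into a map of bundle identity to exported packages, and `ProviderCheck` as well as the bundle and package names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ProviderCheck {
    // Maps each package to the bundles exporting it and keeps only
    // the packages that have two or more providers.
    static Map<String, Set<String>> multiProvided(Map<String, List<String>> exportsByBundle) {
        Map<String, Set<String>> providers = new TreeMap<>();
        exportsByBundle.forEach((bundle, packages) -> {
            for (String pkg : packages) {
                providers.computeIfAbsent(pkg, k -> new TreeSet<>()).add(bundle);
            }
        });
        providers.values().removeIf(bundles -> bundles.size() < 2);
        return providers;
    }

    public static void main(String[] args) {
        // Hypothetical repository content: one API bundle present in two versions.
        Map<String, List<String>> exports = Map.of(
            "org.example.api;1.0.0", List.of("org.example.api"),
            "org.example.api;2.0.0", List.of("org.example.api"),
            "org.example.util;1.0.0", List.of("org.example.util"));
        System.out.println(multiProvided(exports));
    }
}
```

Comparing this list between a repository that resolves quickly and one that hangs might show whether duplicate providers are really the difference.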

juergen-albert commented 4 years ago

I currently have a repository where I can switch the problem on and off by blacklisting a bundle. I will try to get useful statistics out of it.

@rotty3000 @kriegfrj I can grant you access as well if you want to pitch in.

kriegfrj commented 4 years ago

Interesting - now that you mention it, when I've had this problem before it was because I had two different versions of the JUnit 5 bundles (the latest set from Maven Central and also the set from Eclipse 2018-12), and blacklisting one set fixed it. Unfortunately, I'm not in a position just now to verify whether I'm hitting the same issue now.

juergen-albert commented 4 years ago

@rotty3000 @kriegfrj I've created a separate issue for the resolver problem: #4172

bndtools / bnd

[bndtools] support cancel on resolve operation #4154