bndtools / bnd

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
https://bndtools.org

[Feature] Smarter builder dependency checking #3175

Closed kriegfrj closed 1 year ago

kriegfrj commented 5 years ago

While working on #3145, I noticed that the tester bundle (and the biz.aQute.junit.bundle.engine bundle I created, on which it depends) is quite a way down the build dependency hierarchy, and I was spending quite a bit of time waiting for Bndtools to rebuild a large number of bundles in the workspace. It seems that relatively small changes to the bundle.engine bundle result in a rebuild of all the dependent bundles. As an extreme example, merely changing a comment in a class in a bundle causes all of the dependent bundles to be rebuilt.

Though I haven't looked at it in detail, I think that the build dependency checking seems to depend on the timestamp of the individual bundles' jar files - if the jar file changes, then any other bundle in the workspace that has that bundle on its build path will get rebuilt.

This is an extremely coarse and conservative approach, which virtually guarantees you won't get any out-of-date bundles higher up the build hierarchy. However, it also means that Bndtools spends quite a bit of time unnecessarily rebuilding bundles that really don't need to be rebuilt. A bundle really only needs to be rebuilt if the "API" of a bundle it depends on changes in a way that is not backward compatible. Changes that fall into this category:

  1. Changing a method signature.
  2. Removing a method.
  3. Removing a class (arguably, only when an exported class is removed, but given that dependent bundles can violate the export provisions at build time at least, the most conservative way is to rebuild when any class is removed).
  4. Removing an exported package.

This is one area (perhaps the first, in my experience) where I think PDE still has a bit of an edge over Bndtools - as Eclipse doesn't base its build decisions on fully assembled bundles, but on the classes that they contain, and it uses its regular inter-project dependency mechanisms to make decisions about when to rebuild a class.

I know that Bnd already has some great built-in capabilities for calculating external imports and baselining. It occurred to me that these two capabilities could, in theory, combine to form the basis of smarter dependency/rebuild decisions. You keep track of which classes you depend on from each bundle (a capability used by the automatic package import calculation), you keep track of incompatible changes in classes from each bundle (a capability used by baselining), and you only rebuild if there is an incompatible change in a class you depend on.
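To illustrate, the decision rule this boils down to can be sketched as a simple set intersection. The names here are hypothetical, not actual Bnd API; in practice the two sets would come from Bnd's import calculation and from baselining respectively.

```java
import java.util.Set;

// Hypothetical sketch: a downstream project needs a rebuild only if one of
// the classes it actually references saw an incompatible (baseline-relevant)
// change in the upstream bundle.
public class RebuildCheck {
	public static boolean needsRebuild(Set<String> referencedClasses,
			Set<String> incompatiblyChangedClasses) {
		// Rebuild iff the two sets intersect.
		return referencedClasses.stream().anyMatch(incompatiblyChangedClasses::contains);
	}

	public static void main(String[] args) {
		Set<String> referenced = Set.of("org.example.api.Foo", "org.example.api.Bar");
		// Unrelated internal change: no rebuild needed.
		System.out.println(needsRebuild(referenced, Set.of("org.example.impl.Baz"))); // false
		// Incompatible change to a referenced class: rebuild.
		System.out.println(needsRebuild(referenced, Set.of("org.example.api.Foo"))); // true
	}
}
```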

And while #3032 is strictly speaking a different feature request to this one, I think if this feature were implemented it might also indirectly solve #3032 for my specific use case (as most of the downstream dependencies wouldn't get rebuilt by a builder with smarter dependency checking).

I also wonder if this functionality could be pushed lower, into Bnd, so that the Gradle and Maven plugins can take advantage of it for their incremental building too. But I don't know a lot about how these operate.

Thoughts?

rotty3000 commented 5 years ago

It's not just the API, but really the manifest. And BND just builds the manifest when it builds the jar.


bjhargrave commented 5 years ago

This is one area (perhaps the first, in my experience) where I think PDE still has a bit of an edge over Bndtools - as Eclipse doesn't base its build decisions on fully assembled bundles, but on the classes that they contain, and it uses its regular inter-project dependency mechanisms to make decisions about when to rebuild a class.

PDE is different here since, in Bnd, a project's generated bundle can contain more (or less) classes/packages than the source of the project. So from a correctness point of view, the source code of a project depends upon the generated jars of dependent projects and not the source code (compiled into classes) of dependent projects.

Bndtools could perhaps short circuit some dependencies by a deeper understanding of the package import/export graph. That is, if a project does not import a package from a dependent project, then it does not matter. But then why is the dependent project on the -buildpath in that case?

I think we are talking fairly complex and possibly fragile optimizations to the build here. It also would break fidelity with the gradle build which is based upon jar level project dependencies.

kriegfrj commented 5 years ago

@rotty3000:

It's not just the API, but really the manifest. And BND just builds the manifest when it builds the jar.

Yes. Any advanced dependency checking would need to be based on the final contents of the assembled jar, including the manifest.

@bjhargrave:

PDE is different here since, in Bnd, a project's generated bundle can contain more (or less) classes/packages than the source of the project. So from a correctness point of view, the source code of a project depends upon the generated jars of dependent projects and not the source code (compiled into classes) of dependent projects.

An important point, but I don't think it makes a difference in substance to the thrust of this idea. It simply means that instead of examining what source code or class files have changed, you can examine which classes have changed in the final assembled jar. You can also examine the manifest (per @rotty3000 's comment).

Bndtools could perhaps short circuit some dependencies by a deeper understanding of the package import/export graph. That is, if a project does not import a package from a dependent project, then it does not matter. But then why is the dependent project on the -buildpath in that case?

In answer to the last question: I think I've hit instances (in particular working with Apache CXF) where it was necessary to include a transitive dependency on -buildpath that my project did not directly link to before it would compile. From memory, it was because the transitive dependency helped define the supertype hierarchy of the class that I directly depended on from my primary dependency. It was annoying, because the Quick Fix processor (that suggests bundles to add to -buildpath) did not know to add this bundle to the build path, and it took a fair bit of digging to figure out exactly which dependency I needed to add. It is possible that the existence of such cases may even further complicate more advanced dependency checking, but if we could solve that case then as a bonus the Quick Fix processor could be a bit more informed about suggesting dependencies to add to the -buildpath.

Anyway, putting that aside for a moment, your proposition would be a first step.

A simple improvement on this would be to assign a timestamp to each package (= latest timestamp of all classes in the package), rather than relying on the single timestamp of the whole bundle. Dependent bundles can then use the package's timestamp, rather than the bundle's timestamp, to determine if they need to rebuild.
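A sketch of that per-package timestamp calculation (hypothetical helper, not Bndtools code): fold each class's modification time into the maximum for its package.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: derive a per-package timestamp as the newest
// modification time of any class in that package, instead of using the
// single timestamp of the whole bundle jar.
public class PackageTimestamps {
	public static Map<String, Long> compute(Map<String, Long> classTimestamps) {
		Map<String, Long> packageTimestamps = new HashMap<>();
		classTimestamps.forEach((className, time) -> {
			int dot = className.lastIndexOf('.');
			String pkg = dot < 0 ? "" : className.substring(0, dot);
			// Keep the newest timestamp seen for this package.
			packageTimestamps.merge(pkg, time, Math::max);
		});
		return packageTimestamps;
	}
}
```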

A further improvement would be to use the baselining ability of Bnd to keep track of whether there have been any API changes in each package since the last build. Then you only need to rebuild when there has been an API change.

The gold standard would be to then change this from package-level granularity to class-level granularity. This would be much more complicated, and possibly not worth the effort, but I think that the 80% gain lies in the above.

I think we are talking fairly complex and possibly fragile optimizations to the build here.

Possibly. I would anticipate that any resulting risk could be mitigated by adding it as an "incubating/experimental" feature which is off by default. If/when it becomes more stable, you could change it to be on by default.

It also would break fidelity with the gradle build which is based upon jar level project dependencies.

This could be managed in three ways:

  1. With the Bndtools on/off switch: simply ensure that the smart dependencies feature is off when fidelity with Gradle is important (and you are willing to sacrifice the build time to achieve it); or
  2. Porting the same advanced dependency checking mechanism into the Gradle plugin; or
  3. Using 1 as a transition until 2 is working satisfactorily.

My (admittedly very brief) research into this indicates that Gradle supports more complex "up-to-date" checking than simply depending on whether a file has changed (see https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:up_to_date_checks). So I think, in theory, it would be possible to port whatever enhancements are made to incremental building into the Gradle plugin too, and maintain build fidelity that way.

Note that this fidelity only becomes an issue with incremental builds. For clean builds from scratch, it isn't an issue.

Based on my own experience, I have a feeling that it could result in massive reductions in build times, which might make the complexity worth it.

bjhargrave commented 5 years ago

At this point, this issue is just a "wish" without anything concrete behind it. Please make some specific suggestions for changes that can be analyzed and discussed.

where I think PDE still has a bit of an edge over Bndtools - as Eclipse doesn't base its build decisions on fully assembled bundles, but on the classes that they contain,

I am not sure I can see much daylight between "fully assembled bundles" and "the classes that they contain" other than non-class resources in the bundles. But changes to non-class resources of a bundle are much less common in an incremental-development-in-Bndtools scenario. I am sure your experience was in editing code.

and it uses its regular inter-project dependency mechanisms to make decisions about when to rebuild a class.

What is a "regular inter-project dependency mechanism"?

kriegfrj commented 4 years ago

Following #3833, I'm wondering if we can potentially leverage the bloom filter implementation to support the ideas in this thread.

As I understand it, @pkriens added the class names' hash codes as a List attribute on Export-Package, for each package. If we did the same for Import-Package, then we have the beginnings of something that could possibly be used for class-level dependency checking.

I haven't pondered this idea in-depth, but I wanted to capture it before it gets forgotten in case someone else wanted to take it and run with it.

pkriens commented 4 years ago

I think the interesting solution would be to create a checksum on the public API of a bundle. This would be easy to do and would be easy to extend with different things we consider public. It would not be on class level, but I have serious doubts that class-level checking can work reliably except when classes are directly visible, and not, as in our case, behind a build step.

A public API must also take into account build artefacts like included resources. For compiling we could ignore this, but for testing and launching it is public API.
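The checksum idea above could be sketched as hashing a sorted list of public signatures, so the digest is stable across private-only changes and independent of declaration order. This is a hypothetical helper, not Bnd code; what counts as a "public signature" is exactly the extension point pkriens mentions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: checksum over a bundle's public API surface.
// Any change to the public surface changes the digest; purely internal
// edits leave it untouched, so dependents need not rebuild.
public class ApiChecksum {
	public static String digest(Collection<String> publicSignatures) {
		try {
			List<String> sorted = new ArrayList<>(publicSignatures);
			Collections.sort(sorted); // sort so the digest is order-independent
			MessageDigest sha = MessageDigest.getInstance("SHA-256");
			for (String sig : sorted)
				sha.update((sig + '\n').getBytes(StandardCharsets.UTF_8));
			StringBuilder hex = new StringBuilder();
			for (byte b : sha.digest())
				hex.append(String.format("%02x", b));
			return hex.toString();
		} catch (NoSuchAlgorithmException e) {
			throw new IllegalStateException(e); // SHA-256 is always available
		}
	}
}
```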

stale[bot] commented 4 years ago

This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Bnd/Bndtools or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.

kriegfrj commented 3 years ago

At the moment I'm working on bndlib while testing in bndtools.core.test. As bndlib is right at the bottom of the dependency chain, rebuilding most of the entire workspace for every tiny change is getting quite painful.

pkriens commented 3 years ago

The problem is that bndlib originates 20 years ago with btool. I never imagined it to become this large.

Nowadays, all my workspaces use an API project and there are no dependencies between projects. This makes the build very fast and is the far superior design technique since it keeps the transitive dependencies at heel. Stupidly, these core ideas I already developed in the 1980s; initially I wanted the instruction language to be its API :-(

That said, I found two solutions to this problem.

Maybe it is time we start thinking about a release 6 where we introduce an API project and break bndlib into smaller projects that communicate strictly via API. We should maybe also think about using the bndrun model for creating bnd, the gradle plugin, and other projects that consume the whole code base and are thus at the top of the dependency chain. If we use the -export for them, they would be easy to remove, or delay, from the IDE global build cycle. And these leaves are generally not needed when you are working in a bottleneck module.

However, I doubt anybody has the stamina for this.

kriegfrj commented 2 years ago

Maybe it is time we start thinking about a release 6 where we introduce an API project and break bndlib into smaller projects that communicate strictly via API. We should maybe also think about using the bndrun model for creating bnd, the gradle plugin, and other projects that consume the whole code base and are thus at the top of the dependency chain. If we use the -export for them, they would be easy to remove, or delay, from the IDE global build cycle. And these leaves are generally not needed when you are working in a bottleneck module.

I think that these are worthy of consideration and in particular the "API project" refactoring has utility in its own right. However, as solutions to this particular problem I feel that they are band-aids. In particular, even if properly structured with an API project, changing one comment in the API project will cause all the dependent projects to be rebuilt.

I've been doing a bit of research on this to understand how PDE achieves what it achieves in terms of this performance issue, and how we might do the same.

Here is the key point of difference in JDT's IncrementalImageBuilder that explains the behaviour:

if (bLocation instanceof ClasspathJar) {
    if (JavaBuilder.DEBUG)
        System.out.println("ABORTING incremental build... found delta to jar/zip file"); //$NON-NLS-1$
    return false; // do full build since jar file was changed (added/removed were caught as classpath change)
}

This is in the code that looks for changes in the prerequisites. If one of the changed prerequisites is a jar/zip file, then it aborts the incremental build and always does a full rebuild of the project.

BndContainerInitializer is responsible for populating the Bnd Bundle Path classpath container in a Bndtools project. When a bnd project is in -buildpath/-testpath, then it creates a classpath entry of JDT type CPE_PROJECT that refers to the corresponding Eclipse project. However, if the -buildpath/-testpath dependency does not have the "version=project" attribute, then it will also create a classpath entry of type CPE_LIBRARY (pointing to the output jar in generated):

https://github.com/bndtools/bnd/blob/3e245e070fe954965eed8546da7e8819b5f30177/bndtools.builder/src/org/bndtools/builder/classpath/BndContainerInitializer.java#L380-L398

So the problem is that having the CPE_LIBRARY entry on the build path is causing the JavaBuilder to do a full rebuild each time the jar is updated. We could get around this problem if we could figure out a way to remove it.

As @bjhargrave explained here:

  1. We add the CPE_PROJECT entry because this makes the source lookup work properly.
  2. We add the CPE_LIBRARY entry because the final jar:
    1. may not contain all of the classes that are in the project;
    2. may contain classes that are not in the project (courtesy of Bnd features like -includepackage and -conditionalpackage).

But having both of these in the container has also caused other issues. Not only does it appear to be the root cause of this issue, it also causes duplicate search results (ie, #4008).

I'm going to investigate to see if there's a way we could do away with the CPE_LIBRARY entry altogether while still accounting for the fact that the project's final jar may include more classes than what the project contains. That would solve any of those lingering search issues as well as solving this dependency problem.

kriegfrj commented 2 years ago
  1. We add the CPE_PROJECT entry because this makes the source lookup work properly.
  2. We add the CPE_LIBRARY entry because the final jar:

    1. may not contain all of the classes that are in the project;
    2. may contain classes that are not in the project (courtesy of Bnd features like -includepackage and -conditionalpackage).

But having both of these in the container has also caused other issues. Not only does it appear to be the root cause of this issue, it also causes duplicate search results (ie, #4008).

I'm going to investigate to see if there's a way we could do away with the CPE_LIBRARY entry altogether while still accounting for the fact that the project's final jar may include more classes than what the project contains. That would solve any of those lingering search issues as well as solving this dependency problem.

I think I have found the missing pieces - the above should be possible.

From the doc for IClasspathEntry:

Note: referencing a required project with a classpath entry refers to the source code or associated .class files located in its output location. It will also automatically include any other libraries or projects that the required project's classpath refers to, iff the corresponding classpath entries are tagged as being exported (IClasspathEntry.isExported()). Unless exporting some classpath entries, classpaths are not chained by default - each project must specify its own classpath in its entirety.

So the basic plan would be:

  1. Mark all entries in the Bnd Bundle Path container as "exported". This will make the classes in all upstream dependencies of the project also available to downstream dependents.
  2. Use access rules to restrict this full set of classes down to the actual set of classes, corresponding to the rules that Bnd uses to build the final jar:
    1. where a class is included and it is in an exported package, then no access rule is applied;
    2. where a class is included but it is not in an exported package, then the access rule of "discouraged" is used;
    3. where a class is not included, then the access rule of "non accessible" is used.
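In JDT, these rules would ultimately be expressed as IAccessRule objects (kinds K_ACCESSIBLE, K_DISCOURAGED, K_NON_ACCESSIBLE). The three-tier decision itself can be sketched self-contained (hypothetical names, not Bndtools code):

```java
// Hypothetical sketch of the three-tier access rule above, modelled as
// plain data rather than real JDT IAccessRule objects.
public class AccessRules {
	public enum Access { ACCESSIBLE, DISCOURAGED, NON_ACCESSIBLE }

	public static Access ruleFor(boolean includedInJar, boolean inExportedPackage) {
		if (!includedInJar)
			return Access.NON_ACCESSIBLE; // class not in the final bundle at all
		// In the bundle: exported packages are fair game, private ones discouraged.
		return inExportedPackage ? Access.ACCESSIBLE : Access.DISCOURAGED;
	}
}
```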

As an added bonus: If you're debugging and you end up in a class that has been included by (eg) -conditionalpackage from another project in your workspace, then chaining the dependencies in this way will (I believe) make the source lookup take you to the original (editable) source. I believe in the current setup, it will instead take you to the embedded, non-editable source (assuming that it's available!).

Possible pitfalls:

  1. Because bnd allows for the inclusion/exclusion of individual classes in a package, for 100% fidelity the IAccessRules need to be implemented per-class and not per-package. This will mean an IAccessRule for each class on the classpath (or at the very least, for every non-exported class). I'm not sure how well that will scale.
  2. If a class is in an upstream dependency but has been excluded, it will give you the error of "not accessible" rather than "not found". I'm not sure how much of an issue this would be.
bjhargrave commented 2 years ago
  1. Mark all entries in the Bnd Bundle Path container as "exported". This will make the classes in all upstream dependencies of the project also available to downstream dependents.

This is wrong. This is transitive dependencies all the way down. Bndtools is trying to emulate the OSGi class loading model in a flat class space of the Eclipse classpath container to provide best-effort fidelity between the compile environment and the OSGi runtime environment. If all dependencies of a project are exported, then users of that project will have visibility to things they cannot see at runtime which breaks fidelity.

2. Use access rules to restrict this full set of classes down to the actual set of classes, corresponding to the rules that Bnd uses to build the final jar:

This could mitigate the transitivity issue but you may need to build the bundle to know how to program the access rules for users of the bundle. So you have a circularity issue. When you build a project, you will have to update the classpath containers of all projects which depend upon the just built project which can then trigger their builds. So you may not really save anything.

But this does sound interesting if you can demonstrate better performance with similar fidelity. Good luck!

kriegfrj commented 2 years ago
  1. Mark all entries in the Bnd Bundle Path container as "exported". This will make the classes in all upstream dependencies of the project also available to downstream dependents.

This is wrong. This is transitive dependencies all the way down. Bndtools is trying to emulate the OSGi class loading model in a flat class space of the Eclipse classpath container to provide best-effort fidelity between the compile environment and the OSGi runtime environment. If all dependencies of a project are exported, then users of that project will have visibility to things they cannot see at runtime which breaks fidelity.

By itself it would be wrong and break fidelity, yes - but I think the second point will restore it.

  1. Use access rules to restrict this full set of classes down to the actual set of classes, corresponding to the rules that Bnd uses to build the final jar:

This could mitigate the transitivity issue but you may need to build the bundle to know how to program the access rules for users of the bundle. So you have a circularity issue. When you build a project, you will have to update the classpath containers of all projects which depend upon the just built project which can then trigger their builds. So you may not really save anything.

True, we will probably need to build the bundle to determine the access rules. But that won't necessarily trigger a rebuild because the downstream project won't depend directly on the output jar anymore. It will be up to the classpath container to signal if its access rules have changed (which are a proxy indication that the contents of the bundle have changed).

But this does sound interesting if you can demonstrate better performance with similar fidelity. Good luck!

We'll see how we go!

kriegfrj commented 2 years ago

I've put a bit of a quick hack together as an experiment to see if this will work.

The early results are promising, however they do highlight some additional minor design decisions that may need to be made. Recording what I have learned so far here.

The approach centers around when and how JavaCore.newProjectEntry() and JavaCore.newLibraryEntry() are called. Here is how it works with reference to the original basic plan I sketched out:

So the basic plan would be:

  1. Mark all entries in the Bnd Bundle Path container as "exported". This will make the classes in all upstream dependencies of the project also available to downstream dependents.

This is achieved simply by passing true as the value of the exported flag to both newProjectEntry() and newLibraryEntry().

The fidelity issue that @bjhargrave was concerned about is circumvented by the next point.

  1. Use access rules to restrict this full set of classes down to the actual set of classes, corresponding to the rules that Bnd uses to build the final jar:

This is achieved by simply passing true as the value of the combine access rules flag to newProjectEntry(). To use the bndtools.test/bndws workspace (which already has a ready-made transitive dependency example included for reference) to illustrate how this works:

This works as expected with the BndContainerInitializer out-of-the-box.

With my experimental changes:

As an added bonus: If you're debugging and you end up in a class that has been included by (eg) -conditionalpackage from another project in your workspace, then chaining the dependencies in this way will (I believe) make the source lookup take you to the original (editable) source. I believe in the current setup, it will instead take you to the embedded, non-editable source (assuming that it's available!).

I have confirmed that this also works.

As an additional added bonus, this gets rid of the duplicate results in all of Eclipse's Java search functionality - it will always take you to the editable source file, and the jar file result will no longer appear (as the jar file is not on any build path).

I also wanted to double-check this:

  1. Because bnd allows for the inclusion/exclusion of individual classes in a package

Is this actually true? I've been reflecting on Bnd and I think it really only allows you to include/exclude packages as whole units, not classes. In which case, this will not be an issue and the access rules can easily be configured for package granularity.

One issue to be considered though is the handling of private packages. These could be marked as "discouraged" rather than "not accessible". But how do you find them? You could use the Private-Package manifest header if it's there - otherwise, you'd have to do it by scanning the jar file. Or do we simply rely on the Private-Package header, so that if any private packages are included but there is no Private-Package header they simply will not be accessible? That would be a breaking change though.

  1. If a class is in an upstream dependency but has been excluded, it will give you the error of "not accessible" rather than "not found". I'm not sure how much of an issue this would be.

I have verified that this will indeed happen. I'm still not sure how much of an issue this would be - I think it is minor.

rotty3000 commented 2 years ago
   Because bnd allows for the inclusion/exclusion of individual classes in a package

Is this actually true? I've been reflecting on Bnd and I think it really only allows you to include/exclude packages as whole units, not classes. In which case, this will not be an issue and the access rules can easily be configured for package granularity.

You probably want to have a look at Class Filtering: http://docs.osgi.org/specification/osgi.core/7.0.0/framework.module.html#i3106983

kriegfrj commented 2 years ago
   Because bnd allows for the inclusion/exclusion of individual classes in a package

Is this actually true? I've been reflecting on Bnd and I think it really only allows you to include/exclude packages as whole units, not classes. In which case, this will not be an issue and the access rules can easily be configured for package granularity.

You probably want to have a look at Class Filtering: http://docs.osgi.org/specification/osgi.core/7.0.0/framework.module.html#i3106983

That certainly complicates things... but I'm not sure that Bndtools supports that level of fidelity at the moment anyway. My reading of the code is that the access rules are all package-level granularity.

kriegfrj commented 2 years ago

Bit of an update: this was looking very promising, but I found what I think is a bug in Eclipse which I have reported here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=578452 In summary, if a project or library is a transitive dependency via two distinct dependency chains (with different visibility rules), then only the visibility rules of the first chain will apply.

I think I might have a workaround - which basically means not using "exported" but manually emulating the same behaviour, so that the buggy implementation of computeExpandedClasspath() is bypassed.

kriegfrj commented 2 years ago

At this point, this issue is just a "wish" without anything concrete behind it. Please make some specific suggestions for changes that can be analyzed and discussed.

where I think PDE still has a bit of an edge over Bndtools - as Eclipse doesn't base its build decisions on fully assembled bundles, but on the classes that they contain,

I am not sure I can see much daylight between "fully assembled bundles" and "the classes that they contain" other than non-class resources in the bundles. But changes to non-class resources of a bundle are much less common in an incremental-development-in-Bndtools scenario. I am sure your experience was in editing code.

What I mean is the difference between classes in a folder hierarchy on-disk, vs those same classes contained in a jar.

and it uses its regular inter-project dependency mechanisms to make decisions about when to rebuild a class.

What is a "regular inter-project dependency mechanism"?

Having had a recent deep-dive into JDT, I am better placed to answer this particular question, which I thought I would do for posterity.

The Eclipse JavaBuilder has a smart incremental builder. As it builds, it maintains two pieces of useful state:

  1. an answer to the question "has there been a structural change to this class?" (ie, a change that would require dependent classes to be rebuilt).
  2. a full dependency graph keeping track of which class is dependent on which.

When Java source files change in JDT, the JavaBuilder recompiles them. It then analyzes the generated class files to see if there have been any structural changes to any of them. If there haven't been (the common case), it is done. Otherwise, it finds all dependent classes and adds them to the compilation queue. Wash, rinse, repeat. In many cases it won't have to go further than the first round, and in very many cases no further than the second. This makes pure Java compilation very quick. This is the "regular inter-project dependency mechanism". (A more detailed description of this can be found in the sidebar of this article.)
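As a toy illustration of that loop (a pure simulation with invented names, not actual JDT code), the queue-driven rebuild over a dependency graph might look like this:

```java
import java.util.*;

// Toy model of the JavaBuilder's incremental loop (a sketch, not JDT code).
// "dependents" maps each class to the classes that depend on it;
// "structurallyChanged" marks classes whose recompilation produced a
// structural change (e.g. a signature change, not a comment edit).
public class IncrementalLoop {
    static List<String> rebuild(Map<String, List<String>> dependents,
                                Set<String> structurallyChanged,
                                List<String> initiallyDirty) {
        List<String> compiled = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(initiallyDirty);
        Set<String> seen = new HashSet<>(initiallyDirty);
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            compiled.add(cls); // "recompile" cls
            // Only a structural change forces dependents into the queue.
            if (structurallyChanged.contains(cls)) {
                for (String dep : dependents.getOrDefault(cls, List.of())) {
                    if (seen.add(dep)) queue.add(dep);
                }
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("C"));
        // A comment-only edit to A: no structural change, only A recompiles.
        System.out.println(rebuild(dependents, Set.of(), List.of("A")));      // [A]
        // A signature change to A: dependents B and C are swept in too.
        System.out.println(rebuild(dependents, Set.of("A"), List.of("A")));   // [A, B, C]
    }
}
```

With a non-structural change the queue drains after a single compile; with a structural one, the affected dependents are swept in round by round.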

For jars on the classpath, however, the JavaBuilder's dependency checking is not so fine-grained. If a jar on the classpath of a project is updated (based on a simple timestamp check), it will always rebuild all classes in the project (which, in turn, causes BndtoolsBuilder to update the generated jar). Because Bndtools always puts the generated jar on the classpath of dependent projects (as well as of the project itself), this coarse method of dependency checking circumvents the JavaBuilder's regular inter-project dependency mechanism. This then ripples all the way to the bottom of the dependency chain.
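The ripple can be sketched as a simulation (invented names, not actual JDT or Bndtools code): each project in a chain depends on the generated jar of the one before it, and any timestamp bump triggers a full downstream rebuild.

```java
import java.util.*;

// Toy model of the coarse, timestamp-based jar check described above
// (an illustration only, not actual JDT/Bndtools code). Each project
// depends on the generated jar of the project before it in the chain;
// any change to that jar's timestamp rebuilds ALL classes in the project,
// which regenerates its own jar and dirties the next project in turn.
public class JarRipple {
    static List<String> rebuiltAfterTouching(List<String> chain, String touched) {
        Map<String, Long> jarTimestamp = new HashMap<>();
        for (String p : chain) jarTimestamp.put(p, 0L); // all jars "up to date"
        long clock = 1;
        jarTimestamp.put(touched, clock++); // even a comment edit rewrites the jar

        List<String> rebuilt = new ArrayList<>();
        for (int i = 1; i < chain.size(); i++) {
            String upstreamJar = chain.get(i - 1);
            // Coarse check: any timestamp change => full rebuild of this project...
            if (jarTimestamp.get(upstreamJar) > 0L) {
                rebuilt.add(chain.get(i));
                // ...which bumps this project's jar and dirties the next one.
                jarTimestamp.put(chain.get(i), clock++);
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        // Touching the jar at the root rebuilds the whole downstream chain.
        System.out.println(rebuiltAfterTouching(List.of("api", "impl", "app"), "api")); // [impl, app]
    }
}
```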

This is why removing the duplicated jar from the build path ought to make a massive difference to the performance of Bndtools.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. If you feel this is something you could contribute, please have a look at our Contributor Guide. Thank you for your contribution.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Bnd/Bndtools or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.

chrisrueger commented 11 months ago

The Eclipse JavaBuilder has a smart incremental builder. As it builds, it maintains two pieces of useful state:

  1. an answer to the question "has there been a structural change to this class?" (ie, a change that would require dependent classes to be rebuilt).
  2. a full dependency graph keeping track of which class is dependent on which.

Since we now have PDE - @laeubi on board :) maybe @kriegfrj and @pkriens could reopen this issue and start a new discussion about it? I spoke with @juergen-albert yesterday about it. An improvement here would really increase developer productivity (and developer happiness) by orders of magnitude, for anyone used to how fast this is in PDE.

laeubi commented 11 months ago

@chrisrueger I actually started some proof-of-concept at Tycho to use the Analyzer on the source files. I probably should clean this up a bit and add it as part of BND itself. While this might require some changes in the long run, it offers a lot of potential, e.g. one would not create a new analyzer each time, but have an IncrementalAnalyzer that could be fed with updates. E.g. if I change only the Javadoc, it obviously will not impact anything in the built jar (except for sources), so a trivial update would be to just replace that single Java file if sources are included, and do nothing else.

The same can even go further down the chain: just assume I have a component and add a new @Reference to a method - it would be enough to just replace this single component.xml, and so on...

chrisrueger commented 11 months ago

@laeubi Wow this sounds great.

Usually our pain is in bundles close to the root where we add small things (Javadoc, an if-else check inside a private method without changing the method signature, etc.). An accidental Ctrl+S can mean another coffee break ;), because the whole bundle chain is rebuilt.

I know that some of these problems arise because we don't always do it the proper OSGi way™. But that's how life is sometimes.

One term @juergen-albert and I "invented" yesterday was a "lax-mode" / "lenient-mode" which could be enabled optionally. This mode could mean "build my stuff in a fast way. I accept that things can go wrong and I should rebuild in normal mode once I'm done to see the truth."

kriegfrj commented 11 months ago

@laeubi Wow this sounds great.

Usually our pain is in bundles close to the root where we add small things (Javadoc, an if-else check inside a private method without changing the method signature, etc.). An accidental Ctrl+S can mean another coffee break ;), because the whole bundle chain is rebuilt.

I know that some of these problems arise because we don't always do it the proper OSGi way™. But that's how life is sometimes.

I don't think that "that's not the proper OSGi way" is a full explanation of this problem.

The fixes I was working on in this issue/PR would have fixed that javadoc problem. Unfortunately, I hit a couple of Eclipse internal bugs and some missing internal Bnd features that slowed me down and I ultimately ran out of steam and had to go work on other things. But I could try and revisit this now - especially as @pkriens has implemented #5618 since I last looked at this.

A big part of the problem, as I discovered it, is the way Eclipse's incremental JavaBuilder deals with out-of-date dependencies:

This coarse dependency checking for jars makes more sense for Eclipse's normal way of doing things - normally jars in your workspace are third-party dependencies that don't change much.

The complication comes from two features of Bnd/tools:

  1. The core goal of "build fidelity" (so that the contents of a project as seen by downstream projects exactly replicate what would be seen in the finished jar);
  2. The fact that a Bnd project can build bundles that include classes copied from dependencies on the build path, and not only classes from the project itself (eg, using -conditionalpackage or -includepackage).
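As a hypothetical illustration of the second point (the `-buildpath` and `-includepackage` instructions are real Bnd instructions; the project and package names here are invented), a bnd.bnd fragment like this copies classes from a buildpath dependency into the built bundle:

```
# hypothetical bnd.bnd fragment: classes in org.example.util come from
# the buildpath dependency, not from this project's own source
-buildpath: org.example.api;version=latest
-includepackage: org.example.util
```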

You'll notice, if you expand the Bnd Bundle Path container in a project that depends on other projects in your workspace, that it contains both a jar entry and a project entry for each -buildpath entry that references a project in your workspace (dependencies that come from a repository in the workspace have only a jar). Adding the jar was necessary because of the second feature above - the jar will sometimes contain classes that are not in the project itself. However, as noted above, jars as dependencies get coarse dependency checking, and because the jars are used through the entire dependency hierarchy, this causes the ripple-through effect - aka the "Ctrl+S = coffee break" phenomenon.

@pkriens has a philosophy of separating API bundles from implementation bundles. This mitigates the coffee-break phenomenon considerably, because it substantially flattens the dependency hierarchy - projects typically depend only on the API projects, which don't change frequently. However, this is still not perfect - as you note, trying to tidy up Javadoc in a well-used API project will result in lots of unnecessary rebuilding. While I think there could be some good (independent) reasons for doing it the way @pkriens suggests, it seems to me a substantial driver of this approach was/is this particular deficiency of Bndtools. And even for workspaces that are structured the "proper" way, implementing this feature would help.

I have some spare cycles - I'm happy to revisit this.

One term @juergen-albert and I "invented" yesterday was a "lax-mode" / "lenient-mode" which could be enabled optionally. This mode could mean "build my stuff in a fast way. I accept that things can go wrong and I should rebuild in normal mode once I'm done to see the truth."

For me, if I'm making Javadoc changes up the hierarchy, I simply turn off the "Build Automatically" flag until I'm done. After all, "not building" could be considered a special case of "lax mode" (it's harder to be more lax than not building at all! :smile:).

kriegfrj commented 11 months ago

@chrisrueger I actually started some proof-of-concept at Tycho to use the Analyzer on the source files. I probably should clean this up a bit and add it as part of BND itself. While this might require some changes in the long run, it offers a lot of potential, e.g. one would not create a new analyzer each time, but have an IncrementalAnalyzer that could be fed with updates. E.g. if I change only the Javadoc, it obviously will not impact anything in the built jar (except for sources), so a trivial update would be to just replace that single Java file if sources are included, and do nothing else.

The same can even go further down the chain: just assume I have a component and add a new @Reference to a method - it would be enough to just replace this single component.xml, and so on...

I think these things would help, but the performance of the Analyzer is not the main issue here (from my analysis at least). The main issue (described in detail in the previous post) is from the interaction between the JavaBuilder and the Bndtools Builder, and the way that the Bnd Build Path dependency container works. (PDE does not have this problem, because it does not add the jars of the upstream projects as dependencies - only the Eclipse project.)

Also, in my previous post, I forgot to mention the actual solution: add the transitive dependencies to the build path for any classes that are included via -includepackage or -conditionalpackage, together with the necessary access restriction rules to maintain build fidelity. This is why @pkriens' change in #5618 will help - because this information will be present in the output from the Analyzer. Once you can do this, you no longer have to add the project's output jar to the dependency container. As a bonus, this will have the side effect of fixing a number of other problems caused by having both the project and the jar as dependencies (eg, duplicate Java search results).
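A rough sketch of what that could look like in a project's .classpath (the accessrule element syntax is standard JDT; the entry path and patterns here are invented for illustration):

```xml
<!-- hypothetical: the upstream project on the build path instead of its jar,
     with access rules emulating the bundle's exported packages -->
<classpathentry kind="src" combineaccessrules="false" path="/org.example.api">
    <accessrules>
        <accessrule kind="accessible" pattern="org/example/api/**"/>
        <accessrule kind="nonaccessible" pattern="**"/>
    </accessrules>
</classpathentry>
```

The "accessible" rule exposes the exported package while the catch-all "nonaccessible" rule hides everything else, preserving build fidelity at package granularity.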

laeubi commented 11 months ago

@kriegfrj I'm not sure if it might be an option for BNDtools as well, but for PDE I created a ProjectJar that acts as a proxy between BNDlib and the Java output folder; that way no "real" jar is created but everything is flattened into the Java output folder (including any included resources and/or classes).

About the transitive dependencies approach, this is how it is handled in m2e as well because maven has some special ordering rules.

kriegfrj commented 11 months ago

@kriegfrj I'm not sure if it might be an option for BNDtools as well, but for PDE I created a ProjectJar that acts as a proxy between BNDlib and the Java output folder; that way no "real" jar is created but everything is flattened into the Java output folder (including any included resources and/or classes).

That could be an option too - thank you for the suggestion. I recall toying with this idea when I played with this two years ago - although my memory has faded a bit since then, I think I leaned toward the "transitive dependencies" approach (even though it was more difficult) because it had some additional benefits for source code navigation - ie, JDT's source navigation takes you directly to the appropriate upstream project source. I think with the "exploded jar" approach it would take you to a decompiled (and not editable) version of the class file in the exploded jar directory instead, which is helpful but not as helpful as going directly to the source.

About the transitive dependencies approach, this is how it is handled in m2e as well because maven has some special ordering rules.

That's good to know it's a tried-and-tested approach!

Thank you so much for lending your knowledge and expertise to this problem. It's great to have you around.

chrisrueger commented 11 months ago

Thanks @kriegfrj and @laeubi for your detailed analysis and proposals.

Let me try to collect a few sentences which I think summarize the problem quite well:

If your dependency, on the other hand, is a jar, then the JavaBuilder's dependency checking mechanism is more coarse

However, as noted above, jars as dependencies have coarse dependency checking behaviour, and because the jars are used through the entire dependency hierarchy, this causes the ripple-through effect - aka the "Ctrl-S=coffee break" phenomenon.

this is still not perfect - as you note, trying to tidy-up Javadoc in a well-used API project will result in lots of unnecessary rebuilding.

separating API bundles from implementation bundles. This helps to mitigate against the coffee break phenomenon substantially ... And even for workspaces that are structured the "proper" way, implementing this feature would help.

PDE does not have this problem, because it does not add the jars of the upstream projects as dependencies - only the Eclipse project.

In my words:

The impact of an improvement would be quite large - basically (number of developers) × (number of Ctrl+S presses per day).

I'm glad solutions are being brainstormed again, and maybe now there is good momentum for improvement.