Closed kriegfrj closed 3 years ago
This is still happening. https://github.com/bndtools/bnd/pull/4278/checks?check_run_id=1020131734#step:5:587
Yes, I've been keeping an eye on it while working on #4279. Recording some observations to aid further investigation:
So I think the issue lies somewhere in the code that initialises the Eclipse projects. The WorkbenchExtension creates a temp directory to use as an empty workspace at startup, and then the WorkspaceImporter is used to import the template workspace from the resources folder into the temporary workspace to work on. It seems that sometimes, for some reason, Bndtools is not picking up the changes to the workspace. Sometimes this results in bndProject
coming back null
, and sometimes it results in the LocalIndexedRepo coming back null
.
One theory is that despite the synchronisation efforts, the attempt to fetch the Bnd config is a bit too early and the workspace isn't ready. Bndtools caches the config and then doesn't notice when the workspace is updated.
Any other suggested avenues of investigation welcomed.
Has this disappeared? Or have you simply given up reporting when it happens?
Has this disappeared?
No.
Or have you simply given up reporting when it happens?
I forget. You can look in https://github.com/bndtools/bnd/actions?query=workflow%3A%22CI+Build%22 and most of the failed builds are because of this issue. But sometimes, I rerun the jobs since I need a clean build to publish the artifacts.
Wishful thinking then... :) I've been thinking about this one and i think i might know where to start looking...
I updated the testOSGi
task to ignore failures as this test case is failing too much and driving me mad!
I disabled the testOSGi
task in the CI build.
The latest failures are interesting:
Execution failed for task ':bndtools.core.test:testOSGi'.
java.io.EOFException: No input provided for project
This looks like a meta-failure rather than a failure of the test cases themselves. It would be nice if we had a stack trace of that exception.
That exception is from inside Gradle. Run the command with -s
and it will show the stacktrace. It is caused because the test vm fails to send any output. I think that perhaps my adding ignoreFailures = true
has allowed gradle to continue to the point where it then triggers the eof exception.
java.io.EOFException: No input provided for project
It would be nice if we had a stack trace of that exception.
We often run into these exceptions when running integrations tests with parallel Maven builds on Jenkins, see this job.
I just ran a build with debug logging -X
and this is the stacktrace showing:
[ERROR] Failed to execute goal biz.aQute.bnd:bnd-testing-maven-plugin:5.2.0:testing (default) on project org.openhab.core.model.item.tests: No input provided for project -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal biz.aQute.bnd:bnd-testing-maven-plugin:5.2.0:testing (default) on project org.openhab.core.model.item.tests: No input provided for project
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
Caused by: org.apache.maven.plugin.MojoExecutionException: No input provided for project
at aQute.bnd.maven.testing.plugin.TestingMojo.execute (TestingMojo.java:160)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
Caused by: java.io.EOFException: No input provided for project
at org.apache.tools.ant.Project.defaultInput (Project.java:1293)
at org.apache.tools.ant.Project.demuxInput (Project.java:1315)
at org.apache.tools.ant.DemuxInputStream.read (DemuxInputStream.java:73)
at aQute.lib.io.IO.copy (IO.java:183)
at aQute.libg.command.Command.execute (Command.java:167)
at aQute.bnd.build.ProjectLauncher.launch (ProjectLauncher.java:374)
at aQute.launcher.plugin.ProjectLauncherImpl.launch (ProjectLauncherImpl.java:197)
at aQute.tester.junit.platform.plugin.ProjectTesterImpl.test (ProjectTesterImpl.java:99)
at aQute.bnd.build.Project.test (Project.java:2520)
at aQute.bnd.maven.testing.plugin.TestingMojo.lambda$getOperation$1 (TestingMojo.java:207)
at aQute.bnd.maven.lib.resolve.BndrunContainer.execute (BndrunContainer.java:179)
at aQute.bnd.maven.testing.plugin.TestingMojo.execute (TestingMojo.java:157)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
This is caused by some kind-of-broken code in ant:
InputStream.read
should return -1
at end of stream, not EOFException
. And Bnd is not expecting EOFExcepion
for the purpose of reaching end of stream.
I did some more debugging of this annoying exception and it seems that in our case System.in
is often (not always) assigned a DemuxInputStream
after running the spotbugs-maven-plugin:spotbugs
goal in our Maven reactor. Normally System.in
is a java.io.BufferedInputStream
. I'll see if it is still an issue with the latest version of that plugin or if I can workaround it.
The aQute.lib.io.IO.copy
also sometimes has issues with java.io.BufferedInputStream
in multi threaded Maven builds:
[ERROR] Failed to execute goal biz.aQute.bnd:bnd-testing-maven-plugin:5.2.0:testing (default) on project org.openhab.core.model.item.tests: Stream closed -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal biz.aQute.bnd:bnd-testing-maven-plugin:5.2.0:testing (default) on project org.openhab.core.model.item.tests: Stream closed
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
Caused by: org.apache.maven.plugin.MojoExecutionException: Stream closed
at aQute.bnd.maven.testing.plugin.TestingMojo.execute (TestingMojo.java:160)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen (BufferedInputStream.java:176)
at java.io.BufferedInputStream.read (BufferedInputStream.java:342)
at aQute.lib.io.IO.copy (IO.java:183)
at aQute.libg.command.Command.execute (Command.java:167)
at aQute.bnd.build.ProjectLauncher.launch (ProjectLauncher.java:374)
at aQute.launcher.plugin.ProjectLauncherImpl.launch (ProjectLauncherImpl.java:197)
at aQute.tester.junit.platform.plugin.ProjectTesterImpl.test (ProjectTesterImpl.java:99)
at aQute.bnd.build.Project.test (Project.java:2520)
at aQute.bnd.maven.testing.plugin.TestingMojo.lambda$getOperation$1 (TestingMojo.java:207)
at aQute.bnd.maven.lib.resolve.BndrunContainer.execute (BndrunContainer.java:179)
at aQute.bnd.maven.testing.plugin.TestingMojo.execute (TestingMojo.java:157)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call (MultiThreadedBuilder.java:186)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:515)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
at java.lang.Thread.run (Thread.java:834)
It might be an idea to catch these exceptions in IO.copy
when they occur?
I've injected my own stream in System.in
to see what code is sometimes closing it and it seems to be bnd itself:
java.lang.RuntimeException: !!! CLOSED !!!
at org.openhab.tools.analysis.report.MyBufferedInputStream.close(MyBufferedInputStream.java:16)
at aQute.lib.io.IO.close(IO.java:951)
at aQute.lib.io.IO.close(IO.java:960)
at aQute.libg.command.Command.execute(Command.java:188)
at aQute.bnd.build.ProjectLauncher.launch(ProjectLauncher.java:374)
at aQute.launcher.plugin.ProjectLauncherImpl.launch(ProjectLauncherImpl.java:197)
at aQute.tester.junit.platform.plugin.ProjectTesterImpl.test(ProjectTesterImpl.java:99)
at aQute.bnd.build.Project.test(Project.java:2520)
at aQute.bnd.maven.testing.plugin.TestingMojo.lambda$getOperation$1(TestingMojo.java:207)
at aQute.bnd.maven.lib.resolve.BndrunContainer.execute(BndrunContainer.java:179)
at aQute.bnd.maven.testing.plugin.TestingMojo.execute(TestingMojo.java:157)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:190)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:186)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
This code might not be thread safe when other builder threads change System.in
:
This code might not be thread safe when other builder threads change
System.in
Lets move this discussion to #4509
It is possible the fix in https://github.com/bndtools/bnd/pull/4503 may have solved the flakiness issue. I have not seen the test fail on my system in a while. Lets still leave this open for now.
I had noticed that this had improved, but i doubt it was #4503 - the old flakiness was happening at startup before any tests start running, while the fix in #4503 doesn't kick in until after the tests start running. But i did make some other changes that may have helped this issue to settle. I'd been meaning to ask has it been fixed for a while?
Have you seen any failures recently, @bjhargrave ?
Have you seen any failures recently, @bjhargrave ?
It has been stable on my local system for a while. I think that after we finish 5.3.0, we can re-enable this test in the CI build.
It has been stable on my local system for a while. I think that after we finish 5.3.0, we can re-enable this test in the CI build.
I'm pretty sure it was #4404 that stabilized this issue. There was a race condition with the importing of the workspace that happens during @BeforeAll
- if the builder started to build before the cnf
directory was successfully imported the build would have NPE'd and not finished. But usually the import of cnf
would finish before the build got that far and so it would proceed as normal.
I enabled the test in the CI build and it failed. https://github.com/bndtools/bnd/actions/runs/549787921
Ahh, failed with a different issue - it seems to be a variant of #4503 has re-appeared.
While working on #4553 I noticed this old failure re-appear again. 😞 At least #4404 seems to have significantly reduced its frequency, so perhaps the root cause is related or along those lines.
Ok, thanks to the intensive debugging I did to uncover #4573, I think I have also uncovered the source of this intermittent test startup failure.
Ideally, in order to have each tests start from a clean slate, you would set up a new Eclipse workspace for each test class run. Unfortunately, Eclipse cannot dynamically change its workspace directory at run time, so the only way to do this would be to restart Eclipse for each test class. This would slow down the test significantly.
So instead, I elected to do the next closest thing that I could: the same workspace directory is re-used for all tests during the run, in generated/workspace
, but at the start of each test class it is wiped clean by the Jupiter extension WorkbenchExtension
:
https://github.com/bndtools/bnd/blob/b53c3b8568d7bc6800a74a05685c523c303d20c1/bndtools.core.test/src/bndtools/core/test/utils/WorkbenchExtension.java#L101-L108
The extension then imports the projects from the template bnd workspace (in the resources
directory) into the active workspace in generated/workspace
, before handing off to the main test class to do its thing. The template workspace includes the cnf
directory.
So the important thing about this for the current issue is: when the CI first starts the test, the generated/workspace
directory will not have a cnf
directory.
CnfWatcher
CnfWatcher
is the class responsible for listening to changes to the workspace and seeing if the cnf
project has changed. If it has changed, then it will re-initialise the bnd Workspace
object that is cached by Central
.CnfWatcher
is installed by BndtoolsBuilder
's static initializer. Hence, it is not installed until BndtoolsBuilder
has been loaded.BndtoolsBuilder
is not loaded until the first time it is called upon to be built.There are two possible startup orders that can yield the observed results:
cnf
project: in this scenario, Central.getWorkspace()
is called for the first time when the cnf
project has already been imported and has not yet been deleted. The tester thread is normally starts early and so this is what normally happens. In this case, the memoized Workspace
object will correspond to configuration in the cnf
project that existed at that point in time.cnf
project: in this scenario, Central.getWorkspace()
is called for the first time when there is no cnf
project in the workspace. In this scenario:
Workspace
is first memoized, it reflects a missing cnf
project (eg, no repositories).WorkbenchExtension
finishes importing the cnf
project.BndtoolsBuilder
has not yet been loaded, and so CnfWatcher
has not been installed.cnf
project is subsequently imported, there is no watcher installed yet to react to the change.Central
to re-initialize the Workspace
object once the cnf
project is finished importing.Workspace
object continues to reflect a missing cnf
project.This also explains why I didn't see it often when testing locally (either from within Eclipse or from Gradle) - I didn't make a habit of cleaning my generated/workspace
directory when running locally, so at startup there was usually a cnf
project already in the workspace left over from the last run. The window where you may have been missing a cnf
project (between deleting the pre-existing cnf
project and importing the template again) was substantially shorter than if you started from an empty workspace. In CI, you're invariably starting from an empty workspace, so that window where there is no cnf
project is substantially wider and more likely to trigger this race condition.
Having identified the problem, I think I can solve it simply by ensuring that BndtoolsBuilder
is loaded before the cnf
project is imported. Or by another (possibly more elegant) means.
Having identified the problem, I think I can solve it simply by ensuring that
BndtoolsBuilder
is loaded before thecnf
project is imported. Or by another (possibly more elegant) means.
Ok, my "more elegant means" is to manually call clear()
, forceRefresh()
and getPlugins()
on the bnd Workspace
object after the import is complete. And by batching this along with all of the import operations into a WorkspaceJob
with a scheduling rule that locks the entire Eclipse workspace, I can fix #4573 at the same time as this import/refresh job will not be able to run concurrently with FlushJob
, avoiding the possibility of deadlock.
I have this fix ready to submit as a PR, pending the merge of #4569.
Nice work. I started to work on rewriting the builder and refresh in Eclipse since it is a mess written by people that had no clue (which prominently includes me). It will be nice to create test cases for this. For the first time :-)
I have this fix ready to submit as a PR, pending the merge of #4569.
Merged.
Having identified the problem, I think I can solve it simply by ensuring that
BndtoolsBuilder
is loaded before thecnf
project is imported. Or by another (possibly more elegant) means.Ok, my "more elegant means" is to manually call
clear()
,forceRefresh()
andgetPlugins()
on the bndWorkspace
object after the import is complete. And by batching this along with all of the import operations into aWorkspaceJob
with a scheduling rule that locks the entire Eclipse workspace, I can fix #4573 at the same time as this import/refresh job will not be able to run concurrently withFlushJob
, avoiding the possibility of deadlock.I have this fix ready to submit as a PR, pending the merge of #4569.
Just for the record - the fix above hit similar deadlock issues. The fix that I settled on (currently awaiting merge in #4579) was slightly different:
By waiting for the workspace object to be ready before handing off to the test code, we're guaranteed that the LocalIndexedRepo
will also be ready when we try and fetch it.
Seems to be stable now, so perhaps this issue can finally be put to bed and we can start working on other integration tests...
Ok, as @bjhargrave reported here: https://github.com/bndtools/bnd/pull/4579#issuecomment-824351582, this issue is still not quite resolved.
I've had a look at the code again, and I think the issue might be here: https://github.com/bndtools/bnd/blob/6fc2e52edccfe28fde1c8a7f4d36caac1dd28359/bndtools.core.test/src/bndtools/core/test/utils/WorkspaceImporter.java#L94-L97
The WorkspaceImporter
first deletes all projects in the workspace, then imports all the projects from the new workspace in preparation for the next test. This could actually result in onCnfWorkspace()
being called twice - once for the delete of the old cnf
project, and once for the import of the new cnf
project. The intent is for this code to block until the import has finished and the Workspace object is fully initialised, but it may only end up blocking until the delete has finished. This leaves a race condition if the import doesn't manage to finish before the attempt to access the Local Index. Need to think of a way around this.
This appears to be a consequent of reusing the running Eclipse instance/workspace across tests cases. If each test case started with its own non-shared eclipse instance and workspace, then no such issues could occur.
There appears to be a race condition in the startup of the integration test that causes the
beforeAll()
method of the integration test to fail to get a local repo:https://github.com/bndtools/bnd/blob/b4cddedfe8f233e1005828d00f3c83a1345611bc/bndtools.core.test/src/bndtools/core/test/editors/quickfix/BuildpathQuickFixProcessorTest.java#L77-L91
This needs to be fixed.