bndtools / bnd

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
https://bndtools.org
Other
528 stars 304 forks source link

Occasional ConcurrentModificationException #5967

Closed juergen-albert closed 6 months ago

juergen-albert commented 7 months ago

One of our builds fails on github from time to time with an ConcurrentModificationException. One Example beeing: https://github.com/geckoprojects-org/org.gecko.emf.utils/actions/runs/7528737524/job/20492000254

I can't reproduce it locally and it only seems to happen during the github actions.

The Stacktrace:

Caused by: java.util.ConcurrentModificationException
    at aQute.bnd.osgi.Processor.addAll(Processor.java:229)
    at aQute.bnd.osgi.Processor.getInfo(Processor.java:214)
    at aQute.bnd.gradle.BndPlugin.checkProjectErrors(BndPlugin.java:986)
    at aQute.bnd.gradle.BndPlugin.checkErrors(BndPlugin.java:978)
    at aQute.bnd.gradle.BndPlugin$2.execute(BndPlugin.java:470)
    at aQute.bnd.gradle.BndPlugin$2.execute(BndPlugin.java:466)
    at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:802)
    at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:775)
    at org.gradle.api.internal.tasks.execution.TaskExecution$3.run(TaskExecution.java:242)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$1.execute(DefaultBuildOperationRunner.java:29)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$1.execute(DefaultBuildOperationRunner.java:26)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
    at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
pkriens commented 7 months ago

I took a quick look. It seems that we check errors in each task while we also still execute that might generate errors/warnings.

I think the only reasonable solution is to actually start using the Workspace lock in the Gradle bnd plugin. We can execute each Task in a read lock and then collect the errors in a write lock. This seems doable, I only count about 8 tasks. Since we could still have concurrent errors (although I expect this is extremely unlikely because we create dedicated processors like Builder for each task) the read lock would officially not suffice. This is a common problem so maybe we should at least make the list objects concurrent.

@bjhargrave, got a minute to shine your light on this?

pkriens commented 7 months ago

@juergen-albert a temporary fix might be to have NO errors and warnings :-) I assume the projects have many warnings and this will widen the window it can happen?

bjhargrave commented 7 months ago

I think the only reasonable solution is to actually start using the Workspace lock in the Gradle bnd plugin.

That is gross overkill. The issue seems to be iterating over the errors/warnings list in getInfo:

https://github.com/bndtools/bnd/blob/b82dc867a7920edfcf32cb2cfa408795f4af6aea/biz.aQute.bndlib/src/aQute/bnd/osgi/Processor.java#L229

So the answer is to make a copy of the from list before iterating over it.

for (String message : List.copyOf(from)) { 

a temporary fix might be to have NO errors and warnings :-)

Perhaps some other part of the parallel build is also adding errors/warnings....

pkriens commented 7 months ago

So the answer is to make a copy of the from list before iterating over it.

I considered this but I am afraid this might cause us to sometimes miss errors/warnings?

A bit confused about your concern for the overhead. This is a shared lock for the working part. I looked at BndPlugin and the code seems easy to change to do all tasks in a read lock and read the checkErrors in a write lock? Are you concerned for performance?

bjhargrave commented 7 months ago

I considered this but I am afraid this might cause us to sometimes miss errors/warnings?

Well it is always a race especially if some other thread is adding more entries.

A bit confused about your concern for the overhead.

Its is unnecessary. It is also not a gradle plugin issue. It is an issue in Processor.getInfo so it should be the one changed to address the issue.

pkriens commented 7 months ago

Well, we're using Processor.getInfo() in a concurrent way? For which it was not designed. The danger is that if we go in this direction, we add more and more places to make bnd a 'little' bit concurrent. Since we're not really testing it, it feels a bit unsafe. Even the solution you proposed is is not completely safe because to copy the list you also need to iterate over the list, you just shorten the window where the error happens.

I really appreciate the feedback, but I am inclined to do the locking, this seems to solve all the problems and we've got the code in place.

If I make a PR, still willing to review it?

bjhargrave commented 7 months ago

Well, we're using Processor.getInfo() in a concurrent way? For which it was not designed.

None of Bnd was designs for concurrent use yet we have put considerable effort to fix that. This is just an area which was not previously identified as needing work.

but I am inclined to do the locking,

If you are going to do locking, it should be in getInfo and not forced into all the callers to getInfo (including the maven plugins).

If I make a PR, still willing to review it?

If the PR is to fix getInfo. I am not inclined to support adding locking into the callers of getInfo.

pkriens commented 6 months ago

Closed with #5967