Open cdietrich opened 4 years ago
@iloveeclipse did some more analysis
during the build it is possible to run other jobs on the workspace cause it does a getWorkManager().beginUnprotected();
now NotifyJob
comes in an "steals" that lock so the builder thread and the loader thread cannot aquire it to do the actual work of working with Storage2UriMapperJavaImpl
data. so if we are unlucky, one of these will aquire the synchronized (cachedPackageFragmentRootData)
so that NotifyJob
cannot aquire this synchronized lock and so both will block each other.
we will try to offload the work of org.eclipse.xtext.ui.resource.Storage2UriMapperJavaImpl.elementChanged(ElementChangedEvent)
to a separate job.
@szarnekow do you have some usecases in mind where late invalidation or updating of the data cache might be a problem
maybe we also need to do some similar magic like
org.eclipse.xtext.ui.resource.Storage2UriMapperJavaImpl.initializeCache(boolean)
for all modify operations
and make the read ops not needing a synchonized
if the NotifyJob happens async "somewhen" then i wonder if e.g. org.eclipse.xtext.builder.impl.javasupport.JdtToBeBuiltComputer.updateProject(ToBeBuilt, IProject, IProgressMonitor) might see outdated classpath and thus miss model files in jars on classpath
Processing change events on workspace notification thread is a problem, because it holds that workspace lock and
Storage2UriMapperJavaImpl
will try to lock via cachedPackageFragmentRootData
field. That is a road for deadlocks.
As discussed, NotificationManager$NotifyJob
can run in parallel to build because they use two different locks.
The trick is that Workspace.run() first locks the workspace resource via rule (in getWorkManager().checkIn(rule, monitor))
and after that unlocks the global workspace lock via getWorkManager().beginUnprotected();
NotifyJob however uses a "null" rule, so it owns the global workspace lock but does not lock any resource via rule.
So the JavaModelManager.initializeAllContainers also uses a null rule, it also calls workspace.run() but here it collides with the NotificationManager job that already owns global workspace lock.
Therefore my original root cause assessment was wrong, I will update the description about threads involved in the deadlock.
So the probably easiest solution for Xtext is to decouple Storage2UriMapperJavaImpl.elementChanged
/ Storage2UriMapperJavaImpl.updateCache
work from the resource notification thread, so we can't own workspace lock and try to acquire Storage2UriMapperJavaImpl lock at same time.
We've tried to replace Storage2UriMapperJavaImpl with our own implementation that simply runs a separated job on Storage2UriMapperJavaImpl.elementChanged, but that runs in a sporadic test issues because it tried then to update cache for already deleted projects.
So ideally update cache would either not require hard "synchronized" lock (use ILock for example, that could be interrupted) or not lock at all (may be call the problematic code that tries to resolve classpath at different time?).
Below is also the main problem source:
Thread[Worker-4: Updating workspace,5,main], state: BLOCKED
org.eclipse.xtext.ui.resource.Storage2UriMapperJavaImpl.getCachedData(Storage2UriMapperJavaImpl.java:256)
org.eclipse.xtext.ui.resource.Storage2UriMapperJavaImpl.updateCache(Storage2UriMapperJavaImpl.java:481)
org.eclipse.xtext.ui.resource.Storage2UriMapperJavaImpl.elementChanged(Storage2UriMapperJavaImpl.java:442)
org.eclipse.jdt.internal.core.DeltaProcessor$3.run(DeltaProcessor.java:1755)
org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
org.eclipse.jdt.internal.core.DeltaProcessor.notifyListeners(DeltaProcessor.java:1743)
org.eclipse.jdt.internal.core.DeltaProcessor.firePostChangeDelta(DeltaProcessor.java:1576)
org.eclipse.jdt.internal.core.DeltaProcessor.fire(DeltaProcessor.java:1552)
org.eclipse.jdt.internal.core.DeltaProcessor.notifyAndFire(DeltaProcessor.java:2273)
org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2163)
org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:477)
org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:305)
org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:295)
org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:158)
org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:380)
org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1502)
org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2306)
org.eclipse.core.internal.events.NotificationManager$NotifyJob.run(NotificationManager.java:44)
org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
SIde note: JDT changed the behaviour of getHandleIdentifier last year to do a resolveClasspath which might have terribly accelerated the problem (asked at jdt-dev, lets see that they say)
we have a potential deadlock with Parallel Resource loading
waits for workspace lock and blocks other accesses to PackageFragmentRootData
waits
this one may invalidate the classpath
this one owns the workspace lock, but is blocked by the load operation waiting for the workspace lock