Synchronize metamodel evolution across threads.

eclipse-qvt-oml-bot commented 1 week ago

| --- | --- | | Bugzilla Link | 583533 | | Status | NEW | | Importance | P3 normal | | Reported | Aug 28, 2024 08:52 EDT | | Modified | Nov 04, 2024 07:46 EDT | | Blocks | 525852 | | See also | 582625 | | Reporter | Ed Willink |

Description

Bug 525852 reported intermittent testing on Jenkins. The problems were traced to the lazy initialization of oclIterators by concurrent QvtCompletionCompiler / QvtReconciler.

OCL Bug 582625 addresses the laziness of oclIterators() etc ensuring that EcoreOCLStandardLibrary provides an eagerly initialized library.

BUT, resolving the transitive TupleType identifies that orphan types such as Set and Tuple are dynamically created, and for QVTo, this may be on concurrent threads.

QVTo must find some way to synchronize e.g. TypeResolverImpl.findTupleType.

A potential solution might involve the derived QvtOperationalEnvFactory creating a derived TypeResolverImpl that synchronizes sensitive methods.

TypeResolverImpl methods seem to be protected but the overloads to create a derived TypeResolverImpl seem to be blocked by the use of an EcoreEnvironment.INSTANCE.

May need a minor OCL tweak to support the required extensibility.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Aug 29, 2024 07:29

(In reply to Ed Willink from comment #0)

May need a minor OCL tweak to support the required extensibility.

No. An org.eclipse.ocl.ecore.internal.TypeResolverImpl is created to support evolution of the EcoreOCLStandardLibrary. A QvtTypeResolverImpl / BasicTypeResolverImpl is created for use by QVTo.

QVTo must find some way to synchronize e.g. TypeResolverImpl.findTupleType.

THIS APPROACH IS DOOMED TO FAILURE.

EMF is explicitly documented to not support concurrency.

OCL is implicitly documented to not support concurrency.

QVTo is not designed to be concurrent, let alone manage OCL/EMF/Platform limitations.

If we try to synchronize each state aspect we will forever be chasing yet another part of the fragmented state implementation.

Xtext has the same problem and a solution. An IUnitOfWork is queued for readonly/modify exclusive service.

I suggest that QVTo should define the reconciler and the parser as sub-applications that are queued to an overall application that ensures non-concurrency.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Aug 30, 2024 03:54

(In reply to Ed Willink from bug 582625 comment #6)

The OCL fix just highlights how many other problems QVTo has to resolve to support concurrency.

Our concrete problem at hand is caused by the invocation of IDE.openEditor(...) inside the CompletionTest suite. Opening the editor will immediately start the QvtReconciler and its BackgroundThread. Synchronizing this thread is possible by:

Making the reconciler accessible to the CompletionTest. An API for the access is already available, so this change only requires reuse of the QvtReconciler after it has been constructed (rather than constructing a new QvtReconciler on each access).
Using the QvtReconciler to synchronize the main thread (executing the JUnit completion tests) with the reconciler's background thread. This requires adding synchronization statements to the process(...) / initialProcess(...) methods of the QvtReconciler and to the compututation of completion proposals in CompletionTest.

The advantage of this fix is that it is quite local and most of the change affects "only" the QVTo UI and completion test itself. Propotyped on cgerking/583533: https://git.eclipse.org/c/mmt/org.eclipse.qvto.git/commit/?h=cgerking/583533

Is there an alternative way to suppress/disable the reconciler when running the completion tests?

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Aug 30, 2024 04:17

The complexity/flexibility of the Xtext solution is necessary because 'editing' has many concurrent activities: syntax highlighting, parsing, overview, problems, refactoring, ...

If QVTo concurrency is limited to a single asynchronous IDE.openEditor(...) call for CompletionTest, then just arrange for it to be 'synchronous'; i.e. wait for the Reconciler job to complete.

Yes, avoiding multiple reconcilers seems good.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Aug 30, 2024 05:07

(In reply to Ed Willink from comment #3)

wait for the Reconciler job to complete.

Which doesn't seem too trivial to me as AbstractReconciler's fThread is private. Even if we can wait for it to complete, can we be sure that it won't restart?

I wonder if the reonciler's uninstall() API is somehow helpful when invoked before actually executing the completion tests. But I don't think so because it only announces that cancellation is desired. The reconciler (and its underlying QVTOCompiler) may or may not react to this before the completion tests are kicked off.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Aug 30, 2024 06:39

You are using a derived reconciler, and for test purposes you can propbably arrange to use a derived-derived reconciler that communicates its status via a static variable/notoification.

Alternatively if the Job name is predicatable you can obtain the Job from the JobManager and poll its status for completion.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Aug 30, 2024 08:50

(In reply to Ed Willink from comment #5)

You are using a derived reconciler, and for test purposes you can propbably arrange to use a derived-derived reconciler that communicates its status via a static variable/notoification.

But isn't it essentially what the approach on cgerking/583533 already does? As long as the reconciler is running, the synchronization forces the completion tests to wait. Would the variable/notification enable us to go without synchronization?

Alternatively if the Job name is predicatable you can obtain the Job from the JobManager and poll its status for completion.

The thing is that it's not even a Job. The BackgroundThreat is created immediately when installing the AbstractReconciler.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Aug 30, 2024 09:35

(In reply to Christopher Gerking from comment #6)

(In reply to Ed Willink from comment #5)

You are using a derived reconciler, and for test purposes you can propbably arrange to use a derived-derived reconciler that communicates its status via a static variable/notoification. But isn't it essentially what the approach on cgerking/583533 already does? As long as the reconciler is running, the synchronization forces the completion tests to wait. Would the variable/notification enable us to go without synchronization?

Not really. THe synchronized is not quite where you want it. I think there is a risk that the synchronized startREconcilingmight occur too late and deadlock.

The BackgroundThread is private making kit difficult to synchronize run().

Alternatively if the Job name is predicatable you can obtain the Job from the JobManager and poll its status for completion. The thing is that it's not even a Job. The BackgroundThreat is created immediately when installing the AbstractReconciler.

Yes. Job is a red herring, sorry, misremembered. And the BackgroundThread is an invisible sub-Thread.

I'm not sure that the two-stage synchronize of initialProcess is what you want.

Surely we just need:

IDE.openEditor\ wait for open corrolaries to complete

the corrolaries are close enough to complete when process() completes, so a this.notify() at the end of an overloaded process() can wake up a reconciler.wait() in the caller.

(If the reconciler doesn't run (surely impossible) you could invoke forceReconciling().)

(If the reconciler runs again (surely also impossible unless you animate the editor with edit commands) an overloaded initialProcess() could throw an IllegalStateException.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Aug 30, 2024 10:05

(In reply to Ed Willink from comment #7)

THe synchronized is not quite where you want it.

Only to avoid misunderstandings: both initialProcess() and process() are invoked in the run() method of the BackgroundThread. Synchronizing these methods should therefore effectively prevent the reconciler from being intertwined with the completion proposal tests (also synchronized) on the main thread.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Aug 30, 2024 10:16

(In reply to Christopher Gerking from comment #8)

(In reply to Ed Willink from comment #7)

THe synchronized is not quite where you want it. Only to avoid misunderstandings: both initialProcess() and process() are invoked in the run() method of the BackgroundThread. Synchronizing these methods should therefore effectively prevent the reconciler from being intertwined with the completion proposal tests (also synchronized) on the main thread.

Yes. But there is also a synchronized in the main thread on startReconciling, so you have three synchronized phases when you only wanted one decision point.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Aug 31, 2024 08:08

(In reply to Ed Willink from comment #7)

I think there is a risk that the synchronized startREconcilingmight occur too late and deadlock.

With my limited concurrency knowledge, I don't understand this risk. The synchronized startReconciling is invoked on the main thread and starts the background threat. If we add synchronization to the background threat, the reconciler must of course wait until startReconciling has finished before it can actually start reconciling. But if startReconciling will start the background threat, how can it occur too late?

Surely we just need:

IDE.openEditor wait for open corrolaries to complete

the corrolaries are close enough to complete when process() completes, so a this.notify() at the end of an overloaded process() can wake up a reconciler.wait() in the caller.

Done on cgerking/583533-notify. There is still a synchronized(this) in process(...), so I don't yet understand what problem this change is supposed to address. The observation is that it causes a huge runtime overhead (350 sec instead of 145 sec with cgerking/583533). Possibly this is due to my own failure in the implementation.

(In reply to Christopher Gerking from comment #8)

Only to avoid misunderstandings: both initialProcess() and process() are invoked in the run() method of the BackgroundThread. Synchronizing these methods should therefore effectively prevent the reconciler from being intertwined with the completion proposal tests (also synchronized) on the main thread.

It would also be possible to move the two synchronized(reconciler) statements from QvtReconciler directly into QvtReconcilingStrategy.getCompilationResult(...), reducing them to one synchronization. Whereas it avoids method overridings in QvtReconciler, the impact on maintainability is questionable. First, the QvtReconcilingStrategy must now take care of proper synchronization whenever it evolves. Second, retrieving the QvtReconciler (still needed for sync) inside the QvtReconcilingStrategy is possible, but might be considered an indirection.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Sep 01, 2024 03:07

On the one hand this is just test code so we shouldn't get too hung up on perfection.

On the other hand there is a longstanding flakiness and we don't want to replace it with a different flakiness.

Long experience has taught me that ad hoc synchronization generally provides dead/live-lock opportunities.

Your earlier solution introduced new synchronizes to the non-test code. This was undesirable.

Your new solution avoids that but still relies on synchronize of the QVTreconciler. Unfortunately the inherited startReconciling has a synchronize for reasons that I do not understand. There is presumably some synchronization algorithm in use by the basic editor so there is a risk that another synchronization algorithm might interact.

Using a distint synchronization monitor as in ewillink/583533-notify hopefully makes zero changes to standard non-test semantics while supporting test synchronization.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Sep 02, 2024 07:02

(In reply to Ed Willink from comment #11)

On the one hand this is just test code so we shouldn't get too hung up on perfection.

Agreed. But what I don't like so much is non-test code (QvtReconciler) to be particularly instrumented for test issues (no problem with the other way round). That's why I first preferred synchronize as a more generic approach, which at least is not geared to the specific needs of individual test code. But...

There is presumably some synchronization algorithm in use by the basic editor so there is a risk that another synchronization algorithm might interact.

... of course you are right that synchronize might in fact interact in some unforeseen way.

Your new solution avoids that but still relies on synchronize of the QVTreconciler. Unfortunately the inherited startReconciling has a synchronize for reasons that I do not understand.

This was already noticed in bug 40549 >20 years ago, no reasons mentioned anyhow.

Using a distint synchronization monitor as in ewillink/583533-notify hopefully makes zero changes to standard non-test semantics while supporting test synchronization.

I see. The thing is that we make huge efforts to wait for the reconciler to finish without actually relying on it to execute at all. We would just need to exclude a parallel execution, which is pretty much ensured by a plain synchronization (and apparently much faster as well).

Your earlier solution introduced new synchronizes to the non-test code. This was undesirable.

Would it be a compromise to actually use the QvtReconcilingStrategy for synchronization, as it doesn't have any prior synchronized methods that might interact? See cgerking/583533 (and compare the runtime overhead of the notify approach when running CompletionTests).

Alternatively, indeed use a new Object() as myMonitor inside the QvtReconcilingStrategy, and enable CompletionTest to access it through a getMonitor() method for external synchronization? This would avoid any interaction with existing sync algorithms.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Sep 02, 2024 07:44

(In reply to Christopher Gerking from comment #12)

I see. The thing is that we make huge efforts to wait for the reconciler to finish without actually relying on it to execute at all. We would just need to exclude a parallel execution, which is pretty much ensured by a plain synchronization (and apparently much faster as well).

I don't see huge efforts (time-wise); the sleep loop is probably unnecessary but I don't know how quickly the reconciler runs so there is a hazard of waiting too soon.

Parallel execution is indeed by excluded by the monitor.

(and compare the runtime overhead of the notify approach when running CompletionTests).

I'm not sure which approach you are identifying as bad. I'm not clear that there is any significant overhead wrt the Monitor. The only signifocant speed-up I can see is allowing the hazardous concurrency so that the concurrent threads make better use of the CPU without any guarantee of integrity.

Alternatively, indeed use a new Object() as myMonitor inside the QvtReconcilingStrategy, and enable CompletionTest to access it through a getMonitor() method for external synchronization? This would avoid any interaction with existing sync algorithms.

This seems to involve a synchronize on compile that just seems to add a new risk without solving the old.

The problem is that IDE.openEditor's start-up corrolaries are concurrent; we must wait for them. The monitor does precisely that while imposing only a gratuitous Object construction on regular execution. waitTillProcessingDone is only called by test code - may be it should have 'test' as part of its name. You could migrate it to a different class at the expense of breaking modularity.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Sep 02, 2024 08:01

(In reply to Ed Willink from comment #13)

(and compare the runtime overhead of the notify approach when running CompletionTests).

I'm not sure which approach you are identifying as bad.

Not bad, but waitTillProcessingDone() seems to be very geared to the specific needs of the completion tests. Not a fundamental problem of course.

I'm not clear that there is any significant overhead wrt the Monitor. The only signifocant speed-up I can see is allowing the hazardous concurrency so that the concurrent threads make better use of the CPU without any guarantee of integrity.

On my machine it's 145 seconds (with cgerking/583533) vs. 350 seconds (with both ewillink/583533-notify and cgerking/583533-notify). Could of course be due to hazardous concurrency.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Sep 02, 2024 11:24

(In reply to Ed Willink from comment #13)

This seems to involve a synchronize on compile that just seems to add a new risk without solving the old.

The compile was already inside the synchronize before, so let me get this right: Of course my proposed monitor provides a new API that enables arbitrary clients to synchronize with the reconciler (in theory, block it forever with malicious intent). But in fact this API is used by the completion tests only, which require exactly this synchronization and do of course not block. And the whole org.eclipse.m2m.internal.qvt.oml.editor.ui.completion package is provisional API, restricting its visibility to QVTo's UI tests anyhow.

So as long as we assume that nobody (except the completion tests) uses the additional synchronization monitor, the compilation will not be affected. I added this version to cgerking/583533 for comparison.

In general, of course I agree that we should focus this solution on functionality (and prioritize security over beauty). I would just like to make sure that it's really necessary for the completion tests to take twice as much time as before.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Sep 02, 2024 16:04

I'm surprised that correct sychronisation doubles run time, but a necessary price if it avoids flakiness.

If the unsynchronised behaviour has significant concurrency that could account for some (unsafe) speed up.

If the unsafe concurrent executions share the same patterns of memory access, it is possible that the earlier execution loads the cache to help the later execution. By the time a safe synchronized second execution happens the cache may have gone stale.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Sep 04, 2024 08:46

(In reply to Ed Willink from comment #13)

The problem is that IDE.openEditor's start-up corrolaries are concurrent; we must wait for them.

We must wait only for those corrolaries that are already running. For those that don't run, we can squeeze in and execute the test first (remaining corrolaries following subsequently).

(In reply to Ed Willink from comment #16)

I'm surprised that correct sychronisation doubles run time, but a necessary price if it avoids flakiness.

This is probably because in many cases, there are actually no corollaries running yet.

Plus: Therefore the test can often finish without any waiting time.

Minus: As discussed in comment 15, this could represent a security risk as the synchronization can delay (or even block) the reonciler. On cgerking/583533, I tried to limit this risk using the Java concurrency API. By encapsulating the sync monitor in the reonciling strategy, we can prevent access to it from the outside (we might also use the reconciler rather than the strategy). Instead, a Runnable can be passed in from the outside and will run internally synchronized. The completion tests can use this Runnable to pass in a RunnableFuture and then wait for its completion (synchronized with any start-up corollaries). Of course, malicious clients with access to the reconciler could still pass in a Runnable with run() {while(true) {/do nothing just block/}}.

(In reply to Ed Willink from comment #13)

I don't see huge efforts (time-wise); the sleep loop is probably unnecessary but I don't know how quickly the reconciler runs so there is a hazard of waiting too soon.

Isn't this what Thread.onSpinWait() is for? I don't know which kind of magic it includes to do busy waiting in a smarter way.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Sep 05, 2024 03:44

(In reply to Christopher Gerking from comment #17)

Isn't this what Thread.onSpinWait() is for? I don't know which kind of magic it includes to do busy waiting in a smarter way.

Thread.onSpinWait() is not available in Java 8. It appears to help with waiting for a notify, but that sleep loop waits for a reconciler start that may (surely will) come at an inderminate time after IDE.openEditor. Maybe a distinguishable notify from the start of initialProcess() could be monitored by an onSpinWait().

(In reply to Ed Willink from comment #13)

The problem is that IDE.openEditor's start-up corrolaries are concurrent; we must wait for them. We must wait only for those corrolaries that are already running. For those that don't run, we can squeeze in and execute the test first (remaining corrolaries following subsequently).

If you can 'prove' that some concurrencies are ok, then you can speed up the tests, but I doubt you have an adequate description of what the reconciler does, let alone a mathematical formulation.

In the absence of 'proof' you risk exchanging one identified flakiness for which we have a safe slow, but not catastrophically slow, cure for another possible flakiness.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Oct 28, 2024 08:40

This seems to have stalled. M2 is past, M3 is on the horizon. It would be nice to have reliable albeit slow testing.

eclipse-qvt-oml-bot commented 1 week ago

By Christopher Gerking on Oct 29, 2024 06:06

(In reply to Ed Willink from comment #19)

It would be nice to have reliable albeit slow testing.

But why not reliable and fast? :) I still think that we should not wait unnessecarily long, see below.

(In reply to Christopher Gerking from comment #17)

We must wait only for those corrolaries that are already running. For those that don't run, we can squeeze in and execute the test first (remaining corrolaries following subsequently).

And this is what the fix proposed on cgerking/583533-notify does. Either way, you are fully right that we should resume and finish our work on this bug.

eclipse-qvt-oml-bot commented 1 week ago

By Ed Willink on Nov 04, 2024 07:46

(In reply to Christopher Gerking from comment #20)

(In reply to Ed Willink from comment #19)

It would be nice to have reliable albeit slow testing. But why not reliable and fast? :) I still think that we should not wait unnessecarily long, see below.

Your approach is probably 'fast', but I see no grounds to believe it is reliable since I/we do not understand when concurrency is triggered.

My approach avoids the concurrency so should be reliable but 'slow'.

Since the performance change is not dramatic and only during testing, I think slow and reliable is the better solution to our intermittent test failures.

(Either way, since this is a test-only change, I don't think it merits release of 3.11.1, so for 2024-12 the fix will only be on master to improve ongoing testing.)

eclipse-qvto / org.eclipse.qvto

Synchronize metamodel evolution across threads. #1126

Description