eclipse-sirius / sirius-web

Reusable frontend and backend components for Sirius Web
https://eclipse.dev/sirius/sirius-web.html
Eclipse Public License 2.0
62 stars 45 forks source link

Only save the documents actually impacted by a change #3637

Open pcdavid opened 3 weeks ago

pcdavid commented 3 weeks ago

Currently on every mutation (technically any handler which publishes a SEMANTIC_CHANGE change description, see EditingContextEventProcessor.shouldPersistTheEditingContext(ChangeDescription)) we save all the documents (EMF resources) in the project. This can be long and is blocking (because EMF resources can not be safely used by multiple threads).

Ideally we would only save the "delta" of the change itself, and this might be possible but probably require a significant effort. A middle ground that seem to be worth investigating is to only save the sub-set of documents which are impacted by the change. Of course it would only be useful for some projects (with data split in multiple resources) and some changes (which do not recursively impact all/most of the resources).

EMF has native support for resource modification tracking (resource.setTrackingModification(true) on load, and then resource.isModified() to test the flag and resource.setModified(false) to reset it on save), so identifying which resources have been directly changed is trivial.

What's less trivial (but not that complicated) is that if resource A has been changed and resource B has not, the change in A might still have an impact on B's serialisation, if B contains a reference to at least one object in A whose URI has been modified by the change.

I'll try to prototype such an approach just to see if it's worth.

pcdavid commented 2 weeks ago

First (temporary) conclusion: the "optimization" does not gain much.

Computing the sub-set of resources that need to be saved is relatively slow (though I have some ideas to work around that), but most importantly, in the case of the "Sirius Web" Papaya example it does not gain much. For example, the "EObject" class in the "Eclipse EMF" resource, which should sit at the bottom of the dependencies and not be impacted by most changes contains an "Extended By" reference which makes it "dirty" as soon as any resource which defines a metamodel is touched. So in practice because of this kind or references accross all layers, when flipping a simple boolean flag inside a class from "Sirius Components - Papaya" marks almost all the resources as potentially impacted and thus saved.

pcdavid commented 2 weeks ago

Actually there were several issues with my first implementation. The "second round" version does show performance improvements, but I'm not quite sure it is correct. It might be fast(er) but actually broken.

pcdavid commented 2 weeks ago

With the "second round" version, I get down to about 270ms (from about 1000-1100ms before) when persisting the "Sirius Web" sample project after a single modification inside "Sirius Components - Tree":

image

In this case, all the time is now spent in SemanticDataUpddateService.updateDocuments:

image

But if the general approach (modulo some implementation bugs) is actually correct, the new Set<UUID> unmodifiedDocumentIds passed to that method, which is not used yet, could be exploited to also avoid writing the unmodified docs to the DB (not just avoid serializing them into JSON content):

pcdavid commented 1 week ago

A simpler version of this would be to avoid persisting anything when nothing has actually changed.

Currently when a tool is executed, it's handler has no way to know what semantic change it performed, or even if it actually performed a change. So all handlers publish a ChangeKind.SEMANTIC_CHANGE just to be safe. And then EditingContextPersistenceService unconditionally persists anything.

Even in normal functioning, there are cases where the user invokes a tool which does not actually perform any semantic change (for example the user request is invalid and rejected, or the tool's behavior does not actually need to make any change because the model is already in the expected state). In such a case we currently still persist the whole resource set.

If we enable EMF's resource modification tracking (and reset the isModified flag on actual save of course), EditingContextPersistenceService can safely bypass the whole persistence if it detects that no EMF resource is actually modified. This is very fast to test.

This would have a more limited impact than the approach outlined above, which is still relevant. But it seems like a safe "quick win" that can be done as a first step.

pcdavid commented 1 week ago

A simple scenario (that is not the norm but not unrealistic either) is a user who: selects a node on a diagram, hits F2 to enter direct edit, changes his mind and hits enter (instead of Esc) to accept the current label. In this case we execute the direct edit tool (maybe the front should filter that, but it's another matter), do nothing (in most cases; of course that will depend on the tool's implementation), and save everything anyway.