eclipse / epsilon

Epsilon is a family of Java-based scripting languages for automating common model-based software engineering tasks, such as code generation, model-to-model transformation and model validation, that work out of the box with EMF (including Xtext and Sirius), UML (including Cameo/MagicDraw), Simulink, XML and other types of models.
https://eclipse.org/epsilon
Eclipse Public License 2.0

Unexpected performance hit when switching from EmfModel to InMemoryEmfModel #45

Open agarciadom opened 1 year ago

agarciadom commented 1 year ago

Working on the TTC KMEHR to FHIR case today, I noticed that the benchmark driver in its reference solution transforms a File into a Resource, rather than a File into a File. This is done so that the "Run" phase of the measurements does not include the time spent saving the model.

To make results comparable, I decided to make the same change, and have the ETL transformation go from an EmfModel to an InMemoryEmfModel. When I did that, however, I noticed that the transformation slowed down significantly. VisualVM points to the maintenance of the allContents cache:

[image: VisualVM screenshot]

This wasn't an issue with EmfModel. It turns out that at some point, I added some code to register CachedContentsAdapters automatically in the initialisation of InMemoryEmfModel. I wonder why I didn't check whether caching was enabled or not at that time - I can't remember at the moment.

Later on, Sina changed the code to just use setCachingEnabled(true), which performs the same work but is also consistent with the cached flag. This is after a commit where he fixed EmfModel::setCachingEnabled to add/remove the CachedContentsAdapter itself (as it should have).
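To illustrate the mechanism discussed above, here is a minimal, self-contained sketch of a model whose caching flag adds or removes a content-tracking adapter. It does not use the Epsilon or EMF APIs: `ToyModel`, `ToyCachedContentsAdapter` and `ContentsListener` are hypothetical stand-ins, and the real `EmfModel::setCachingEnabled` and `CachedContentsAdapter` may differ in detail. The point is only that, once the adapter is attached, every element added during a transformation also triggers a cache update, which is the kind of maintenance cost the profiler pointed to.

```java
import java.util.ArrayList;
import java.util.List;

public class CachingToggleSketch {

    /** Hypothetical listener, standing in for an EMF adapter. */
    interface ContentsListener {
        void elementAdded(Object element);
    }

    /** Hypothetical stand-in for CachedContentsAdapter: mirrors every addition into a cache. */
    static class ToyCachedContentsAdapter implements ContentsListener {
        final List<Object> allContentsCache = new ArrayList<>();
        int updates = 0; // number of cache maintenance calls triggered by additions

        @Override
        public void elementAdded(Object element) {
            allContentsCache.add(element);
            updates++;
        }
    }

    /** Hypothetical model class; this is NOT Epsilon's EmfModel. */
    static class ToyModel {
        private final List<Object> contents = new ArrayList<>();
        private final List<ContentsListener> listeners = new ArrayList<>();
        private ToyCachedContentsAdapter cacheAdapter; // non-null only while caching is enabled
        private boolean cachingEnabled = false;        // off unless the user opts in

        /** Keeps the flag and the presence of the adapter consistent with each other. */
        void setCachingEnabled(boolean enabled) {
            if (enabled == cachingEnabled) {
                return;
            }
            cachingEnabled = enabled;
            if (enabled) {
                cacheAdapter = new ToyCachedContentsAdapter();
                cacheAdapter.allContentsCache.addAll(contents); // seed with existing elements
                listeners.add(cacheAdapter);
            } else {
                listeners.remove(cacheAdapter);
                cacheAdapter = null;
            }
        }

        boolean isCachingEnabled() { return cachingEnabled; }

        boolean hasCacheAdapter() { return cacheAdapter != null; }

        int cacheUpdates() { return cacheAdapter == null ? 0 : cacheAdapter.updates; }

        /** Adds an element and notifies every registered listener. */
        void add(Object element) {
            contents.add(element);
            for (ContentsListener listener : listeners) {
                listener.elementAdded(element);
            }
        }
    }

    public static void main(String[] args) {
        ToyModel plain = new ToyModel();
        ToyModel cached = new ToyModel();
        cached.setCachingEnabled(true); // explicit opt-in by the user

        for (int i = 0; i < 1000; i++) {
            plain.add(i);
            cached.add(i);
        }

        System.out.println("plain:  adapter=" + plain.hasCacheAdapter()
                + ", cache updates=" + plain.cacheUpdates());
        System.out.println("cached: adapter=" + cached.hasCacheAdapter()
                + ", cache updates=" + cached.cacheUpdates());
    }
}
```

In this sketch caching is off by default, so a model that is only written to pays no cache maintenance cost unless the caller asks for it.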

Looking at this again, I wonder if we should drop this altogether from InMemoryEmfModel, and just let users decide if they want to turn on caching or not by themselves:

  // Since 1.6, having CachedContentsAdapter implies cached=true, otherwise it's inconsistent.
  setCachingEnabled(true);

arcanefoam commented 8 months ago

I think the user should always decide. Otherwise, users who have not selected the cached option will still see the performance/memory hit and be confused by it.