eclipse / epsilon

Epsilon is a family of Java-based scripting languages for automating common model-based software engineering tasks, such as code generation, model-to-model transformation and model validation, that work out of the box with EMF (including Xtext and Sirius), UML (including Cameo/MagicDraw), Simulink, XML and other types of models.
https://eclipse.org/epsilon
Eclipse Public License 2.0
55 stars 11 forks source link

Improve performance in ECL MatchTrace #40

Closed agarciadom closed 1 year ago

agarciadom commented 1 year ago

Summary

This PR reimplements MatchTrace so it is based on lookup rather than storing flat lists of matches. The data structure assumes that each (left, right) pair can only have one associated Match: this seems to be the assumption from looking at the getMatch(Object left, Object right) method.

Details

It uses two 2-level IdentityHashMaps (left -> right -> Match, and right -> left -> Match). This allows it to quickly find matches for an object whether it participated on the left or right side, while also efficiently looping through all Match objects by only traversing the left -> right -> Match map. I have reimplemented the iterator() and stream() methods with new iterator and spliterator implementations, which are based on iterators and spliterators over the underlying maps.

This PR adds a MatchTraceTest suite specifically for this data structure, which attempts to cover as much as possible from the data structure: the ECL acceptance tests only covered a small part of the MatchTrace methods. It was passing all regular tests in my Eclipse instance, but we should wait to see what Jenkins thinks :-).

Performance

In terms of performance, I have noticed a significant improvement in the TTC 2023 Containers to MiniYAML example if you stop using guards and only use straight comparisons.

Guard-based ECL script

In the current version of the reference implementation of this case, I had to use guards instead of compare expressions (leaving them to true) as otherwise the execution would take too long.

With the list-based MatchRule and guard expressions, limiting the generated models to 50 containers, I get an execution time of 8.931s in the incremental forward case (which uses ECL+EML for merging):

------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,8.931

With the map-based MatchRule and guard expressions, there is not much change:

------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,8.921

I checked with VisualVM and the execution time for those two configurations is dominated by finding the appropriate method to invoke via reflection, by the way:

image

Guard-less ECL script

As said above, the list-based MatchRule cannot work with only compare expressions: I've been waiting for over 5 minutes for it to finish.

However, with the map-based MatchRule and only using compare expressions (without guards), execution times actually drop below those of the guard-based ECL script (to 3.86s):

------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,3.866

It looks like this scales much better when you have a significant number of (left, right) pairs to deal with. At the same time, I wonder what may be happening with the guards, as using them can actually slow down an ECL script. That's for another PR, I guess.

agarciadom commented 1 year ago

Jenkins is green:

https://ci.eclipse.org/epsilon/job/interim-kubernetes/job/39-improve-performance-in-ecl-matchtrace/

@kolovos could you give it a look and let me know your thoughts?

agarciadom commented 1 year ago

It all looks good to me. In the future we may want to change the behaviour of matches() to return true if there is at least one successful match and false otherwise (instead of just considering the first match), but I don't think this falls in the scope of this PR.

I was actually nearly done with a version that allows for multiple Match objects with the same (left, right) pair, so I thought I might as well finish it. The new version passes all the previous tests, plus one new test for recording multiple matches for the same pair.

There was a performance hit (Incr FWD went up to 5.5s), but it's still much faster than what we had before (which wouldn't even complete in a reasonable amount of time).

agarciadom commented 1 year ago

Incidentally, the produced performance numbers for the old behaviour finally came out!

Old data structure (flat lists):

------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,518.804

New data structure (maps of maps of sets):

------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,5.4