Closed agarciadom closed 1 year ago
Jenkins is green:
https://ci.eclipse.org/epsilon/job/interim-kubernetes/job/39-improve-performance-in-ecl-matchtrace/
@kolovos could you give it a look and let me know your thoughts?
It all looks good to me. In the future we may want to change the behaviour of matches() to return true if there is at least one successful match and false otherwise (instead of just considering the first match), but I don't think this falls in the scope of this PR.
I was actually nearly done with a version that allows for multiple Match
objects with the same (left, right) pair, so I thought I might as well finish it. The new version passes all the previous tests, plus one new test for recording multiple matches for the same pair.
There was a performance hit (Incr FWD went up to 5.5s), but it's still much faster than what we had before (which wouldn't even complete in a reasonable amount of time).
Incidentally, the produced performance numbers for the old behaviour finally came out!
Old data structure (flat lists):
------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,518.804
New data structure (maps of maps of sets):
------------------
Incr. FWD:
------------------
n_containers,n_volumes,n_images,Epsilon
50,3,4,5.4
Summary
This PR reimplements
MatchTrace
so it is based on lookup rather than storing flat lists of matches. The data structure assumes that each (left, right) pair can only have one associatedMatch
: this seems to be the assumption from looking at thegetMatch(Object left, Object right)
method.Details
It uses two 2-level
IdentityHashMap
s (left -> right -> Match, and right -> left -> Match). This allows it to quickly find matches for an object whether it participated on the left or right side, while also efficiently looping through all Match objects by only traversing the left -> right -> Match map. I have reimplemented theiterator()
andstream()
methods with new iterator and spliterator implementations, which are based on iterators and spliterators over the underlying maps.This PR adds a
MatchTraceTest
suite specifically for this data structure, which attempts to cover as much as possible from the data structure: the ECL acceptance tests only covered a small part of theMatchTrace
methods. It was passing all regular tests in my Eclipse instance, but we should wait to see what Jenkins thinks :-).Performance
In terms of performance, I have noticed a significant improvement in the TTC 2023 Containers to MiniYAML example if you stop using guards and only use straight comparisons.
Guard-based ECL script
In the current version of the reference implementation of this case, I had to use
guard
s instead ofcompare
expressions (leaving them totrue
) as otherwise the execution would take too long.With the list-based MatchRule and
guard
expressions, limiting the generated models to 50 containers, I get an execution time of 8.931s in the incremental forward case (which uses ECL+EML for merging):With the map-based MatchRule and
guard
expressions, there is not much change:I checked with VisualVM and the execution time for those two configurations is dominated by finding the appropriate method to invoke via reflection, by the way:
Guard-less ECL script
As said above, the list-based MatchRule cannot work with only
compare
expressions: I've been waiting for over 5 minutes for it to finish.However, with the map-based MatchRule and only using
compare
expressions (without guards), execution times actually drop below those of the guard-based ECL script (to 3.86s):It looks like this scales much better when you have a significant number of (left, right) pairs to deal with. At the same time, I wonder what may be happening with the guards, as using them can actually slow down an ECL script. That's for another PR, I guess.