eclipse-ocl / org.eclipse.ocl

Eclipse Public License 2.0

[pivot] Optimization #2078

Open eclipse-ocl-bot opened 2 months ago

eclipse-ocl-bot commented 2 months ago

| Field | Value |
| --- | --- |
| Bugzilla Link | 549476 |
| Status | NEW |
| Importance | P3 normal |
| Reported | Jul 23, 2019 04:20 EDT |
| Modified | Jun 02, 2020 15:41 EDT |
| Depends on | 549480, 549482 |
| See also | 563865 |
| Reporter | Ed Willink |

Description

Sina Madani's example described in an email:


The projects are located in https://github.com/epsilonlabs/parallel-erl-temp/blob/master/standalone. The models can be found in https://drive.google.com/drive/folders/1xEIRyDwbPkrG5fqBTedVsLYsi4avGNIU (look for ones starting with imdb-). To run the compiled query, invoke the main method of https://github.com/epsilonlabs/parallel-erl-temp/blob/master/standalone/org.eclipse.ocl.standalone.imdb_select/src/movies/launch/ImdbSelectLauncher.java with the path to the model file. Also add the -profile flag to the arguments to get execution times.

To run the interpreted version, use https://github.com/epsilonlabs/parallel-erl-temp/blob/master/standalone/org.eclipse.ocl.standalone/src/org/eclipse/ocl/standalone/StandaloneOclBuilder.java, populating the model, metamodel and script parameters as well as calling asQuery() and withProfiling(). The metamodel and script can be found in https://github.com/epsilonlabs/parallel-erl-temp/tree/master/evaluation/imdb (the script is imdb_select.ocl).

Also the result for interpreted OCL can be found in my presentation on slide 26 in https://www.slideshare.net/SinaMadani/parallel-queries.


NB parallel-erl-temp replaces the earlier parallel-erl and it uses an Epsilon 1.6 interim.


The original results were disappointing because only the interpreted OCL was shown.

Using the code generator (CG) improves performance on imdb-0.2 by a factor of 5.5. Much better, but what is going on?

The overall iteration covers 172519 elements, which at an ambitious 1,000,000 elements per second gives a target execution time of 0.17s rather than 59s.

The faster allInstances() does not seem to be in use, but that costs only about 1 second.

Inner routines are called 40 times per iteration with 200 OrderedSetImpl creations per iteration with a redundant conversion to a boxed representation each time.

Surprisingly/encouragingly changing the model to unordered rather than ordered makes no significant difference.

Replacing the inner OrderedSetImpl creations and associated trivial excludeAll/size logic by direct use of EList gives a 10 second, 15% improvement. Useful but not the whole problem.
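To illustrate the kind of rewrite described above, here is a minimal sketch, not the actual generated code. It assumes the inner logic amounts to "exclude all elements of one collection from another, then take the size". `java.util.List` stands in for EList and `LinkedHashSet` stands in for OrderedSetImpl; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExcludeAllSketch {
    /** Allocating variant: copy into an ordered set, remove, then take the size. */
    static int sizeAfterExcludeAllCopying(List<?> source, List<?> excluded) {
        // Stand-in for the per-call OrderedSetImpl creation (assumption).
        Set<Object> copy = new LinkedHashSet<>(source);
        copy.removeAll(excluded);
        return copy.size();
    }

    /** Direct variant: count over the list itself, no intermediate collection. */
    static int sizeAfterExcludeAllDirect(List<?> source, List<?> excluded) {
        int count = 0;
        for (Object element : source) {
            if (!excluded.contains(element)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("m1", "m2", "m3", "m4");
        List<String> b = Arrays.asList("m2", "m4", "m9");
        System.out.println(sizeAfterExcludeAllCopying(a, b));
        System.out.println(sizeAfterExcludeAllDirect(a, b));
    }
}
```

The two variants agree only when `source` contains no duplicates, which holds for a unique-valued list. The direct variant relies on a linear `contains`, so it suits the short inner collections discussed here rather than large ones.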

The inner areCouple and areCoupleCoactors are each invoked 7717360 times, so in comparison to the ambitious 1,000,000 elements per second target, 15,000,000 query calls in 58 seconds is only a bit slow.
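As a rough check on these figures, the arithmetic can be written out as below. This is a back-of-envelope sketch, not a measurement; the class and method names are made up for illustration, and the 58 seconds is the run time quoted above.

```java
final class ThroughputEstimate {
    /** Seconds needed to process count items at ratePerSecond. */
    static double targetSeconds(long count, double ratePerSecond) {
        return count / ratePerSecond;
    }

    /** Observed items per second for count items in the given time. */
    static double observedRate(long count, double seconds) {
        return count / seconds;
    }

    public static void main(String[] args) {
        long elements = 172_519L;            // elements in the overall iteration
        long callsPerQuery = 7_717_360L;     // invocations of each inner query
        long totalCalls = 2 * callsPerQuery; // 15,434,720, roughly 15,000,000

        System.out.println("target seconds: " + targetSeconds(elements, 1_000_000.0));
        System.out.println("observed calls/s: " + observedRate(totalCalls, 58.0));
    }
}
```

This works out to roughly 266,000 query calls per second, about a quarter of the 1,000,000 per second target.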

Nested calls require nested Executor discovery from this. If the executor is passed between queries (cannot find the outstanding CG bug) the speed improves by 5 seconds (5%).
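The difference between rediscovering the executor in every nested query and passing it down can be sketched as follows. Everything here is hypothetical: `Exec` is a stand-in and not the real Eclipse OCL Executor type, and `discover` only simulates the lookup with a counter.

```java
final class ExecutorPassingSketch {
    /** Hypothetical stand-in for the OCL executor; not the real Eclipse OCL type. */
    static final class Exec {}

    static int lookups = 0; // counts simulated executor discoveries
    private static final Exec SHARED = new Exec();

    /** Simulates the per-call cost of discovering the executor from an object. */
    static Exec discover(Object self) {
        lookups++;
        return SHARED;
    }

    // Variant 1: every query, including the nested one, rediscovers the executor.
    static boolean outerRediscovering(Object a, Object b) {
        Exec exec = discover(a);
        return innerRediscovering(a, b);
    }

    static boolean innerRediscovering(Object a, Object b) {
        Exec exec = discover(a);
        return a != b;
    }

    // Variant 2: the outer query discovers once and hands the executor down.
    static boolean outerPassing(Object a, Object b) {
        Exec exec = discover(a);
        return innerPassing(exec, a, b);
    }

    static boolean innerPassing(Exec exec, Object a, Object b) {
        return a != b;
    }

    public static void main(String[] args) {
        outerRediscovering("x", "y");
        System.out.println("lookups when rediscovering: " + lookups);
        lookups = 0;
        outerPassing("x", "y");
        System.out.println("lookups when passing: " + lookups);
    }
}
```

With millions of nested calls, halving the number of discoveries is the kind of saving that would plausibly show up as a few seconds of run time.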

(Performance of Pivot OCL interpreter, but that has never been a real optimization target since the CG is better.)

Anyway, the example seems like some plausible code that we should do better on.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jul 23, 2019 05:43

Using YourKit points out some even lower hanging fruit.

Comparison of values uses the generic OclComparable, which locates the common type as part of its dispatch. This is needlessly costly for boring integers so replacing the two calls to ComparisonOperation by the .intValue() equivalents saves over half of the execution time. 58s => 21s.

For interpreted execution it may be hard to bypass, but the CG can definitely know what the dynamic types are and see that there is a trivial comparison available for inlining. Bug 549480 raised.
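The contrast between a generic comparison that inspects runtime types on every call and a specialized integer comparison can be sketched as below. This is an illustration only, under the assumption that the generic path does some form of type dispatch before comparing; it is not the OclComparable or ComparisonOperation code, and all names are hypothetical.

```java
final class ComparisonSketch {
    /** Generic path: inspects the runtime types before every comparison. */
    static int compareGeneric(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return Integer.compare((Integer) left, (Integer) right);
        }
        if (left instanceof Number && right instanceof Number) {
            return Double.compare(((Number) left).doubleValue(),
                                  ((Number) right).doubleValue());
        }
        if (left instanceof String && right instanceof String) {
            return ((String) left).compareTo((String) right);
        }
        throw new IllegalArgumentException("no common comparable type");
    }

    /** Specialized path: usable when both operands are known to be ints. */
    static boolean lessThan(int left, int right) {
        return left < right;
    }

    public static void main(String[] args) {
        System.out.println(compareGeneric(3, 5) < 0); // generic dispatch
        System.out.println(lessThan(3, 5));           // direct primitive comparison
    }
}
```

A code generator that knows both operand types statically can emit the second form directly, which is what replacing the calls by their .intValue() equivalents does by hand.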

With all three optimizations applied we are down to 11s: 60 times faster than interpreted, and better than 1,000,000 OCL operation calls per second.

eclipse-ocl-bot commented 2 months ago

By Ed Willink on Jul 23, 2019 05:54

The not-inlined Boolean operation calls seem like another pain point. Bug 549482 raised.