[synthesis] Introduce a CodeSynthesis 'Visitor'

eclipse-ocl-bot commented 1 month ago

| | |
| --- | --- |
| Bugzilla Link | 279638 |
| Status | CLOSED FIXED |
| Importance | P3 enhancement |
| Reported | Jun 09, 2009 11:29 EDT |
| Modified | May 20, 2013 11:35 EDT |
| Version | 2.0.0 |
| Depends on | 358570 |
| Blocks | 318358 |
| Reporter | Ed Willink |

Description

For Eclipse, it would be good to be able to generate good quality Java code from an OCL expression; at present the OCL expression compiler just caches an interpretation sequence.

For other purposes, it would be good to be able to generate good quality C or VHDL code from an OCL expression.

For simple expressions the main challenge is handling the diversity of type expressions. For moderately complex expressions considerable strategic planning may be necessary. For arbitrarily complex expressions there may be no reasonable code generation approach.

I think that a direct one-pass OCL to code synthesis approach via a counterpart of the EvaluationVisitor will not be suitable for general approaches. I suspect that we need a multi-stage transformation in which the really powerful OCL constructs are progressively transformed to a moderately simple set of concepts that are then optimised and then finally fed to a model-2-text transformation.

These transformations should allow extension so that additional stages can cover even more complex declarative issues, or derivations can support extended syntaxes such as QVT's ImperativeOCL.

Any ideas/suggestions/contributions welcome.

eclipse-ocl-bot commented 1 month ago

By Alexander Igdalov on Jun 09, 2009 16:32

The legacy implementation of Operational QVT in the Borland Together product combined both approaches - interpreting OCL/QVT and compilation of OCL/QVT to Java and then running it. However, compilation to Java has always been a pain in the neck due to several reasons such as:

1. The generated Java code was hardly human-readable. This was to a large extent caused by the next item.
2. The framework proposed by Kent OCL, which was used there, required generation of some text for each AST element. As a result, developers had to invent complicated workarounds to produce Java code that was at least valid. In this respect I agree with Ed's idea that "a direct one-pass OCL to code synthesis approach via a counterpart of the EvaluationVisitor will not be suitable for general approaches".
3. Debugger support in generated Java code was not implemented due to its technical complexity. By debugger support I mean QVT debugging, i.e. tracing the QVT script, not the generated Java code. As you might know, Borland Together supports a full-scale JDT-like debugger for M2M QVTO. It also has a similar debugger for legacy QVT, but only when the interpreter mode is used.
4. Developers had to keep the interpreter and the compiler synchronized so that both approaches produced the same results.

All that is written above doesn't mean that I object to code generation. We must consider all pros and cons of the generative approach and in case we decide to implement it we should take into account the lessons learnt from the previous implementations.

However, the first question is to decide why we need it. It shouldn't be just for a wow effect, since it requires much effort. Currently I see the performance reason, since running generated Java code is supposed to be much faster than interpreting OCL expressions. AFAIR, investigations of the Xalan XSL engine revealed that XSL-to-bytecode transformations were about 3 times faster than the corresponding interpreted ones. If we have use cases that require significantly improved performance, that may be a +1 for implementing the generative approach.

> For other purposes, it would be good to be able to generate good quality C or VHDL code from an OCL expression.

Ed, do you know such use cases? Regarding VHDL, OCL used in electronics would be a great practical application of our component.

My point is that the generative approach has pros and cons. If we have enough use cases where the interpreter is not satisfactory, we can move on in this direction.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Jun 09, 2009 17:28

C or VHDL. Yes. There is a possibility that I might be able to augment my own-time contributions with some Thales-time here. We require a single specification language for bit-true functionality executing in Java or C/C++ or VHDL.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Aug 14, 2009 02:27

I'm able to start looking at this briefly.

The multi-pass preparatory conversion to simpler code aligns with a different, very useful form of code synthesis: generation of OCL code in OCL.

If OCL code is synthesised for an OCL expression, the environment of the synthesiser may provide constant values for some OCL variables allowing a constant folding optimisation to be performed.

This is very useful for 'silicon' synthesis where collection sizes are fixed and so iterations can be unrolled.
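The constant folding described above can be sketched on a miniature expression tree. Everything here is hypothetical: `Expr`, `Const`, `Var` and `Add` are illustrative stand-ins rather than MDT/OCL AST classes, and the environment is assumed to be a plain map from variable names to known integer values.

```java
import java.util.Map;

// Hypothetical miniature expression model; not the MDT/OCL AST.
final class ConstantFolding {
    static abstract class Expr {
        // Returns a simpler expression given the constants known to the synthesiser.
        abstract Expr fold(Map<String, Integer> constants);
    }

    static final class Const extends Expr {
        final int value;
        Const(int value) { this.value = value; }
        @Override Expr fold(Map<String, Integer> constants) { return this; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Var extends Expr {
        final String name;
        Var(String name) { this.name = name; }
        @Override Expr fold(Map<String, Integer> constants) {
            // A variable with an environment-provided value becomes a constant.
            Integer known = constants.get(name);
            return known != null ? new Const(known) : this;
        }
        @Override public String toString() { return name; }
    }

    static final class Add extends Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override Expr fold(Map<String, Integer> constants) {
            Expr l = left.fold(constants);
            Expr r = right.fold(constants);
            // Fold only when both operands are now constant.
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value + ((Const) r).value);
            }
            return new Add(l, r);
        }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }
}
```

With `size` bound to 4, folding `x + (size + 1)` leaves `(x + 5)`: the unknown variable survives while the known subexpression collapses, which is the precondition for unrolling an iteration over a fixed-size collection.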

So I think that the synthesiser is a succession of OCL to simpler-OCL transformations followed by a final simple-OCL to target language code generator.

It would be nice to use an M2M language for the OCL to OCL, but that would give OCL an M2M dependency, and I'm not sure that any of the M2M languages are that efficient at present, so I'm inclined to code in Java but use an extension point to register available Java callable OCL to simpler-OCL transformations.
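A minimal sketch of that arrangement, assuming a plain in-memory registry in place of a real Eclipse extension point. `SynthesisPipeline`, `register` and `synthesise` are hypothetical names, and the expression type `E` is left generic so that the sketch does not depend on any OCL class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical registry of OCL to simpler-OCL passes followed by one target generator.
final class SynthesisPipeline<E> {
    private final List<UnaryOperator<E>> simplifiers = new ArrayList<>();

    // Stands in for contributions that an extension point would collect.
    void register(UnaryOperator<E> pass) {
        simplifiers.add(pass);
    }

    // Runs every registered pass in registration order, then generates target text.
    String synthesise(E expression, Function<E, String> generator) {
        E current = expression;
        for (UnaryOperator<E> pass : simplifiers) {
            current = pass.apply(current);
        }
        return generator.apply(current);
    }
}
```

Instantiating `E` as `String` is enough to try it: a pass that rewrites `->isEmpty()` to `->size() = 0` followed by a generator that wraps the text shows the ordering, although a real pass would work on the AST rather than on text.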

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Feb 17, 2010 02:50

The new invocation/setting/validate delegates provide an 'internal' use case for this. The often very simple OCL expression could be cached as a compiled Java method rather than interpreted AST. It just requires that the cached 'expression' is an Object rather than a String allowing String, OCLExpression or 'CompiledOCLExpression' realisation.
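One way to picture the Object-typed cache is the sketch below. `CachedDelegate`, `CompiledExpression` and `ExpressionCompiler` are hypothetical names chosen for illustration; the actual delegate classes are not shown in this thread.

```java
// Hypothetical delegate whose cached 'expression' starts as text and is upgraded on first use.
final class CachedDelegate {
    /** Hypothetical compiled form of an expression. */
    interface CompiledExpression {
        Object evaluate(Object self);
    }

    /** Hypothetical compiler from expression text to a compiled form. */
    interface ExpressionCompiler {
        CompiledExpression compile(String source);
    }

    private final ExpressionCompiler compiler;
    private Object cached; // a String until first evaluation, a CompiledExpression afterwards

    CachedDelegate(String source, ExpressionCompiler compiler) {
        this.cached = source;
        this.compiler = compiler;
    }

    Object evaluate(Object self) {
        if (cached instanceof String) {
            // Pay the compilation cost once and keep the compiled realisation.
            cached = compiler.compile((String) cached);
        }
        return ((CompiledExpression) cached).evaluate(self);
    }
}
```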

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Apr 03, 2010 05:09

Having posted this as a Google Summer of Code idea, I've given some more consideration to it.

I see 6 areas of work

Framework and integration

The generated code needs to form easily managed modules, perhaps emulating the existing EMF Validation structure so that EMF directly code generates the enhanced implementation rather than the OCL query setup, parsing, analysis and evaluation. For reflective purposes the setting/invocation/validation delegates should cache the enhanced implementation for re-use.

However, in practice many invariants are layered so that evaluating all 10 invariants on a class independently can be redundant. Better to evaluate all 10 at once, but this requires a better than true/false return mechanism; perhaps an integer error code. One suggestion has been that in the concrete OCL syntax an invariant name be followed by a parenthesised expression defining a String diagnostic. Perhaps a higher level interface is null for ok and a String explaining the failure.
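The "null for ok, String explaining the failure" interface could look roughly like this. `InvariantChecker` is a hypothetical name and the invariants are ordinary Java functions, so this only illustrates the return convention and the single validation pass, not generated code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical checker: each invariant returns null when satisfied, else a diagnostic.
final class InvariantChecker<T> {
    private final Map<String, Function<T, String>> invariants = new LinkedHashMap<>();

    void add(String name, Function<T, String> check) {
        invariants.put(name, check);
    }

    // Evaluates all invariants in one call and collects the failures in declaration order.
    List<String> validate(T target) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Function<T, String>> entry : invariants.entrySet()) {
            String diagnostic = entry.getValue().apply(target);
            if (diagnostic != null) {
                failures.add(entry.getKey() + ": " + diagnostic);
            }
        }
        return failures;
    }
}
```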

This area probably requires two transformations with an extensible library of variations. Firstly an M2M to impose the required package/class/method structure on the constraints. Secondly an M2T to provide the final output.

An important related area of work is debug support. As a minimum the M2T should be pretty-printed and commented. Variable names should endeavour to be helpful. Integration with JDT is important.

Scalar type analysis

OCL has unlimited precision so BigDecimal and BigInteger are used in MDT/OCL for safety in the absence of type analysis.

For Java targets, identifying that int, double or long can be used safely will be beneficial.

For VHDL/silicon targets identifying the minimum necessary precision e.g. four integer bits and 10 fractional bits is necessary. Also rounding and overflow behaviours need planning.

The necessary constraints for type analysis may often be lacking, so it may be necessary to propose a disciplined set of invariants on data values to provide something for this analysis to work on. In the case of loop counters, it may be appropriate to have an environment-wide configuration on the maximum collection size, thereby bounding many integer values.

Potentially a three phase M2M: identify local solution bounds, propagate bounds globally, convert bounds to types.
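A rough illustration of the last two phases for integers, assuming the local bounds are already known. `ScalarBounds` is a hypothetical helper: bounds are propagated through `+` and `*` by interval arithmetic and then converted to the narrowest Java type, or to a two's-complement bit width for a silicon target.

```java
import java.math.BigInteger;

// Hypothetical closed interval [min, max] of possible integer values.
final class ScalarBounds {
    final BigInteger min;
    final BigInteger max;

    ScalarBounds(long min, long max) {
        this(BigInteger.valueOf(min), BigInteger.valueOf(max));
    }

    ScalarBounds(BigInteger min, BigInteger max) {
        if (min.compareTo(max) > 0) {
            throw new IllegalArgumentException("min exceeds max");
        }
        this.min = min;
        this.max = max;
    }

    // Propagation through addition: the bounds add.
    ScalarBounds plus(ScalarBounds other) {
        return new ScalarBounds(min.add(other.min), max.add(other.max));
    }

    // Propagation through multiplication: take the extremes of the four corner products.
    ScalarBounds times(ScalarBounds other) {
        BigInteger a = min.multiply(other.min);
        BigInteger b = min.multiply(other.max);
        BigInteger c = max.multiply(other.min);
        BigInteger d = max.multiply(other.max);
        return new ScalarBounds(a.min(b).min(c).min(d), a.max(b).max(c).max(d));
    }

    // Conversion of bounds to the narrowest safe Java type name.
    String javaType() {
        if (fits(Integer.MIN_VALUE, Integer.MAX_VALUE)) return "int";
        if (fits(Long.MIN_VALUE, Long.MAX_VALUE)) return "long";
        return "BigInteger";
    }

    // Minimum two's-complement width in bits, including the sign bit.
    int signedBits() {
        return Math.max(min.bitLength(), max.bitLength()) + 1;
    }

    private boolean fits(long low, long high) {
        return min.compareTo(BigInteger.valueOf(low)) >= 0
            && max.compareTo(BigInteger.valueOf(high)) <= 0;
    }
}
```

For example, two values each bounded by 0..1000 multiply to at most 1,000,000, so `int` suffices, whereas bounds of 0..100000 give a product that needs `long`.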

Collection type analysis

MDT/OCL currently uses four different collection implementations, with the OrderedSet and Bag being particularly suspect in terms of efficiency.

Size analysis may allow fixed size or at least upper-bounded size rather than dynamic collections to be code generated. Non-dynamic collections are particularly important for VHDL/silicon targeting.

Usage analysis may detect that Sets are so small that the hashing costs outweigh the costs of list iteration.

Potentially a three phase M2M: identify local solution bounds, propagate bounds globally, convert bounds to types.

Basic Code generation

There is an infinite variety of types but a finite number (about 20) of mathematical operations such as (+,*,xor,>=). Binary operations are generally three-typed, with two inputs and an output, since particularly for multiply and divide the result type is significantly different from the inputs. Code generation should therefore invoke the output type to just move values between input variables that may need conversion and an output variable that is correctly typed. Extension to support e.g. ComplexTypes then happens easily through extra Type code.
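As a sketch of "invoking the output type", the generator below asks the result type to convert each input and to emit the operation. `TypedEmission`, `TargetType`, `INT` and `BIG_INTEGER` are hypothetical names, only two types and three operators are covered, and the emitted text is Java.

```java
// Hypothetical emitter in which the output type owns conversion and operator text.
final class TypedEmission {
    interface TargetType {
        // Text that converts an expression of type 'from' into this type.
        String convert(String expr, TargetType from);
        // Text for a binary operation on two operands already of this type.
        String binary(String op, String left, String right);
    }

    static final TargetType INT = new TargetType() {
        @Override public String convert(String expr, TargetType from) {
            return from == this ? expr : expr + ".intValueExact()";
        }
        @Override public String binary(String op, String left, String right) {
            return "(" + left + " " + op + " " + right + ")";
        }
    };

    static final TargetType BIG_INTEGER = new TargetType() {
        @Override public String convert(String expr, TargetType from) {
            return from == this ? expr : "BigInteger.valueOf(" + expr + ")";
        }
        @Override public String binary(String op, String left, String right) {
            String method;
            switch (op) {
                case "+": method = "add"; break;
                case "-": method = "subtract"; break;
                case "*": method = "multiply"; break;
                default: throw new IllegalArgumentException("unsupported operator: " + op);
            }
            return left + "." + method + "(" + right + ")";
        }
    };

    // Three-typed emission: left type, right type and output type may all differ.
    static String emit(String op, String left, TargetType leftType,
                       String right, TargetType rightType, TargetType output) {
        return output.binary(op, output.convert(left, leftType), output.convert(right, rightType));
    }
}
```

Supporting a further type would then mean adding one more `TargetType` implementation, leaving `emit` unchanged.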

Perhaps a single M2M.

Advanced Code generation

Most (?all) of the more powerful concepts involve iterate and simple operations, so it should be possible to define a single normalisation from library functions, allowing easy library extension without needing code synthesis changes.
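To illustrate the normalisation idea, a few familiar collection operations can be written purely in terms of a single `iterate`. This is a hand-written Java analogy with hypothetical names (`IterateLibrary` and its methods), not the OCL standard library definitions and not generated code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Hypothetical library in which every operation is defined through iterate.
final class IterateLibrary {
    // The single primitive: fold an accumulator over the source elements.
    static <T, A> A iterate(Iterable<T> source, A initial, BiFunction<A, T, A> body) {
        A accumulator = initial;
        for (T element : source) {
            accumulator = body.apply(accumulator, element);
        }
        return accumulator;
    }

    static <T> List<T> select(Iterable<T> source, Predicate<T> test) {
        return IterateLibrary.<T, List<T>>iterate(source, new ArrayList<T>(), (acc, e) -> {
            if (test.test(e)) {
                acc.add(e);
            }
            return acc;
        });
    }

    static <T> boolean exists(Iterable<T> source, Predicate<T> test) {
        return IterateLibrary.<T, Boolean>iterate(source, Boolean.FALSE, (acc, e) -> acc || test.test(e));
    }

    static <T> boolean forAll(Iterable<T> source, Predicate<T> test) {
        return IterateLibrary.<T, Boolean>iterate(source, Boolean.TRUE, (acc, e) -> acc && test.test(e));
    }

    static <T> int size(Iterable<T> source) {
        return IterateLibrary.<T, Integer>iterate(source, 0, (acc, e) -> acc + 1);
    }
}
```

If a code synthesiser understands `iterate` and simple operations, a new library operation defined this way needs no new synthesis rule.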

Perhaps a single M2M or even direct Java model creation.

Loop analysis

Many practical declarative constraints are partially redundant. For instance both max() and min() may be invoked for a Collection involving a double iteration, when both could have been derived in a single loop. Redundant loop iteration domains can be merged. Similarly, the declarative collections often involve the calculation f(g(h(i(j(collection))))) with each intermediate fully calculated before the next. It will often be more efficient to perform the element-wise f,g,h,i,j calculation within a single outer loop.
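Both rewrites can be shown by hand in Java. The class and method names (`LoopFusion`, `minMax`, `staged`, `fused`) are hypothetical, and the stage list is ordered innermost first, so `f(g(x))` is the list `[g, f]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical hand-written versions of the two loop rewrites.
final class LoopFusion {
    // min() and max() derived in one pass instead of two separate iterations.
    static int[] minMax(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty collection");
        }
        int min = values[0];
        int max = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] < min) min = values[i];
            if (values[i] > max) max = values[i];
        }
        return new int[] { min, max };
    }

    // Declarative reading: each stage materialises a full intermediate collection.
    static <T> List<T> staged(List<T> source, List<Function<T, T>> stages) {
        List<T> current = source;
        for (Function<T, T> stage : stages) {
            List<T> next = new ArrayList<>(current.size());
            for (T element : current) {
                next.add(stage.apply(element));
            }
            current = next;
        }
        return current;
    }

    // Fused reading: one outer loop applies all stages to each element in turn.
    static <T> List<T> fused(List<T> source, List<Function<T, T>> stages) {
        List<T> result = new ArrayList<>(source.size());
        for (T element : source) {
            T value = element;
            for (Function<T, T> stage : stages) {
                value = stage.apply(value);
            }
            result.add(value);
        }
        return result;
    }
}
```

The two readings give the same result only when the stages are element-wise and free of side effects, which is the property a loop analysis would have to establish before merging.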

Each symbolic analysis may merit distinct M2Ms.

A first implementation may largely ignore Type Analysis and Loop Analysis using just the four Collection types and BigDecimal/BigInteger. A useable framework and basic code generation should at least give better performance for simple scalar constraints. Introduction of the advanced code generation normalisations should ensure better performance for all constraints. Then scalar and collection type analysis can give further benefits with extensibility to more finely grained silicon type systems. Finally loop analysis of ever increasing symbolic complexity can give better and better performance. This loop analysis may also provide enhanced semantic diagnosis by determining inconsistencies (errors) and redundancies (warnings) in sets of invariants.

Development of the more sophisticated analyses may have self-enhancing motivations. For instance the output model of the Collection Type Analysis may involve a variety of OCL expressions evaluated on the input model. This may be expressed using the embedded OCL of an M2M language. Provided the M2M language invokes operations defined on an Ecore model, the genmodelled or reflective invocation delegates should transparently invoke the enhanced MDT/OCL functionality.

eclipse-ocl-bot commented 1 month ago

By Axel Uhl on Mar 10, 2011 12:29

BTW, has anyone done any credible performance analysis for the current OCL evaluation visitor? We have made some very basic wall-clock observations and found that particularly combinations of allInstances()->select(...) are costly. We therefore had a student work on a mapping of such subexpressions to the query2 framework. However, as query2 is to some degree still in its infancy we still have no reliable benchmark results as yet.

However, before we invest heavily into compiling OCL ASTs to Java source or byte code, we'd probably apply a profiler to see where we're losing the most.

Best,\
-- Axel

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Mar 10, 2011 13:14

(In reply to comment #6)

> BTW, has anyone done any credible performance analysis for the current OCL evaluation visitor?

No. But Dresden OCL measured Eclipse OCL as 4 times faster. I'm pretty sure this was because they EcoreSwitch and we visit.

Intuitively, since most nodes are OperationCallExp, the current evaluator suffers from a very long decision tree. The Pivot evaluator was much faster, till I introduced dynamic dispatch. I'm looking to recover much of the speed loss that CompleteOCL incurs in the very common use case that there is no CompleteOCL.

Fundamentally, code gen must be a dramatic win on bread and butter operations.

a) discovering a 'final' operation can eliminate dynamic dispatch.\
b) propagating non-null-ness can eliminate many guards\
c) doing a Java multiply/add/... rather than an interpreter dispatch must be good\
d) determining that 32 bit int rather than BigInteger will do must be good

I expect 10 to 100 times enhancement.

But if you or a student wants to provide some tests that would certainly help guide what we do.

allInstances() is a model management rather than code gen problem.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Apr 05, 2011 11:53

There is an intermediate form of CodeGen we can do much more easily.

The new evaluator has a class per feature that in many cases just delegates to a polymorphic Value method. If more functionality is placed in the Value methods we could have an AST Tree Visitor that generates code to directly invoke the Value methods.

This will:

- have full Value semantics (BigInteger etc)
- reuse lazy/smart CollectionValue implementations
- significantly reduce dispatching overhead
- eliminate a first time OCL parse cost

This will not:

- optimize e.g. IntegerValue to use int when possible
- analyze away impossible null or invalid decision paths
- allow additional overload definition post code gen

I think this is just what we want for code generation as part of genmodel. We save the parse cost/inconvenience and perhaps get a two to five fold speed increase.
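The shape of such an AST visitor might be as follows. This is only a sketch: the node classes, the `Visitor` interface and the emitted method names (`add`, `multiply`, `IntegerValue.valueOf`) are hypothetical stand-ins and are not taken from the new evaluator's API.

```java
// Hypothetical AST and visitor that emit Java text invoking polymorphic Value methods.
final class ValueCallGenerator {
    interface Visitor<R> {
        R visitLiteral(Literal literal);
        R visitVariable(Variable variable);
        R visitCall(Call call);
    }

    static abstract class Node {
        abstract <R> R accept(Visitor<R> visitor);
    }

    static final class Literal extends Node {
        final long value;
        Literal(long value) { this.value = value; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitLiteral(this); }
    }

    static final class Variable extends Node {
        final String name;
        Variable(String name) { this.name = name; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitVariable(this); }
    }

    static final class Call extends Node {
        final Node source;
        final String operation;
        final Node argument;
        Call(Node source, String operation, Node argument) {
            this.source = source;
            this.operation = operation;
            this.argument = argument;
        }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visitCall(this); }
    }

    // Emits a direct method call per operation node; no interpreter dispatch remains.
    static final class JavaEmitter implements Visitor<String> {
        @Override public String visitLiteral(Literal literal) {
            return "IntegerValue.valueOf(" + literal.value + "L)"; // hypothetical factory name
        }
        @Override public String visitVariable(Variable variable) {
            return variable.name;
        }
        @Override public String visitCall(Call call) {
            return call.source.accept(this) + "." + call.operation
                + "(" + call.argument.accept(this) + ")";
        }
    }
}
```

For the tree `x.add(y.multiply(2))` the emitter yields `x.add(y.multiply(IntegerValue.valueOf(2L)))`, so all arithmetic still goes through Value objects and keeps their semantics.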

eclipse-ocl-bot commented 1 month ago

By Ed Willink on May 25, 2011 16:41

A further use case. Provide Java subroutines for expressions so that actions and transitions for UML State machines can be invoked by a state machine synthesis application.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Jun 18, 2011 12:30

bug/279638 now has an Acceleo transformation generateJava.mtl that converts the OCL in a .oclinecore file to nested Operations classes. Just launch the transformation with a *.oclinecore as input, and specify the directory for Java files as output.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Sep 16, 2011 13:37

Work in progress is in the bug/349962 branch.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Oct 04, 2011 07:40

bug/349962 branch now supports integration with genmodel.

It requires attachment 204469 to provide the support for EStructuralFeature get annotations.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on Nov 07, 2011 16:49

The basic code generator has been pushed to master.

Further optimisations are possible and can be dealt with by new Bugzillas.

eclipse-ocl-bot commented 1 month ago

By Ed Willink on May 20, 2013 11:35

CLOSED after a year in the RESOLVED state.
