eclipse-omr / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr

JitBuilder should operate at a higher level of abstraction than OMR IL #3665

Open dibyendumajumdar opened 5 years ago

dibyendumajumdar commented 5 years ago

At present, JitBuilder is in some ways a thin wrapper over OMR IL. Since OMR IL operates at a low level of abstraction, where data is made up of primitive types, arrays and pointers, JitBuilder cannot perform richer type checking of operations.

The proposal here is that JitBuilder should operate on a higher level IL - maybe a linear IL, not necessarily SSA. More importantly, JitBuilder values should have richer type information, at least on par with LLVM, but maybe even richer. By defining an intermediate IL / Value system, the JitBuilder api will become more formally defined too.

Note that in some ways JitBuilder already provides a higher level api as well as Type Dictionary - and tries to abstract away some of the OMR IL details; however this would be more systematic with the adoption of a higher level IL / Value / type system.

The JitBuilder should operate in two phases. In the first phase the user will construct the JitBuilder IL using the JitBuilder api. During this phase richer type information will be available, thereby enabling strict type checking of operations such as assignments.
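As a rough illustration of the first phase, here is a minimal sketch of a builder that records instructions and rejects an assignment whose value type does not match the declared type of the local. Every name in it (`TypedBuilder`, `TypeKind`, `declareLocal`, `store`, and so on) is hypothetical and invented for this example; none of it is the existing JitBuilder API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type tags; a real design would carry much richer type information.
enum class TypeKind { Int32, Double };

// A value in the hypothetical higher-level IL always carries its type.
struct Value {
    int id;
    TypeKind type;
};

// The only instruction recorded in this sketch: a store to a named local.
struct StoreInstr {
    std::string local;
    Value value;
};

class TypedBuilder {
public:
    void declareLocal(const std::string& name, TypeKind type) { locals_[name] = type; }

    Value constInt32() { return Value{nextId_++, TypeKind::Int32}; }
    Value constDouble() { return Value{nextId_++, TypeKind::Double}; }

    // Phase one: check the store while the IL is being built, and record it only if it is valid.
    void store(const std::string& name, Value v) {
        auto it = locals_.find(name);
        if (it == locals_.end())
            throw std::invalid_argument("undeclared local: " + name);
        if (it->second != v.type)
            throw std::invalid_argument("type mismatch storing to " + name);
        instrs_.push_back(StoreInstr{name, v});
    }

    std::size_t instructionCount() const { return instrs_.size(); }

private:
    std::unordered_map<std::string, TypeKind> locals_;
    std::vector<StoreInstr> instrs_;
    int nextId_ = 0;
};
```

Because nothing is lowered while the client is building, a rejected store leaves the recorded instruction list untouched, and the error can be reported in terms of the client's own local names.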

Once the JitBuilder IL is built this way, the user's role will be over. JitBuilder will be free to convert this IL to OMR IL whenever it is best to do so. This conversion will be the second phase: lowering.
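The lowering phase could be sketched roughly as follows, assuming a high-level operation that names a struct field and a low-level operation that only knows a base pointer, a byte offset and a primitive type. All types here (`StructType`, `FieldLoad`, `LowLoad`) and the `load.<primitive>` opcode strings are assumptions made for illustration; they are not OMR IL opcodes or JitBuilder classes.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one struct field.
struct FieldInfo {
    std::size_t offset;     // byte offset within the struct
    std::string primitive;  // primitive type name, e.g. "int32"
};

struct StructType {
    std::string name;
    std::map<std::string, FieldInfo> fields;
};

// High-level operation: "load field F of the struct that `base` points to".
struct FieldLoad {
    std::string base;
    const StructType* type;
    std::string field;
};

// Low-level operation: only a base pointer, an offset and a primitive type remain.
struct LowLoad {
    std::string opcode;
    std::string base;
    std::size_t offset;
};

// Lower every high-level field load to a pointer-plus-offset load.
std::vector<LowLoad> lower(const std::vector<FieldLoad>& highLevel) {
    std::vector<LowLoad> out;
    for (const FieldLoad& op : highLevel) {
        auto it = op.type->fields.find(op.field);
        if (it == op.type->fields.end())
            throw std::invalid_argument("no field '" + op.field + "' in " + op.type->name);
        out.push_back(LowLoad{"load." + it->second.primitive, op.base, it->second.offset});
    }
    return out;
}
```

The point of the sketch is that field names and struct types exist only on the high-level side; by the time the low-level form is produced, all type checks have already happened.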

The JitBuilder IL could take several forms. One suggestion is a Linear IL. Another might be an AST like approach as used by Truffle. Or it might be a byte code such as used by Java, but maybe register based rather than stack based.
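To make the "linear, register based" option concrete, here is a small sketch of what such an IL might look like: a flat list of instructions, each naming its destination and source virtual registers explicitly instead of using an operand stack. The instruction set (`Const`, `Add`, `Mul`, `Ret`) and the tiny evaluator are hypothetical and exist only to show the shape of the representation.

```cpp
#include <cstdint>
#include <vector>

enum class Op { Const, Add, Mul, Ret };

// One instruction of the hypothetical linear IL.
struct Instr {
    Op op;
    int dst;           // destination virtual register (unused by Ret)
    int a;             // first source register (unused by Const)
    int b;             // second source register (unused by Const and Ret)
    std::int64_t imm;  // immediate operand, used only by Const
};

// Evaluate a straight-line sequence; returns the operand of the first Ret, or 0 if there is none.
std::int64_t run(const std::vector<Instr>& code, int numRegs) {
    std::vector<std::int64_t> regs(numRegs, 0);
    for (const Instr& in : code) {
        switch (in.op) {
        case Op::Const: regs[in.dst] = in.imm; break;
        case Op::Add:   regs[in.dst] = regs[in.a] + regs[in.b]; break;
        case Op::Mul:   regs[in.dst] = regs[in.a] * regs[in.b]; break;
        case Op::Ret:   return regs[in.a];
        }
    }
    return 0;
}
```

For example, `(2 + 3) * 4` becomes six instructions: three `Const`, one `Add`, one `Mul` and a `Ret`. A stack-based byte code would express the same computation without register names, at the cost of having to simulate the stack to recover which value feeds which operation.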

One important consideration should be to model GC operations in the IL, as this could be a key differentiator for JitBuilder / OMR compared to other approaches.

Another important consideration is clearly defining the Value type system. Should this support aggregates such as Structs only (as done by LLVM) and leave further abstractions such as Classes, Interfaces, Methods etc to user level code? Or should this type system be rich like Java's? Perhaps the OpenJ9 type system could be used?

Concepts such as VirtualMachineOperandArray, VirtualMachineOperandStack, VirtualMachineRegister, VirtualMachineRegisterInStruct, and VirtualMachineState would not be needed I think, as the JitBuilder IL would subsume these concepts within its definition.

mstoodle commented 5 years ago

I have been doing some thinking along these lines recently that is leading me towards building a compiler representation of the JitBuilder API calls. The idea would be to enable JitBuilder clients to introspect the operations they created and even to write their own analyses and augment them with their own language-level data. I guess it's similar to the MLIR idea being introduced for TensorFlow that builds on top of the LLVM infrastructure. The primary motivations driving me towards this model are:

  1. we can do a better job with initial IL generation if JitBuilder doesn't have to create code until the MethodBuilder's buildIL() function has completed,
  2. we can provide more meaningful validation of the code that's been generated by the client and map it more meaningfully to the source code and concepts the client code works with directly (for example we could easily check that local variables are defined before being used and express the result back to the client),
  3. we can produce a more meaningful textual representation of the code that has been generated by the client, which otherwise requires reading OMR compiler logs,
  4. we can enable a much more meaningful and rich mechanism for language compilers to perform language-specific analyses themselves without having to delve all the way into the OMR compiler technology.
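Points 2 and 3 can be sketched with a recorded list of operations that is validated and printed before any OMR IL exists. This is only a sketch under simplifying assumptions: the `RecordedOp` type and both functions are hypothetical, only `Store` and `Load` of named locals are modelled, and the check handles straight-line code only (no control flow).

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one client call, kept instead of generating IL immediately.
struct RecordedOp {
    std::string name;   // "Store" or "Load" in this sketch
    std::string local;  // name of the local variable involved
};

// Report every load of a local that has not been stored to earlier in the sequence.
std::vector<std::string> checkDefinedBeforeUse(const std::vector<RecordedOp>& ops) {
    std::set<std::string> defined;
    std::vector<std::string> errors;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        const RecordedOp& op = ops[i];
        if (op.name == "Store") {
            defined.insert(op.local);
        } else if (op.name == "Load" && defined.count(op.local) == 0) {
            errors.push_back("op " + std::to_string(i) + ": local '" + op.local +
                             "' loaded before any store");
        }
    }
    return errors;
}

// Produce a textual listing in the client's own terms, one operation per line.
std::string dump(const std::vector<RecordedOp>& ops) {
    std::ostringstream out;
    for (std::size_t i = 0; i < ops.size(); ++i)
        out << i << ": " << ops[i].name << " " << ops[i].local << "\n";
    return out.str();
}
```

Both functions work purely on the recorded operations, so their output refers to the names the client used rather than to anything found in the OMR compiler logs.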

I have put some of the language binding work I was doing on hold for a bit to flesh out these concepts in my head. I would prefer not to break the current API that clients are used to in order to bring this kind of facility to light and I think that's possible, but it may impact what we need to do in terms of language bindings.

When I have something a bit more concrete available, I'll put something together for people to comment on, as I think it will be easier to talk about a concrete proposal (and it will help me flesh out my thoughts on the subject). Of course, others are welcome to try the same and I'm happy to discuss.