Implement new parser and code generator

dibyendumajumdar / ravi

Ravi is a dialect of Lua, featuring limited optional static typing, JIT and AOT compilers

http://ravilang.github.io/

Implement new parser and code generator #98

Open dibyendumajumdar opened 8 years ago

dibyendumajumdar commented 8 years ago

A strength and a weakness of the Lua parser and code generator is that it is designed to generate code as it parses, on the fly. The process is extremely efficient and fast, and does not allocate heap memory at all.

The limitation of this approach is that more advanced code analysis, such as inlining of functions, is not possible. A different parser that first generates an AST, which is then converted to bytecode, would allow more sophisticated code generation. But it would be slower than the existing parser and code generator, so ideally we should retain both options.
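To make the two-phase idea concrete, here is a minimal sketch in Python (not Ravi's implementation). It parses a tiny arithmetic expression into an AST, runs one whole-tree pass, and only then emits code. Everything in it is an assumption for illustration: the node classes, the `compile_expr` helper, and the stack-style instruction names `LOAD`, `PUSHK`, `ADD`, `SUB`, `MUL` are hypothetical, and Lua's real bytecode is register-based. Constant folding is only a stand-in for the kind of pass that needs the whole tree; Lua's single-pass compiler already folds some constants on the fly, whereas inlining would need a structure like this.

```python
import re
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*()])")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
OPNAMES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical opcode names

def tokenize(src):
    src = src.strip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out

def parse(tokens):
    """Phase 1: build the whole tree before generating any code."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return Num(int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return Var(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = primary()
        while peek() == "*":
            advance()
            node = BinOp("*", node, primary())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = BinOp(op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def fold(node):
    """Analysis pass over the finished tree: fold constant subexpressions."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node

def emit(node, code):
    """Phase 2: walk the (possibly rewritten) tree and emit instructions."""
    if isinstance(node, Num):
        code.append(("PUSHK", node.value))
    elif isinstance(node, Var):
        code.append(("LOAD", node.name))
    else:
        emit(node.left, code)
        emit(node.right, code)
        code.append((OPNAMES[node.op],))
    return code

def compile_expr(src):
    return emit(fold(parse(tokenize(src))), [])
```

The extra tree allocation and the second walk are where the slowdown relative to the on-the-fly generator comes from, which is why keeping both paths is attractive.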

UPDATE (April 2020): This work has now moved to https://github.com/dibyendumajumdar/ravi-compiler, so the parser code will be deleted from this repo. The code will be preserved in a branch in case it is needed for future reference.

niaow commented 8 years ago

Or, we could turn the Lua bytecode into AST. This would only require changes to the JIT.

dibyendumajumdar commented 8 years ago

I suppose that should be possible, but converting source to an AST allows the language to be extended with additional constructs. In particular, it would enable Metalua-style functionality.

niaow commented 8 years ago

One alternative I thought of: switch back to the normal Lua bytecode format, and replace the static typing instructions with a single instruction that forces a register to be of a given type. One argument would be the register, and the other would be the type. This would allow for an extensible type system. An API using userdata could also be created to help generate bytecode.
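A rough sketch of what such a single type-forcing instruction might look like, written in Python for brevity. All names here are hypothetical (`OP_CHECKTYPE`, `register_type`, the type ids), and the 6-bit opcode / 8-bit register / 18-bit operand split is an assumption chosen to resemble a Lua 5.3-style 32-bit instruction word; it is not Ravi's actual encoding. The type operand indexes a table of predicates, which is what would make the type system extensible without adding opcodes.

```python
OP_CHECKTYPE = 0x3F  # hypothetical opcode number

# Type id -> (name, predicate). New types are appended at runtime.
TYPES = []

def register_type(name, predicate):
    """Add a type to the table and return its id (the instruction's operand)."""
    TYPES.append((name, predicate))
    return len(TYPES) - 1

T_INTEGER = register_type("integer", lambda v: isinstance(v, int) and not isinstance(v, bool))
T_NUMBER = register_type("number", lambda v: isinstance(v, float))
T_TABLE = register_type("table", lambda v: isinstance(v, dict))

def encode(op, a, bx):
    """Pack opcode (6 bits), register A (8 bits), operand Bx (18 bits)."""
    if not (0 <= op < 1 << 6 and 0 <= a < 1 << 8 and 0 <= bx < 1 << 18):
        raise ValueError("field out of range")
    return op | (a << 6) | (bx << 14)

def decode(word):
    return word & 0x3F, (word >> 6) & 0xFF, (word >> 14) & 0x3FFFF

def run(code, registers):
    """Execute a list of instruction words; only the type check is modeled."""
    for word in code:
        op, a, bx = decode(word)
        if op != OP_CHECKTYPE:
            raise ValueError(f"unknown opcode {op}")
        name, predicate = TYPES[bx]
        if not predicate(registers[a]):
            raise TypeError(f"register {a} is not of type {name}")
```

For example, `run([encode(OP_CHECKTYPE, 0, T_INTEGER)], [42])` completes silently, while the same instruction against `["x"]` raises `TypeError`. A user-defined type only needs another `register_type` call.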

This slightly extended Lua bytecode could then be translated into a low-level bytecode, which could be statically typed and 64-bit. A dynamically typed value would be one of its types, and some low-level instructions could be used to check and convert it. This low-level bytecode would essentially be like LLVM IR, but as a bytecode. Using bytecode linearizes memory access and can usually make it fit into the L1 D-cache. This bytecode could be formatted to serve the same purposes as an AST.

dibyendumajumdar commented 8 years ago

I think working on the bytecode is inherently problematic: so many decisions have already been made by then. As the code is compiled, decisions are made about registers and how they will be used, and about how much stack space to allocate. Function calls result in additional decisions about how registers will be used to pass parameters and how return values will be processed. In my view, trying to unpick all this is far too difficult.

dibyendumajumdar commented 8 years ago

Some notes to myself as a reminder: https://github.com/dibyendumajumdar/ravi/blob/master/readthedocs/new-ast-parser-code-gen.rst