godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.16k stars 97 forks source link

Implement infrastructure for second pass in the new GDScript compiler #1176

Open pchasco opened 4 years ago

pchasco commented 4 years ago

Describe the project you are working on: I am not working on a specific project, however I am very interested in the performance of the GDScript compiler and execution engine.

Describe the problem or limitation you are having in your project: The current implementation of GDScript, and the new "GDScript 2.0" does not appear to have an underlying architecture to support any optimization passes after bytecode generation. (GDScript 2.0 assumption based on https://github.com/godotengine/godot/pull/39093 ) Many optimizations are only possible or much easier performed after the code generation step in a compiler.

Describe the feature / enhancement and how it helps to overcome the problem or limitation: I suggest structuring the GDScript compiler in such a manner that transformation and optimization passes can be inserted into the series of steps that are performed to generate the final bytecode output of the GDScript compiler.

This extensible architecture will enable a number of optimizations to be implemented that are currently impossible or very difficult to achieve working only on the AST. Indeed, most modern compilers do the bulk of their optimization on structures generated from bytecode or some abstract intermediate instruction representation rather than any AST structure.

Many of the following high-impact optimizations could be implemented:

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams: The proposal requires the GDScript compiler to support an arbitrary number of passes to be applied to the GDScript bytecode. Each pass would receive the metadata and bytecode generated by the previous step, then apply some transformation to the bytecode and/or metadata. The results of that pass would then be fed to the next pass, or in the case that it is the final pass in the set of passes, the results would be the final results fed to the GDScript execution engine. This is a common design pattern found in most modern compilers.

I have begun to implement a sample implementation of these proposed changes from the 3.X branch of Godot, but stopped when I learned that GDScript was being rewritten.

If this enhancement will not be used often, can it be worked around with a few lines of script?: This could not be worked around with script.

Is there a reason why this should be core and not an add-on in the asset library?: This must be implemented as a part of the GDScript module.

jonbonazza commented 4 years ago

I'm all for further optimization of GDScript. Perhaps this should be put in front of @vnen as he is actively working on GDScript 2.0.

Ansraer commented 4 years ago

I had just assumed that multiple passes were a given, since I never even considered the possibility of doing any serious optimization work without them. Thank you for noticing this and creating a issue while GDScript is being rewritten. Really hope we can get vnen to weigh in on this.

Out of interest, do you know what kind of data structure is currently used to store the byte code generated by the compiler? The last time I worked on a compiler we mainly used trees and tuples during optimization iirc. Does GDScript immediately flatten whatever it gets into a sequence?

CedNaru commented 4 years ago

Nice Idea. That would let users propose independent optimization passes without reworking the rest of the Gdscript module. It makes maintenance, updates and benchmarks easier to do as well.

pchasco commented 4 years ago

Out of interest, do you know what kind of data structure is currently used to store the byte code generated by the compiler?

The current GDScript implementation stores the bytecode as a sequence of integers in a structure along with other metadata.

The last time I worked on a compiler we mainly used trees and tuples during optimization iirc. Does GDScript immediately flatten whatever it gets into a sequence?

Yes. The AST is walked to produce a flattened sequence of ints representing the bytecode. In my experience, the bytecode generated is very inefficient. A lot of cycles are wasted with unnecessary unconditional jumps. This is a side effect of generating code directly from an AST structure. This is normal, but most compilers will take this output and create a control flow graph and dominance tree and reorder the basic blocks to reduce branching. Native code compilers do this to avoid the unnecessary branch instruction and to optimize instruction cache utilization. I don't think we'd see any instruction cache benefit here but we would see a reduction in the number of instructions executed.

My blog entry from February 28 of this year illustrates some of the benefits of these optimizations against the GDScript bytecode. http://blog.moblcade.com/?p=114

vnen commented 4 years ago

The current plan is to restructure the instructions set in the GDScript VM to make use of types and faster addressing modes, which will increase the speed of execution. I will start working on that as soon as I wrap up the current refactor into something with feature parity over the old code (which should not take long at this point).

Once we have the new VM we can consider integrating something like this to optimize user code and further improve runtime speed.

If you are willing to make the code for this I would definitely accept it. I don't have plans to change the compiler further anytime soon, besides adapting it to the new instruction set. I do believe that most of the code you wrote is still usable and not difficult port to 4.0 (though I admit I haven't done more than a glance at it). I would advise avoiding doing any serious work before the new code is merged though (which hopefully won't take too long from now).

vnen commented 4 years ago

BTW, you should still detail more the implementation proposal. The optimizations are welcome but we should agree on implementation so you don't waste time making it in a way that goes against the Godot standards.

One idea that Juan had is to add an API to the backend for code generation, so the compiler just calls that API from the parse tree. Then the backend could be replaced just by implementing this interface (be it GDScript bytecode, other IR form, GDNative C code, LLVM IR, or something else).

pchasco commented 4 years ago

Sure, I can do that.

vnen commented 4 years ago

One idea that Juan had is to add an API to the backend for code generation, so the compiler just calls that API from the parse tree. Then the backend could be replaced just by implementing this interface (be it GDScript bytecode, other IR form, GDNative C code, LLVM IR, or something else).

So this API is now done here: https://github.com/godotengine/godot/pull/41338. I think that it helps a little since you can make this as an implementation of that interface, dealing with a lower level than the parse tree, but before it becomes bytecode.

pchasco commented 4 years ago

Looking at godotengine/godot#41338 briefly, I do think it will fill two needs for implementing infrastructure for a second pass.

First, it will eliminate the bytecode disassembly step where a second pass needs to take the raw bytecode and disassemble it into a more abstract representation for analysis. Instead, an implementation of the GDScriptAbstractCodeGenerator would emit a representation of the program in another intermediate structure.

Second, the GDScriptCodeGenerator class could be relied upon by the second pass to generate the bytecode from the abstract representation that it used for analysis as its final step. So essentially the second pass would just be a man-in-the-middle between the GDScriptCompiler and the GDScriptCodeGenerator.

I have just within the last few days completed a separate project, so I ought to have time to add more detail to the proposal this week.