godotengine / godot-proposals

Godot Improvement Proposals (GIPs)

Add `--script-dump-ast` command line argument to output an AST of a GDScript file #4958

Open Calinou opened 2 years ago

Calinou commented 2 years ago

Describe the project you are working on

The Godot editor 🙂

Describe the problem or limitation you are having in your project

Third-party tools wishing to work with GDScript have to implement their own parser. This applies to linters, formatters and code analysis tools.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

We can avoid the requirement for third-party tools to reimplement a parser by allowing Godot to dump an AST (Abstract Syntax Tree) of any given script.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Add a --script-dump-ast <path to script> command line argument that prints the AST to standard output and quits Godot afterwards. To prevent a window from being visible, --headless is implied when using --script-dump-ast.

The user can then redirect standard output to save the AST to a file. The output format should preferably be something standard and easy for other tools to parse, such as JSON.

If there is a need to dump multiple ASTs from a single engine run (to make dumping several scripts faster), we can allow multiple paths separated by spaces (so it works with globbing as well). In this case, each dump should be its own JSON document so they can be cleanly separated, and each dump should contain the path to the script (relative to the project root).
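As a rough illustration of how a third-party tool might consume such output, here is a minimal Python sketch. It assumes the proposed flag was invoked on several scripts with standard output redirected to a file, and that the engine writes one JSON document per script, back to back. The `path`, `type` and `children` keys are hypothetical, since the proposal does not define a schema.

```python
import json


def split_json_documents(text):
    """Yield each JSON document from a string of back-to-back documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def index_by_path(text):
    """Map each dump to the script path it reports (hypothetical "path" key)."""
    return {doc["path"]: doc for doc in split_json_documents(text)}


# Hypothetical output for two scripts; the real format is not specified yet.
sample = """
{"path": "player.gd", "type": "Class", "children": []}
{"path": "enemies/slime.gd", "type": "Class", "children": []}
"""

dumps = index_by_path(sample)
print(sorted(dumps))
```

Keeping each dump as a separate document means a consumer can process scripts one at a time and look them up by path, without needing a wrapper array around the whole run.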

Example output from a Clang AST dump (clang-check -ast-dump test.c --extra-arg="-fno-color-diagnostics" --):

#include <stdio.h>
#include <stdlib.h>

int main() {
    printf("Hello world!");
    return EXIT_SUCCESS;
}

`-FunctionDecl 0x5647b82497d0 </home/hugo/test.c:4:1, line:7:1> line:4:5 main 'int ()'
  `-CompoundStmt 0x5647b82499d8 <col:12, line:7:1>
    |-CallExpr 0x5647b8249950 <line:5:2, col:23> 'int'
    | |-ImplicitCastExpr 0x5647b8249938 <col:2> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x5647b8249870 <col:2> 'int (const char *, ...)' Function 0x5647b81d6fe8 'printf' 'int (const char *, ...)'
    | `-ImplicitCastExpr 0x5647b8249990 <col:9> 'const char *' <NoOp>
    |   `-ImplicitCastExpr 0x5647b8249978 <col:9> 'char *' <ArrayToPointerDecay>
    |     `-StringLiteral 0x5647b82498c8 <col:9> 'char[13]' lvalue "Hello world!"
    `-ReturnStmt 0x5647b82499c8 <line:6:2, /usr/include/stdlib.h:93:22>
      `-IntegerLiteral 0x5647b82499a8 <col:22> 'int' 0

If this enhancement will not be used often, can it be worked around with a few lines of script?

No.

Is there a reason why this should be core and not an add-on in the asset library?

This is about improving interoperability with third-party utilities not made with Godot.

wareya commented 2 years ago

Is there also a proposal for manipulating ASTs from game code? I currently need to do very, very evil things to GDScript code on a textual level in my cutscene script loading code, because my cutscene scripts are basically a GDScript-based DSL. If I had a way to manipulate an AST instead of text, even if it was just dictionary-typed, it would be a lot less fragile. Should I open a discussion for this?

Calinou commented 2 years ago

> Is there also a proposal for manipulating ASTs from game code?

No, I consider this to be out of scope for this proposal. You can open a separate proposal/discussion for this, but this proposal is only about emitting an AST from Godot.

vnen commented 1 year ago

My concern with this is that the AST layout may change even between patch versions. There's no guarantee of compatibility, since fixing a bug sometimes requires meddling with how the parser sees things. If that's not an issue, then I guess it's fine to have the feature.

It also needs to have an unambiguous format. For instance, the GDScript parser currently has no node for grouping expressions (i.e. using parentheses to control precedence); it just arranges the nodes in a different order to reflect the priorities. So the expression (2 * 2) + 2 has the exact same tree as 2 * 2 + 2. That is, if you plan to restore the code as typed from the AST, this will fail (although the result is semantically equal).
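The same effect can be observed in other languages whose parsers drop grouping. As an analogy only (this uses Python's own `ast` module, not the GDScript parser, and requires Python 3.9+ for `ast.unparse`), the following sketch shows two differently typed expressions producing identical trees:

```python
import ast


def tree_of(source):
    """Return a textual dump of the expression tree, without positions."""
    return ast.dump(ast.parse(source, mode="eval"))


grouped = tree_of("(2 * 2) + 2")
ungrouped = tree_of("2 * 2 + 2")
regrouped = tree_of("2 * (2 + 2)")

# Redundant parentheses leave no trace in the tree.
print(grouped == ungrouped)

# Parentheses that change evaluation order do change the tree shape.
print(grouped == regrouped)

# Regenerating source from the tree cannot recover the redundant parentheses.
print(ast.unparse(ast.parse("(2 * 2) + 2", mode="eval")))
```

A formatter built on such a tree would therefore need either source positions or an explicit grouping node to preserve what the user typed.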

I'm not particularly against this, but I'm afraid it might be difficult to find a consistent format that third-party tools could rely on. It would be interesting to know what those tools could be.

the-ssd commented 4 months ago

I would rather use a Godot implementation that might change, compared to maintaining my own parser.

The simplest way to deal with grouping is to put parentheses around everything. So lhs op rhs will become (lhs) op (rhs). This will make the code unreadable, but it is simple.

A harder option is to check whether lhs or rhs has a lower precedence than the current node, and add parentheses only in that case.
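Both approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Godot code: the node shape (a number, or a tuple of operator, lhs, rhs) and the `PRECEDENCE` table are hypothetical, and all four operators are treated as left-associative.

```python
# Hypothetical node shape: a number, or a tuple (op, lhs, rhs).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def prec(node):
    """Literals bind tighter than any operator."""
    return PRECEDENCE[node[0]] if isinstance(node, tuple) else 99


def unparse_all_parens(node):
    """Simple approach: wrap every operand in parentheses."""
    if not isinstance(node, tuple):
        return str(node)
    op, lhs, rhs = node
    return f"({unparse_all_parens(lhs)}) {op} ({unparse_all_parens(rhs)})"


def unparse_minimal(node):
    """Harder approach: parenthesize only where the tree shape requires it."""
    if not isinstance(node, tuple):
        return str(node)
    op, lhs, rhs = node
    left = unparse_minimal(lhs)
    right = unparse_minimal(rhs)
    # A left operand needs parentheses only if it binds less tightly.
    if prec(lhs) < prec(node):
        left = f"({left})"
    # For left-associative operators, a right operand of equal precedence
    # also needs parentheses, e.g. 1 - (2 - 3).
    if prec(rhs) <= prec(node):
        right = f"({right})"
    return f"{left} {op} {right}"


tree = ("*", 2, ("+", 2, 2))
print(unparse_all_parens(tree))
print(unparse_minimal(tree))
```

Either printer yields code that parses back to the same tree; neither can tell whether the author originally wrote redundant parentheses.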

the-ssd commented 4 months ago

Also, it would be nice to know whether this will be added.