EBatTiVo closed this issue 4 years ago
Is this a duplicate of TiVo/Intellij-haxe#9?
No, this issue is about creating the backend AST and PSI (logical structure of the AST) that all of the IDE and plug-in code works with. #9 is about the completion engine, which /can/ be (and currently is) driven by code reading the AST, but the Haxe compiler itself has been extended to generate completion offers which are likely to be better than those that the IDE can offer. The issues are similar in that they both propose using the Haxe compiler, and dissimilar in that they propose to use it for different things.
I don't understand what exactly you want to do here. It should be clarified that even with macros you cannot use arbitrary syntax. Macros can only consume and produce valid Haxe syntax, so it is still possible to describe the grammar in BNF.
Macros can only influence how syntax is interpreted and this shouldn't be something you have to worry about outside of providing completion, which is to be addressed in #9.
To answer your question, what we are trying to do here is twofold: get out of the business of rewriting and maintaining parsers and lexers; and get a lot more information about the code being written.
Completion is NOT the only thing we need to accommodate. We need to mark syntax errors, highlight/colorize code, and ensure that libraries are included in the project (yes, for completion, but also so that we can auto-generate and run unit tests). We want to create 'lint'-like processing to be proactive with highlighting errors. We badly need to deal with pre-processing symbols (#if/#elseif/#else/#end) in some way other than putting an optional token between every word in the BNF. We need to find call locations (by type!) and understand the use of calling functions through variables. It would also be nice to determine if and where functions -- or effectively new types -- are defined using macros, even to the point of giving a pre-processed representation of generated code so that the user can better debug their macros. Basically, anything that can be known at compile time can be used by the IDE to better the coding experience. We want to make the coding experience for Haxe great!
After all of that, it's also clear that the language is a moving target, and keeping up with new functionality is costly. If we can have the compiler tell the IDE what the output of compilation is, in terms the IDE can understand, then the compiler becomes an integral part of the code writing process, and not just something you call when you think the code is ready.
Plus, why, oh why, should we be rewriting a language parser via Flex when a perfectly reasonable and well-maintained one already exists in the compiler?
For the record, what I'm championing is not to complicate the compiler, but rather to build a new language "back-end" plug-in where the target output is not really a new language (e.g. JavaScript, C++, etc.), but a machine-readable AST along with some contextual information. Then the IntelliJ plug-in will read that output and translate it into the IDE's internal PSI and node types.
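A rough sketch of the translation step proposed above, written in Python only to keep it short. Everything here is hypothetical: the JSON shape, the `kind`/`start`/`end` fields, the `PsiNode` class, and the node-type names are invented for illustration and are neither the Haxe compiler's output format nor the IntelliJ PSI API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PsiNode:
    """Stand-in for an IDE-side tree node (NOT the real IntelliJ PSI API)."""
    node_type: str
    text_range: tuple
    children: list = field(default_factory=list)


# Hypothetical mapping from compiler node kinds to IDE node types.
KIND_TO_NODE_TYPE = {
    "class": "CLASS_DECLARATION",
    "function": "METHOD_DECLARATION",
    "var": "FIELD_DECLARATION",
}


def to_psi(ast):
    """Recursively translate one compiler AST node (a dict) into a PsiNode."""
    node_type = KIND_TO_NODE_TYPE.get(ast["kind"], "UNKNOWN_ELEMENT")
    children = [to_psi(child) for child in ast.get("children", [])]
    return PsiNode(node_type, (ast["start"], ast["end"]), children)


# Hypothetical compiler output for a class containing a single function.
COMPILER_OUTPUT = """
{"kind": "class", "start": 0, "end": 40,
 "children": [{"kind": "function", "start": 13, "end": 38}]}
"""

root = to_psi(json.loads(COMPILER_OUTPUT))
```

The point of the sketch is the division of labor: the compiler owns parsing, and the plug-in only maps node kinds and source ranges onto its own tree.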
This is not going to work for several reasons. For one, our own parser simply discards tokens during pre-processing, so anything within an inactive conditional compilation branch is never seen by later stages. The presence of conditional compilation also means that an AST only exists for a given path. There is no structure in the compiler which represents a .hx file as a whole.
If you want to do linting, it has to be done based on tokens, not expressions. We could expose Haxe's lexer for that, but the real work has to be done after that. We couldn't really utilize any of the information Haxe collects during typing because, again, we can only type a given compilation path.
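To make the point about compilation paths concrete, here is a minimal Python sketch of a preprocessor that drops tokens in inactive branches. It is not the Haxe compiler's code: the `active_tokens` name, the whitespace tokenizer, and the restriction to single-flag conditions (optionally negated with `!`) are assumptions made for illustration. Real Haxe conditions can be full expressions.

```python
import re

# A conditional-compilation directive, or any other run of non-space characters.
TOKEN = re.compile(r"#(?:if|elseif|else|end)\b|\S+")


def active_tokens(source, defines):
    """Return only the tokens on the compilation path selected by `defines`."""
    out = []
    # One frame per open #if: [parent_active, branch_already_taken, active_now]
    stack = []
    tokens = TOKEN.findall(source)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        active = stack[-1][2] if stack else True
        if tok in ("#if", "#elseif"):
            i += 1  # consume the condition token that follows the directive
            cond = tokens[i]
            if cond.startswith("!"):
                value = cond[1:] not in defines
            else:
                value = cond in defines
            if tok == "#if":
                stack.append([active, active and value, active and value])
            else:
                frame = stack[-1]
                frame[2] = frame[0] and not frame[1] and value
                frame[1] = frame[1] or frame[2]
        elif tok == "#else":
            frame = stack[-1]
            frame[2] = frame[0] and not frame[1]
            frame[1] = True
        elif tok == "#end":
            stack.pop()
        elif active:
            out.append(tok)  # tokens in inactive branches are simply dropped
        i += 1
    return out


SOURCE = """
class Main {
#if js
  static function run() trace("browser");
#elseif cpp
  static function run() trace("native");
#else
  static function run() trace("other");
#end
}
"""
```

Calling `active_tokens(SOURCE, {"js"})` and `active_tokens(SOURCE, {"cpp"})` yields two different token streams, and neither contains all three `run()` bodies. Any tree built from one of these streams describes only that path, which is why nothing downstream can represent the .hx file as a whole.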
Well, that's disappointing. I guess we'll have to mess with the compiler some, then. I haven't had the time to actually read the Haxe compiler code in detail or understand its architecture yet. That's coming, but not in the very short term.
Obviously, the IDE has different needs from the compiler in order to support some of the things we want to do. However, the compiler can certainly supply some of the functionality that we need, and we plug-in folks badly need to get out of the (so far, very expensive) business of duplicating the parser and language semantics. We have way too many bugs that have everything to do with parsing, and not much to do with additional functionality. I'm hoping there's a way to eliminate that entire class of bugs.
As far as conditional compilation paths go (apart from parsing), that's a whole 'nother can of worms. We basically have competing requirements: the need to parse and highlight syntax on the current path (or many paths simultaneously?); and the need to deal with all paths during completion, refactoring, detecting used/unused imports, and the like. Maybe those issues don't have to compete, but with the current implementation they do.
I'm definitely open to suggestions. I don't yet have enough context to design a proper solution; just enough to enumerate [some of] the problems.
BTW, another reason to get the compiler to do the parsing/AST generation job is Static Extensions, which play all hell with expression typing, find usages, completion, and the like.
The compiler already has display modes for find usages and completion. What else do you need?
We can add additional output if it helps you; I just have to understand what exactly you want.
Hi. Just an idea you might want to consider: there's ocaml-java http://www.ocamljava.org/index.html which might help to reduce code duplication.
There has been movement in this area: The Haxe compiler team has created a new project called hxparser (https://github.com/vshaxe/hxparser) which is designed specifically for this purpose. The intent is to integrate it into the plugin. Whether that is possible via an embedded OCaml process or it must remain as an external tool is still an open question.
Not gonna happen unless we get a new parser with some future version of Haxe. The OCaml parser doesn't know how to do this, and the Menhir version in hxparser had its own problems and has been mothballed.
Haxe is a constantly evolving language, gaining several new features each year. In addition, the language cannot be precisely described using Backus-Naur Form (BNF) because it has a rich macro processing capability which effectively changes the semantics of the language locally. That is, the language can be redefined contextually and on the fly by the engineers using it. The currently implemented plug-in has no such contextual polymorphism and is implemented using a fairly rigid BNF syntax that can be (and is) passed through a code generation process to create both the lexer and parser portions of code. Nontrivial functionality is then added on top of the generated code. Basically, the language is baked in one way, and if you use any of the advanced features, you are out of luck when it comes to IDE support. Since Haxe already has a compiler that does understand the language's flexibility, and the language is designed to be cross-compiled into other languages, it makes sense that the compiler can be extended to build an abstract syntax tree (AST, which is what a ‘PSI’ is in the IDEA environment) that the IDE can consume. In fact, the Haxe implementers have already built similar functionality so that IDEs can support completion (the ability for the IDE to propose matches for a partially typed piece of text, which reduces the keystrokes required of developers and reduces typing errors). The task at hand is to create a mode of the compiler that will produce an AST that can be translated into the target environment, and then to create a specific translation into the IntelliJ IDEA environment.
TiVo internal reference: STB-2045