Isopod / tree-sitter-pascal

Treesitter grammar for Pascal and its dialects (Delphi, Freepascal)
MIT License

{$IFDEF} and {$IFNDEF} inside a type declaration cause an incorrect node holding the rest of the text and macros #7

Open vintagedave opened 3 months ago

vintagedave commented 3 months ago

Hello,

Testing against the following Delphi code:

        unit Ifdefs;
        interface
        type
            {$IF DEFINED(MSWINDOWS)}IWindowsOnly = interface end;{$ELSE}ISomethingElse = interface end;{$ENDIF}
            {$IFDEF NOTDEFINED}
                INotDefined
            {$ELSE}
                IInverse
            {$ENDIF} = interface end;

I get some odd node results for the IFDEF part.

In other words, it doesn't seem to recognise the type name, and the $ELSE is lost. (I originally found this with $IFNDEF, so it seems to apply to both.)

On the other hand, the preceding declaration beginning $IF DEFINED does seem to generate the expected nodes. It may be that this is because they are entire typename = definition clauses, whereas the failing ones have IFDEF logic inserted in the middle?

vintagedave commented 3 months ago

By the way, one of the ancestor nodes here is also of type ERROR. I see this quite a bit when parsing complex units (e.g. Winapi.Windows.pas). It doesn't seem to affect things badly; I ignore ERROR nodes when using the nodes.
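A minimal sketch of what ignoring ERROR nodes could look like, assuming nodes expose a type name and a list of children. The Node class and the walk_skipping_errors function are hypothetical stand-ins for illustration, not part of tree-sitter-pascal or any binding, and the node type names in the sample tree are made up:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (type name plus children)."""
    type: str
    children: List["Node"] = field(default_factory=list)


def walk_skipping_errors(node: Node) -> Iterator[Node]:
    """Yield every node in pre-order, except ERROR nodes themselves.

    Children of an ERROR node are still visited, so usable declarations
    nested under a failed parse are not lost.
    """
    if node.type != "ERROR":
        yield node
    for child in node.children:
        yield from walk_skipping_errors(child)


# Made-up tree shape: an ERROR ancestor wrapping an otherwise usable declaration.
tree = Node("root", [
    Node("ERROR", [
        Node("declaration", [Node("identifier"), Node("interface")]),
    ]),
])
visited = [n.type for n in walk_skipping_errors(tree)]
```

The walk descends through the ERROR node rather than pruning it, which matches the situation above where the ERROR node is an ancestor of nodes that are still worth using.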

Isopod commented 2 months ago

I think the ERROR node for this particular snippet is there because the snippet is missing the final "end.". But I can confirm that the node is genericDot. All things considered, this is not the worst possible outcome! The parser thinks this is an INotDefined.IInverse where someone has forgotten the dot. At least this is somewhat reasonable and localized.

Preprocessed languages are impossible to parse using Tree-Sitter. Tree-Sitter is built on the theory of context-free grammars, and having a preprocessor is the opposite of context-free. It can’t work, as a matter of principle. I tried to make some common cases work, so that a directive at least doesn’t break the whole parse tree, but even that is not always successful. Adding more special cases, in my experience, does more harm than good: it tends to confuse the parser even more when it encounters something it cannot parse, and it also bloats the generated parser. See this comment in grammar.js: https://github.com/Isopod/tree-sitter-pascal/blob/a9ee969dec5b2e3b2ccccc5954fec04100c7619e/grammar.js#L79
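To illustrate why conditionals need context the grammar does not have, here is a rough sketch of resolving them for one fixed set of defined symbols before the text reaches any parser. This is not something tree-sitter-pascal does; the names resolve_conditionals, blank and DIRECTIVE are hypothetical, and only {$IFDEF}, {$IFNDEF}, {$ELSE} and {$ENDIF} are handled. Inactive text is replaced by spaces so that byte offsets and line numbers still match the original file:

```python
import re

# Assumption: only these four directives are handled. {$IF ...} expressions,
# {$ELSEIF}, {$DEFINE} and (*$...*) style directives are out of scope.
DIRECTIVE = re.compile(
    r"\{\$(IFDEF|IFNDEF|ELSE|ENDIF)\b\s*([A-Za-z_]\w*)?[^}]*\}",
    re.IGNORECASE,
)


def blank(text):
    """Replace everything except line breaks with spaces, keeping offsets stable."""
    return re.sub(r"[^\r\n]", " ", text)


def resolve_conditionals(source, defined):
    """Keep only the branches that are active for the given defined symbols."""
    defined = {name.upper() for name in defined}
    out = []
    stack = []      # one [parent_active, condition] pair per open conditional
    active = True   # whether the text currently being copied is kept
    pos = 0
    for m in DIRECTIVE.finditer(source):
        chunk = source[pos:m.start()]
        out.append(chunk if active else blank(chunk))
        out.append(blank(m.group(0)))  # the directive itself is always blanked
        pos = m.end()
        kind = m.group(1).upper()
        symbol = (m.group(2) or "").upper()
        if kind in ("IFDEF", "IFNDEF"):
            cond = (symbol in defined) == (kind == "IFDEF")
            stack.append([active, cond])
            active = active and cond
        elif not stack:
            raise ValueError("{$%s} without a matching opening directive" % kind)
        elif kind == "ELSE":
            parent_active, cond = stack[-1]
            stack[-1][1] = not cond
            active = parent_active and not cond
        else:  # ENDIF
            active = stack.pop()[0]
    if stack:
        raise ValueError("unterminated conditional")
    out.append(source[pos:])
    return "".join(out)


snippet = (
    "type\n"
    "    {$IFDEF NOTDEFINED}\n"
    "        INotDefined\n"
    "    {$ELSE}\n"
    "        IInverse\n"
    "    {$ENDIF} = interface end;\n"
)
without_symbol = resolve_conditionals(snippet, defined=set())
with_symbol = resolve_conditionals(snippet, defined={"NOTDEFINED"})
```

The same source text yields a declaration of IInverse in one configuration and INotDefined in the other, so no single parse tree of the unresolved text can be correct for both. A pass like this only describes one configuration at a time.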

If you need a parser that is 100% accurate, you’re probably best off using something else. The intended goal of Tree-Sitter is syntax highlighting.