antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.16k stars 3.7k forks source link

VB6 visitor issue #2290

Closed lsalamon closed 3 years ago

lsalamon commented 3 years ago

This simple VB code generates wrong visitor events for Dim statement:

Public Function VarLongDim() As Long
Dim number As Long
number = 1234
VarLongDim = number
End Function

ExitBlock and ExitEveryRule occur long after the block is evaluated.

What could be the cause of this behavior?

kaby76 commented 3 years ago

I wrote a listener class (C#) that outputs when the EnterBlock, ExitBlock, EnterVariableStmt, ExitVariableStmt, and ExitEveryRule methods are called. The code ran exactly as expected (see the output below). There is only one block parse tree node, and only one variableStmt parse tree node (which is a descendant of the block node).

Parse tree:

( startRule
  ( module
    ( moduleBody
      ( moduleBodyElement
    ( functionStmt
      ( visibility
        ( PUBLIC i=0 txt=Public tt=129 DEFAULT_TOKEN_CHANNEL
      ) ) 
      ( WS i=6 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
      ) 
      ( FUNCTION i=7 txt=Function tt=70 DEFAULT_TOKEN_CHANNEL
      ) 
      ( WS i=15 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
      ) 
      ( ambiguousIdentifier
        ( IDENTIFIER i=16 txt=VarLongDim tt=218 DEFAULT_TOKEN_CHANNEL
      ) ) 
      ( argList
        ( LPAREN i=26 txt=( tt=194 DEFAULT_TOKEN_CHANNEL
        ) 
        ( RPAREN i=27 txt=) tt=205 DEFAULT_TOKEN_CHANNEL
      ) ) 
      ( WS i=28 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
      ) 
      ( asTypeClause
        ( AS i=29 txt=As tt=8 DEFAULT_TOKEN_CHANNEL
        ) 
        ( WS i=31 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( type_
          ( baseType
        ( LONG i=32 txt=Long tt=85 DEFAULT_TOKEN_CHANNEL
      ) ) ) ) 
      ( NEWLINE i=36 txt=\r\n tt=220 DEFAULT_TOKEN_CHANNEL
      ) 
      ( block
        ( blockStmt
          ( variableStmt
        ( DIM i=38 txt=Dim tt=40 DEFAULT_TOKEN_CHANNEL
        ) 
        ( WS i=41 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( variableListStmt
          ( variableSubStmt
            ( ambiguousIdentifier
              ( IDENTIFIER i=42 txt=number tt=218 DEFAULT_TOKEN_CHANNEL
            ) ) 
            ( WS i=48 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
            ) 
            ( asTypeClause
              ( AS i=49 txt=As tt=8 DEFAULT_TOKEN_CHANNEL
              ) 
              ( WS i=51 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
              ) 
              ( type_
            ( baseType
              ( LONG i=52 txt=Long tt=85 DEFAULT_TOKEN_CHANNEL
        ) ) ) ) ) ) ) ) 
        ( NEWLINE i=56 txt=\r\n tt=220 DEFAULT_TOKEN_CHANNEL
        ) 
        ( blockStmt
          ( letStmt
        ( implicitCallStmt_InStmt
          ( iCS_S_VariableOrProcedureCall
            ( ambiguousIdentifier
              ( IDENTIFIER i=58 txt=number tt=218 DEFAULT_TOKEN_CHANNEL
        ) ) ) ) 
        ( WS i=64 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( EQ i=65 txt== tt=187 DEFAULT_TOKEN_CHANNEL
        ) 
        ( WS i=66 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( valueStmt
          ( literal
            ( INTEGERLITERAL i=67 txt=1234 tt=212 DEFAULT_TOKEN_CHANNEL
        ) ) ) ) ) 
        ( NEWLINE i=71 txt=\r\n tt=220 DEFAULT_TOKEN_CHANNEL
        ) 
        ( blockStmt
          ( letStmt
        ( implicitCallStmt_InStmt
          ( iCS_S_VariableOrProcedureCall
            ( ambiguousIdentifier
              ( IDENTIFIER i=73 txt=VarLongDim tt=218 DEFAULT_TOKEN_CHANNEL
        ) ) ) ) 
        ( WS i=83 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( EQ i=84 txt== tt=187 DEFAULT_TOKEN_CHANNEL
        ) 
        ( WS i=85 txt=  tt=222 DEFAULT_TOKEN_CHANNEL
        ) 
        ( valueStmt
          ( implicitCallStmt_InStmt
            ( iCS_S_VariableOrProcedureCall
              ( ambiguousIdentifier
            ( IDENTIFIER i=86 txt=number tt=218 DEFAULT_TOKEN_CHANNEL
      ) ) ) ) ) ) ) ) 
      ( NEWLINE i=92 txt=\r\n tt=220 DEFAULT_TOKEN_CHANNEL
      ) 
      ( END_FUNCTION i=94 txt=End Function tt=47 DEFAULT_TOKEN_CHANNEL
    ) ) ) ) 
    ( NEWLINE i=106 txt=\r\n tt=220 DEFAULT_TOKEN_CHANNEL
  ) ) 
  ( EOF i=108 txt= tt=-1 DEFAULT_TOKEN_CHANNEL
) ) 

Code:

public class Foo : VisualBasic6ParserBaseListener
{
    public override void EnterBlock([NotNull] VisualBasic6Parser.BlockContext context)
    {
    System.Console.WriteLine("EnterBlock");
    }
    public override void ExitBlock([NotNull] VisualBasic6Parser.BlockContext context)
    {
    System.Console.WriteLine("ExitBlock");
    }
    public override void EnterVariableStmt([NotNull] VisualBasic6Parser.VariableStmtContext context)
    {
    System.Console.WriteLine("EnterVariableStmt");
    }
    public override void ExitVariableStmt([NotNull] VisualBasic6Parser.VariableStmtContext context)
    {
    System.Console.WriteLine("ExitVariableStmt");
    }
    public override void ExitEveryRule([NotNull] ParserRuleContext context)
    {
    System.Console.WriteLine("ExitEveryRule");
    }
}

Output:

ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
EnterBlock
EnterVariableStmt
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitVariableStmt
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitBlock
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
ExitEveryRule
lsalamon commented 3 years ago

I did a poc to reproduce the case in csharp. Project link: https://tmpfiles.org/97428/vb6parsetest.zip

                                                        ExitEveryRule*(4)VarLongDim
                                                    ExitBlockStmt*(4)VarLongDim
                                                ExitEveryRule*(4)VarLongDim
                                            ExitBlock*(2)Dim
                                        ExitEveryRule*(2)Dim
                                        VisitTerminal*--->End Function
                                    ExitFunctionStmt*(1)Public
                                ExitEveryRule*(1)Public
                            ExitModuleBodyElement*(1)Public
                        ExitEveryRule*(1)Public
                    ExitModuleBody*(1)Public
                ExitEveryRule*(1)Public
            ExitModule*(1)Public
        ExitEveryRule*(1)Public
        VisitTerminal*---><EOF>
    ExitStartRule*(1)Public
ExitEveryRule*(1)Public
kaby76 commented 3 years ago

I am using the "Official Antlr4" tool and runtime (source runtime tool). The code you are using is Harwell's fork (source tool runtime). Note, Harwell's code hasn't been updated in several years, and when forked from https://github.com/antlr/antlr4 was already a year or two behind "Official Antlr4". The current version for official Antlr4 is 4.9.2.

But, the official version of Antlr4 and Harwell's version do not generate the same set of listener methods! For Official Antlr4, the tool does not generate methods for "labeled alts": EnterModuleOption, ExitModuleOption, EnterIfThenElseStmt, ExitIfThenElseStmt, EnterSC_Cond, ExitSC_Cond, EnterSC_CondExpr, ExitSC_CondExpr, EnterValueStmt, ExitValueStmt. Labeled alts are those alts in the grammar that are marked with '#' and label. I found this out when I tried to port your code to Official Antlr4, and then got compiler errors indicating I'm trying to override methods that don't exist in the listener base class. Personally, I never use them because they really are not necessary, cloud an otherwise pristine grammar with stuff that is unportable to other parser generators.

I don't know why there is a difference in the set of listeners between Official Antlr4 and Harwell's fork. But, it does seem that there might be some ambiguity in which listener method is called when you have both a rule and its labeled alt in Harwell's code. Perhaps this is why, for Official Antlr4, you don't see the normal rule listeners if the alts of the rule are labeled.

In any case, Official Antlr4 looks fine. Could you please explain what you expect, where the error is, what listener method is called out of order?

lsalamon commented 3 years ago

Thanks for the clarification. I will update the project using the latest version. Thanks.

lsalamon commented 3 years ago

Just for the record. I set up a project using both versions both using the same Antlr4 grammar and both produce the same results. We seem to have a grammar problem when used in the CSharp language. But just ignoring some types of events is enough to solve the case.

kaby76 commented 3 years ago

Goodness. If you are still claiming that the listener methods are performed out of order, then let's try to identify this problem rather than brush it under the rug, and just close off the issue. This isn't how progress is made.

The only really good way to discuss this and to prove that there is a problem is to use DFS numberings (see Wiki).

Attached is a zip file xxx.zip with two C#-target programs, one in Antlr4cs and the other in Official Antlr4, that computes the DFS number of the parse tree node. In the output, the number of the node, e.g., 2 as in the tree node moduleBody 2 is the DFS number. For an Enter, Exit, or VisitTerminal call, the node is identified with the DFS number. This way we can synch up the node in the tree with the actual call. In addition, with DFS numbering in hand, we can insert assertions to check the validity of the tree traversal.

Each program first parses, then computes the DFS numbering, then runs the tree walker. With DFS numbering, we can now write assertions within the Listener that identify out-of-order conditions.

For the input, there were no assertions triggered. The output from the program was identical between the two C# programs.

With the output in hand, I carefully compared each and every Enter, Exit, VisitTerminal method call with each node in the parse tree. It appears exactly as what I would expect. I also compared it with the program you supplied, but have removed. There is no difference, except for the output from VisitTerminal where your code avoids outputing when the leaf node is null or white space.