Status: Open. ckrueger1979 opened this issue 11 months ago.
Could you clarify the following things:
I would expect that the ANTLR parser shouldn't output illegal VBA code
> I would expect that the ANTLR parser shouldn't output illegal VBA code
Please be precise. Antlr does not "output illegal VBA code." The job of Antlr is to parse input (valid or not), output error messages, and return a parse tree.
The place to add a line-length check would be an override of the Emit() method in the lexer's base class. The override could check the start and stop indices of the token, call Lexer.Emit(), and report the error. We already do something like this in other grammars, e.g., the Lua grammar. It's an easy fix. However, the change means the grammar must be split, and target-specific code must be added for each target.
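A minimal sketch of the condition such a check would enforce, written in Python as a standalone pass over the source text so that it runs without the ANTLR runtime. It is not the Emit() override itself, and it assumes the 1023-character limit applies to each physical line (so a line continuation does not let a long logical line escape the limit). The function name `find_long_lines` is hypothetical.

```python
MAX_LINE_LENGTH = 1023  # limit from Microsoft's "Line too long" VBA reference


def find_long_lines(source: str, max_len: int = MAX_LINE_LENGTH) -> list[tuple[int, int]]:
    """Return (line_number, length) for every physical line longer than max_len.

    Hypothetical helper, not part of the VBA grammar or the ANTLR runtime.
    Line numbers are 1-based. str.splitlines() drops line terminators such
    as CRLF, LF, or CR, so only the characters on each line are counted.
    Lines joined by a continuation (" _") are still measured separately.
    """
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_len
    ]
```

For example, a source whose second line is a comment of 1024 characters yields `[(2, 1024)]`, while the three-line `Public Sub Module()` example in this thread yields an empty list. Inside a lexer, the same comparison would instead be made per token, using the token's position as described above, before calling the base Emit().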
My compiler construction lecture was 20 years ago, sorry if I mixed something up.
I thought the parser should be able to parse and emit only valid language and otherwise create an error.
PS: What do you mean by target-specific code? Specific to VBA?
> I thought the parser should be able to parse and emit only valid language and otherwise create an error.
Parsers do not emit code! A parser is a function with the signature boolean parse(string input): it takes a string and outputs true if the string is valid in the language described by the grammar.
So,
parse("Public Sub Module()
Dim sd As Boolean
End Sub")
returns true. It does not output VBA code.
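To make the boolean parse(string input) idea concrete, here is a toy recognizer in Python for a deliberately tiny, hypothetical language: one `Public Sub Name()` header, any number of `Dim name As Boolean` lines, and a closing `End Sub`. It is not the ANTLR VBA grammar and is far from real VBA; it only shows that a recognizer answers yes or no and produces no code.

```python
import re

# Hypothetical toy grammar, much smaller than VBA:
#   module := "Public Sub" NAME "()" { "Dim" NAME "As Boolean" } "End Sub"
# with each element on its own line.
_HEADER = re.compile(r"Public Sub [A-Za-z_]\w*\(\)")
_DIM = re.compile(r"Dim [A-Za-z_]\w* As Boolean")


def parse(source: str) -> bool:
    """Return True if source is in the toy language, else False. Emits nothing."""
    lines = [line.strip() for line in source.splitlines()]
    if len(lines) < 2:
        return False  # need at least a header and an "End Sub"
    header, *body, footer = lines
    return (
        _HEADER.fullmatch(header) is not None
        and all(_DIM.fullmatch(line) for line in body)
        and footer == "End Sub"
    )
```

With this sketch, `parse("Public Sub Module()\nDim sd As Boolean\nEnd Sub")` is true, and dropping `Boolean` from the `Dim` line makes it false.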
> What do you mean by target-specific code? Specific to VBA?
Antlr generates a parser for the VBA grammar in a programming language that you compile and link into a program. The current targets are CSharp (C#), Cpp (C++), Dart2 (Dart), Go, Java, JavaScript, PHP, Python3, and TypeScript. If you don't tell the parser generator what target you want, it will output a parser in Java.
The generated parser code can reference other code that you write to support the parser. That support code has to be in the target programming language: if you generate the parser for C#, you have to write the support code in C#. This is important because you cannot use grammars that require support code in the Antlr IntelliJ plugin or at lab.antlr.org.
Thanks for the detailed explanation!
The parser for VBA will accept code with lines that are too long, correct? It returns true even if a line is longer than 1023 characters.
> The parser for VBA will accept code with lines that are too long, correct? It returns true even if a line is longer than 1023 characters.
Yes, you are right. The parser for the VBA grammar accepts lines longer than 1023 characters. I'll write a fix today or tomorrow.
Hi,
I think this grammar https://github.com/antlr/grammars-v4/blob/master/vba/vba.g4 has a problem with long lines.
This obfuscator https://github.com/oriolOrnaque/VBAObfuscator/ creates lines that are too long.
The maximum length of a line is 1023 characters: https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/line-too-long
I didn't find any reference to the line length in the grammar (see LINE_CONTINUATION and UNDERSCORE).
Greetings, Carsten