Xenios91 opened this issue 9 months ago
Can you provide examples? Is it possible they do not parse correctly because they are, in fact, not proper syntax?
The MS-VBAL specification states that the module body must occur after the module header.
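Here is a minimal sketch of the ordering MS-VBAL expects for a standard (procedural) module as it appears in an exported .bas file. The header's `Attribute` line comes first, then the module body. The module name `Module1` and the `Counter`/`Increment` members are hypothetical, chosen only for illustration.

```vba
Attribute VB_Name = "Module1"
Option Explicit
' Everything from Option Explicit onward is the module body;
' only the Attribute line above belongs to the header.
Public Counter As Long

Public Sub Increment()
    Counter = Counter + 1
End Sub
```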
I agree they are improper; however, I have come across documents in the wild that do this. I wonder whether they still run, and whether they end up like this because of document-generation libraries. Either way, it's weird.
Let me run some of these through a sandbox again to confirm they are indeed running properly.
Excuse my ignorance, since I am new to lexer/parser grammars, but why would you want improper documents to parse correctly? I thought the “proper” way was to write error handlers that intercept the parsing errors and correct them to produce valid files. If Apache POI is creating invalid files, isn’t that a problem that should be fixed on their end?
I guess it depends on what you want. From my perspective, I'm interested in using this tool to parse malware, so a document that is improper but still runs is still of interest to me.
If this is a democracy, my vote would be for the general-use grammar here to reflect the official standard as closely as possible, with enough extras to make it easy to use, but I don’t know what the typical philosophy is.
The existing ANTLR grammar is missing many of the keywords that appear in the module header. As a result, VBA files with variables named things like VB_Base, in positions where they shouldn’t be, may parse with ANTLR even though they shouldn’t according to MS-VBAL.
However, that document could have bugs. In fact, I found that it incorrectly defines the line-continuation lexical string. I notified Microsoft, and they said it should be fixed in the next publication.
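For readers unfamiliar with the construct, a VBA line continuation is a space followed by an underscore at the end of a physical line. It joins that line with the next one into a single logical line. The thread does not say exactly what the specification got wrong, so this sketch only illustrates ordinary usage. The procedure name `ShowGreeting` is hypothetical.

```vba
Public Sub ShowGreeting()
    ' The " _" at the end of the next line continues the statement
    ' onto the following physical line.
    MsgBox "Hello, " & _
           "world"
End Sub
```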
Thought you’d be interested… Since VBA is now at version 7.1 and this grammar is supposedly for 6.0, I’ve started rewriting the grammar from scratch, based on the published Microsoft specification. My plan is to add many more tests to get the coverage as high as possible.
https://github.com/Beakerboy/grammars-v4/tree/patch-7/vba
Feel free to help out or provide feedback.
Also, in testing my new grammar, I’ve realized that because VBA has a full pre-compiler (conditional compilation) feature, it’s entirely possible to have valid VBA files that cannot be parsed by a single ANTLR4 grammar:
```vba
#If Win16 Then
Public function foo()
#Else
Public Function foo()
#Endif
End Function
```
This is valid, but without pre-compiling first, any reasonable grammar will fail on it: the parser sees two Function headers and only one End Function.
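To show what a pre-compiler would hand to the parser, assume `Win16` evaluates to False, as it does in 32- and 64-bit Office hosts. In that case only the `#Else` branch of the snippet above survives, and the parser sees a single, well-formed function. A pre-compiler could either delete the directive lines or blank them out to keep line numbers aligned with the original file. That is an implementation choice, and this sketch simply drops them.

```vba
' Result after pre-compiling with Win16 = False
Public Function foo()
End Function
```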
Do you have a precompiler coded up? I created a VBA precompiler grammar that could probably be leveraged, with some visitors, to build one: https://github.com/Beakerboy/grammars-v4/blob/coverage/vba/vba_cc/vba_cc.g4
My precompiler is pretty much done: https://github.com/Beakerboy/VBA-Precompiler/tree/dev
It’s possible that @Xenios91 is referring to the VB_Description / VB_VarDescription attributes: https://vbaplanet.com/attributes.php
I have submitted a request to Microsoft to clarify this in the specification document.
When a document has attributes mixed with variable declarations, it fails to parse, because the grammar expects all attribute declarations to come before any module-level declarations. This seems to be fairly common in documents I've scraped off the web, and it results in parsing errors.
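Below is a hypothetical module showing the kind of mixing described here. It assumes the attribute in question is `VB_VarDescription`, as suggested above, and that this attribute is written directly after the variable it describes. That places an `Attribute` line in the module body, after a declaration, rather than in the header. The names and description text are made up for illustration.

```vba
Attribute VB_Name = "Module1"
Option Explicit

Public Counter As Long
Attribute Counter.VB_VarDescription = "Number of times Increment has run"

Public Sub Increment()
    Counter = Counter + 1
End Sub
```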