antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

VBA incorrectly parses attributes and variables #3935

Open Xenios91 opened 7 months ago

Xenios91 commented 7 months ago

When a document mixes attributes with variable declarations, the grammar fails to parse it, because it expects all attribute declarations to come before any module-level declarations. This is not uncommon in documents I've scraped off the web, and it results in parsing errors.

Beakerboy commented 7 months ago

Can you provide examples? Is it possible they do not parse correctly because they are, in fact, not proper syntax?

https://learn.microsoft.com/en-us/openspecs/microsoft_general_purpose_programming_languages/ms-vbal/d5418146-0bd2-45eb-9c7a-fd9502722c74

The MS-VBAL specification states that the module body must occur after the module header.

Xenios91 commented 7 months ago

I agree they are improper; however, I have come across documents in the wild that do this. I wonder whether they still run, and whether they look like this because of document-generation libraries. Either way, it's weird.

Xenios91 commented 7 months ago

Let me run some of these through a sandbox again to confirm they are indeed running properly.

Beakerboy commented 7 months ago

Excuse my ignorance, I am new to grammar lexer/parser stuff, but why would you want improper documents to parse correctly? I thought the “proper” way was to create error handlers that intercept the parsing errors and correct them to produce valid files. If Apache POI is creating invalid files, isn’t that a problem that should be fixed on their end?

Xenios91 commented 7 months ago

I guess it depends on what you want. From my perspective, I'm interested in using this tool to parse malware, so when a document is improper but still runs, it still interests me.

Beakerboy commented 7 months ago

If this is a democracy, my vote would be for the general-use grammar here to be the best reflection of any official standard, with enough extras to make it easy to use, but I don’t know what the typical philosophy is.

Beakerboy commented 7 months ago

The existing ANTLR grammar is missing many of the keywords that appear in the header. It’s possible that VBA files that use names like VB_Base for variables, in positions where they shouldn’t appear, will parse with ANTLR when, according to MS-VBAL, they shouldn’t.

However, that document could have bugs. In fact, I found the document incorrectly defines the line-continuation lexical string. I notified Microsoft and they said it should be fixed in the next publication.

Beakerboy commented 7 months ago

Thought you’d be interested…since VBA is now at version 7.1, and this grammar is supposedly 6.0, I’ve started from scratch rewriting the grammar from the published Microsoft Spec. My plan is to add a lot more tests to get the coverage up as high as possible.

https://github.com/Beakerboy/grammars-v4/tree/patch-7/vba

Feel free to help out or provide feedback.

Beakerboy commented 7 months ago

Also, in testing my new grammar, I’ve realized that, since VBA has a whole precompiler feature, it’s entirely possible to have valid VBA files that cannot be parsed by a single ANTLR4 grammar:

#If Win16 Then
    Public function foo()
#Else
   Public Function foo()
#Endif
   End Function

This is valid, but without precompiling, any reasonable grammar will fail, because it sees two Function headers for a single End Function.

Do you have a precompiler coded up? I created a VBA precompiler grammar that could probably be leveraged with some visitors to make one. https://github.com/Beakerboy/grammars-v4/blob/coverage/vba/vba_cc/vba_cc.g4
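
To make the idea concrete, here is a minimal Python sketch of such a pre-pass. It is not the code from the linked grammar or precompiler; the function names resolve_conditionals and _evaluate and the lowercase constants dictionary are hypothetical, and conditions are assumed to be a bare constant name, optionally preceded by Not (real VBA allows full expressions). It only shows how dropping inactive branches leaves text that a single grammar can handle.

```python
import re

# Matches a conditional-compilation directive at the start of a line,
# e.g. "#If X Then", "#ElseIf Y Then", "#Else", "#End If" or "#Endif".
_DIRECTIVE = re.compile(r"^\s*#\s*(elseif|if|else|end\s*if)\b(.*)$", re.IGNORECASE)


def _evaluate(text, constants):
    """Evaluate a simplified condition: NAME or Not NAME (assumption)."""
    # Drop the trailing "Then" and anything after it.
    expr = re.sub(r"\bthen\b.*$", "", text, flags=re.IGNORECASE).strip()
    negate = expr.lower().startswith("not ")
    if negate:
        expr = expr[4:].strip()
    # Constant names are looked up in lowercase; unknown names count as False.
    value = bool(constants.get(expr.lower(), False))
    return not value if negate else value


def resolve_conditionals(source, constants):
    """Return source with only the lines from active branches kept.

    Directive lines themselves are dropped. Malformed input (for example
    an #Else without a matching #If) is not handled in this sketch.
    """
    out = []
    # Each frame is [parent_active, some_branch_taken, currently_active].
    stack = []
    for line in source.splitlines():
        match = _DIRECTIVE.match(line)
        if match is None:
            if all(frame[2] for frame in stack):
                out.append(line)
            continue
        keyword = re.sub(r"\s+", "", match.group(1)).lower()
        rest = match.group(2)
        if keyword == "if":
            parent = all(frame[2] for frame in stack)
            active = parent and _evaluate(rest, constants)
            stack.append([parent, active, active])
        elif keyword == "elseif":
            frame = stack[-1]
            active = frame[0] and not frame[1] and _evaluate(rest, constants)
            frame[1] = frame[1] or active
            frame[2] = active
        elif keyword == "else":
            frame = stack[-1]
            frame[2] = frame[0] and not frame[1]
            frame[1] = True
        else:  # "endif"
            stack.pop()
    return "\n".join(out)
```

Calling resolve_conditionals(text, {"win16": False}) on the example above would keep the #Else branch and the End Function line, so the parser only ever sees one Function header.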

Beakerboy commented 6 months ago

My precompiler is pretty much done: https://github.com/Beakerboy/VBA-Precompiler/tree/dev

Beakerboy commented 6 months ago

It’s possible that @Xenios91 is referring to VB_[Var]Description: https://vbaplanet.com/attributes.php

I have submitted a request to Microsoft to clarify this in the specification document.
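
One way to check which case a given sample falls into is to list the Attribute lines that appear after the first declaration and see whether they target a member or the module. The Python sketch below is only an illustration under stated assumptions: find_late_attributes is a hypothetical helper, the input is assumed to be module text that starts at its Attribute lines (no VERSION/BEGIN...END block in front), and the "member" versus "module" split is based purely on whether the attribute name has a dotted prefix.

```python
import re

# "Attribute VB_Name = ..." has no target (module-level), while
# "Attribute counter.VB_VarDescription = ..." names a member before the dot.
_ATTRIBUTE = re.compile(r"^\s*Attribute\s+(?:(\w+)\.)?(VB_\w+)\s*=", re.IGNORECASE)


def find_late_attributes(source):
    """List Attribute lines found after the first non-attribute, non-blank line.

    Returns (line_number, kind, attribute_name) tuples, where kind is
    "member" for dotted attributes and "module" otherwise.
    """
    late = []
    body_started = False
    for number, line in enumerate(source.splitlines(), start=1):
        match = _ATTRIBUTE.match(line)
        if match is None:
            # Any non-blank, non-attribute line marks the start of the body.
            if line.strip():
                body_started = True
            continue
        if body_started:
            kind = "member" if match.group(1) else "module"
            late.append((number, kind, match.group(2)))
    return late
```

If every late attribute in a failing sample comes back as "member", the file is probably using per-variable attributes like the ones on the linked page rather than misplaced module-header attributes.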