DissectMalware / XLMMacroDeobfuscator

Extract and Deobfuscate XLM macros (a.k.a Excel 4.0 Macros)
Apache License 2.0

Error: Unexpected token('CMPOP', ='=) #80

Open coaleiii opened 3 years ago

coaleiii commented 3 years ago

Sample: https://app.any.run/tasks/03f85d8e-c349-48bc-b367-b7e6ab6b1f94/#

Error message: Error [deobfuscator.py:2433 parse_tree = self.xlm_parser.parse(formula)]: Unexpected token Token('CMPOP', '=') at line 1, column 221. Expected one of:

Issue: A sample cell is =""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=FORMULA('Doc4'!$AT$3&'Doc4'!$AT$4&'Doc4'!$AT$5&'Doc4'!$AT$6&'Doc4'!$AT$7&'Doc4'!$AT$8,'Doc3'!$AQ$13)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""

Of which, the padding can be identified as =""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452) and =RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=RAND()=SUMPRODUCT(54623,42,452,452,452)=""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""&""

A simple workaround so far is to search for and remove all the padding, after which the script works flawlessly again. Variants of the padding seen so far also include =NOW()=NOW=NOW(), which breaks the script in a similar fashion.
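The search-and-replace workaround could be scripted roughly as follows. This is only a minimal sketch: `PADDING` and `strip_padding` are hypothetical names, not part of XLMMacroDeobfuscator, and the pattern covers only the padding shapes quoted in this issue (runs of `""&""`, `RAND()`, `SUMPRODUCT(...)` with numeric arguments, and `NOW()`). It assumes those statements are never meaningful in the sample, which may not hold for other documents.

```python
import re

# Hypothetical pattern: one padding statement, starting at its leading "=".
# The lookahead requires the next statement's "=" or the end of the cell,
# so an ordinary comparison such as A1="" inside a call is left alone.
PADDING = re.compile(
    r'=(?:""(?:&"")*|RAND\(\)|SUMPRODUCT\(\d+(?:,\d+)*\)|NOW\(\))(?==|$)'
)


def strip_padding(formula: str) -> str:
    """Remove the padding statements and keep everything else unchanged."""
    return PADDING.sub("", formula)


# Shortened version of the cell quoted above.
cell = (
    '=""&""&""=RAND()=SUMPRODUCT(54623,42,452,452,452)'
    "=FORMULA('Doc4'!$AT$3&'Doc4'!$AT$4,'Doc3'!$AQ$13)"
    '=RAND()=SUMPRODUCT(54623,42,452,452,452)=""&""'
)
cleaned = strip_padding(cell)
```

For the shortened cell, only the `=FORMULA(...)` statement should remain in `cleaned`. A cell that consists of nothing but padding would come back as an empty string, so the caller would need to skip such cells.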

DissectMalware commented 3 years ago

I investigated the sample.

The problem is that the XLM grammar can parse only one statement at a time.

In this sample, some of the cells contain more than one statement (like the one you mentioned). The grammar fails to recognize that.

Example (3 statements in a cell):

=FORMULA(10,B2)=FORMULA(11,B3)=FORMULA(12, B4)
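To illustrate what "more than one statement in a cell" means, here is a rough pre-processing sketch that splits such a cell before parsing. It is not the grammar fix discussed in this thread; `split_statements` is a hypothetical helper that simply cuts at every `=` found outside parentheses and quotes. Note the built-in ambiguity: `=` is also the comparison operator, so a legitimate formula like `=A1=B1` would be split incorrectly.

```python
def split_statements(cell: str) -> list[str]:
    """Split a cell at each "=" that is outside parentheses and quotes."""
    statements = []
    current = []
    depth = 0      # current parenthesis nesting level
    quote = None   # the quote character we are inside of, if any
    for ch in cell:
        if quote:
            # Inside a string or a quoted sheet name: copy verbatim.
            current.append(ch)
            if ch == quote:
                quote = None
            continue
        if ch in ('"', "'"):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "=" and depth == 0 and current:
            # A top-level "=" after some text starts a new statement.
            statements.append("".join(current))
            current = []
        current.append(ch)
    if current:
        statements.append("".join(current))
    return statements


parts = split_statements("=FORMULA(10,B2)=FORMULA(11,B3)=FORMULA(12, B4)")
```

Each element of `parts` could then be passed to the single-statement parser on its own, at the cost of mis-splitting any cell that uses `=` as a top-level comparison.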

DissectMalware commented 3 years ago

I fixed the grammar to handle multiple formulas in a cell, which resolves the problem.

image

However, I still need to think about how to present the changes to users properly.

Currently, users see

image

because the FORMULA in AZ112 sets another cell, which is read when Doc1BD97 is invoked.

DissectMalware commented 3 years ago

The issue is fixed, but the fix still needs more testing before it is merged into the master branch.

image

piffey commented 3 years ago

Just wanted to chime in and say that your branch with the fix has been working wonderfully for me on documents with this issue that I've been discovering lately. I haven't encountered a problem once. Thanks for the work. I was trying to fix it myself when I stumbled across the bug. Appreciate it.

DissectMalware commented 3 years ago


Be advised that this fix breaks several things, because the grammar parser is not suited to the changes I made to the grammar. I need to change the grammar parser, but that will slow down the process significantly... I'm still investigating...