DissectMalware / XLMMacroDeobfuscator

Extract and Deobfuscate XLM macros (a.k.a Excel 4.0 Macros)
Apache License 2.0

Unexpected Token Error #107

Closed baderj closed 2 years ago

baderj commented 2 years ago

The following sample (available on MalwareBazaar) raises an "Unexpected token" error in v0.2.5:

MD5 1083ce36f5b5ef2c95c05ab24090b22f
SHA1 cb439efcacd86109150cc00a55af770ab5e16446
SHA256 55b4ae264bbd69339965edee9c86a6fb869f3b6f6f693b33f41820f00f30ecd1
Size 238K

This could be related to #101, but since that particular issue was fixed in v0.2.4, I'm opening a separate issue here.

xlmdeobfuscator  --file 1083ce36f5b5ef2c95c05ab24090b22f.bin
XLMMacroDeobfuscator: pywin32 is not installed (only is required if you want to use MS Excel)

          _        _______
|\     /|( \      (       )
( \   / )| (      | () () |
 \ (_) / | |      | || || |
  ) _ (  | |      | |(_)| |
 / ( ) \ | |      | |   | |
( /   \ )| (____/\| )   ( |
|/     \|(_______/|/     \|
   ______   _______  _______  ______   _______           _______  _______  _______ _________ _______  _______
  (  __  \ (  ____ \(  ___  )(  ___ \ (  ____ \|\     /|(  ____ \(  ____ \(  ___  )\__   __/(  ___  )(  ____ )
  | (  \  )| (    \/| (   ) || (   ) )| (    \/| )   ( || (    \/| (    \/| (   ) |   ) (   | (   ) || (    )|
  | |   ) || (__    | |   | || (__/ / | (__    | |   | || (_____ | |      | (___) |   | |   | |   | || (____)|
  | |   | ||  __)   | |   | ||  __ (  |  __)   | |   | |(_____  )| |      |  ___  |   | |   | |   | ||     __)
  | |   ) || (      | |   | || (  \ \ | (      | |   | |      ) || |      | (   ) |   | |   | |   | || (\ (
  | (__/  )| (____/\| (___) || )___) )| )      | (___) |/\____) || (____/\| )   ( |   | |   | (___) || ) \ \__
  (______/ (_______/(_______)|/ \___/ |/       (_______)\_______)(_______/|/     \|   )_(   (_______)|/   \__/

XLMMacroDeobfuscator(v0.2.5) - https://github.com/DissectMalware/XLMMacroDeobfuscator

File: /home/user/samples/1083ce36f5b5ef2c95c05ab24090b22f.bin

Unencrypted document or unsupported file format
Unencrypted xlsb file

[Loading Cells]
auto_open: auto_open->LEKJF!$E$1
[Starting Deobfuscation]
Error [deobfuscator.py:2586 parse_tree = self.xlm_parser.parse(formula)]: Unexpected token Token('__ANON_0', '!E14, Fe2!I4)=FORMULA()=FORMULA(Fe2!E17, Fe1!B4)=FORMULA(Vvfrbuk!L24&Vvfrbuk!L26&Vvfrbuk!L27&Vvfrbuk!L28&Vvfrbuk!L28&Fvfbor!C11&Fe1!B4&Fvfbor!F3&Fe1!B4&Fvfbor!H9&Fe1!B4&Fvfbor!D6&Fe1!B4&Fvfbor!B2&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!F24&Fvfbor!O7, E10)=FORMULA(Vvfrbuk!L24&Vvfrbuk!L26&Vvfrbuk!L27&Vvfrbuk!L28&Vvfrbuk!L28&Fvfbor!I13&Fe1!B4&Fvfbor!K12&Vvfrbuk!M16&Vvfrbuk!Q11&Vvfrbuk!R17&Vvfrbuk!I3&Vvfrbuk!B11&Vvfrbuk!E2&Vvfrbuk!R17&Vvfrbuk!T9&Vvfrbuk!M8&Vvfrbuk!T4&Vvfrbuk!R17&Fvfbor!Q10&Fe2!I4&Vvfrbuk!S2&Fvfbor!J5&Fvfbor!P1&Fvfbor!F20&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!S5&Fvfbor!T14, E12)=FORMULA(Vvfrbuk!L24&Vvfrbuk!G8&Vvfrbuk!F4&Vvfrbuk!G8&Vvfrbuk!O3&Vvfrbuk!L30&Vvfrbuk!F24&Fe1!B4&Fe2!I4&Vvfrbuk!C16&Vvfrbuk!O18&Vvfrbuk!B3&Fe1!B4&Vvfrbuk!Q1&Vvfrbuk!S5&Vvfrbuk!F28&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!S5&Vvfrbuk!L31, E14)=FORMULA(Vvfrbuk!L24&Vvfrbuk!L26&Vvfrbuk!L27&Vvfrbuk!L28&Vvfrbuk!L28&Fvfbor!I13&Fe1!B4&Fvfbor!K12&Vvfrbuk!M16&Vvfrbuk!Q11&Vvfrbuk!R17&Vvfrbuk!I3&Vvfrbuk!B11&Vvfrbuk!E2&Vvfrbuk!R17&Vvfrbuk!T9&Vvfrbuk!M8&Vvfrbuk!T4&Vvfrbuk!R17&Fvfbor!Q10&Fe2!I4&Vvfrbuk!S2&Fvfbor!J5&Fvfbor!P1&Fvfbor!G22&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!E8&Fvfbor!T14, E16)=FORMULA(Vvfrbuk!L24&Vvfrbuk!G8&Vvfrbuk!F4&Vvfrbuk!G8&Vvfrbuk!O3&Vvfrbuk!L30&Vvfrbuk!F24&Fe1!B4&Fe2!I4&Vvfrbuk!C16&Vvfrbuk!O18&Vvfrbuk!B3&Fe1!B4&Vvfrbuk!Q1&Vvfrbuk!S5&Vvfrbuk!F28&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!E8&Vvfrbuk!L31, E18)=FORMULA(Vvfrbuk!L24&Vvfrbuk!L26&Vvfrbuk!L27&Vvfrbuk!L28&Vvfrbuk!L28&Fvfbor!I13&Fe1!B4&Fvfbor!K12&Vvfrbuk!M16&Vvfrbuk!Q11&Vvfrbuk!R17&Vvfrbuk!I3&Vvfrbuk!B11&Vvfrbuk!E2&Vvfrbuk!R17&Vvfrbuk!T9&Vvfrbuk!M8&Vvfrbuk!T4&Vvfrbuk!R17&Fvfbor!Q10&Fe2!I4&Vvfrbuk!S2&Fvfbor!J5&Fvfbor!P1&Fvfbor!H20&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!R16&Fvfbor!T14, 
E20)=FORMULA(Vvfrbuk!L24&Vvfrbuk!G8&Vvfrbuk!F4&Vvfrbuk!G8&Vvfrbuk!O3&Vvfrbuk!L30&Vvfrbuk!F24&Fe1!B4&Fe2!I4&Vvfrbuk!C16&Vvfrbuk!O18&Vvfrbuk!B3&Fe1!B4&Vvfrbuk!Q1&Vvfrbuk!S5&Vvfrbuk!F28&Vvfrbuk!L26&Vvfrbuk!H24&Fvfbor!L4&Vvfrbuk!H26&Fvfbor!R16&Vvfrbuk!L31, E22)=FORMULA(Vvfrbuk!L24&Vvfrbuk!E36&Vvfrbuk!C38&Vvfrbuk!C32&Vvfrbuk!F31&Vvfrbuk!E36&Vvfrbuk!E42&Vvfrbuk!L30&Vvfrbuk!L31, E30)') at line 1, column 23.
Expected one of: 
    * COLON
    * CONCATOP
    * R_PRA
    * MULTIOP
    * ADDITIVEOP
    * CMPOP
    * L_PRA
    * LIST_SEPARATOR
Previous tokens: [Token('__ANON_2', 'Fe1')]

Files:

[END of Deobfuscation]
time elapsed: 0.1354522705078125

baderj commented 2 years ago

I think I figured out the problem. Here is a minimal example that fails:

=FORMULA(Fe1!E14, Fe2!I4)

Parsing this fails with:

=FORMULA(Fe1!E14, Fe2!I4)
            ^
Expected one of: 
        * COLON
        * L_PRA
        * R_PRA
        * LIST_SEPARATOR
        * CMPOP
        * ADDITIVEOP
        * CONCATOP
        * MULTIOP

Previous tokens: Token('__ANON_2', 'Fe1')

I suspect this happens because the sheet name in the reference (Fe1) is also a valid cell name. Replacing the sheet names in the formula, e.g., formula = "=FORMULA(Sheet1!E14, Sheet2!I4)", no longer trips up Lark.
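
To illustrate the suspected ambiguity, here is a small sketch that checks which of the sheet names from the log above also look like A1-style cell names. The regular expression is my own approximation (1 to 3 column letters followed by a row number), not the actual Lark grammar used by xlmdeobfuscator, and looks_like_cell is a hypothetical helper name:

```python
import re

# Assumption: an A1-style cell name is 1-3 column letters followed by a
# row number. This only approximates Excel's rules and is NOT taken from
# the XLMMacroDeobfuscator grammar.
CELL_NAME = re.compile(r"^[A-Za-z]{1,3}[0-9]{1,7}$")


def looks_like_cell(name: str) -> bool:
    """Return True if a sheet name could also be read as a cell name."""
    return CELL_NAME.match(name) is not None


if __name__ == "__main__":
    # Sheet names that appear in the failing sample, plus "Sheet1".
    for sheet in ["Fe1", "Fe2", "Fvfbor", "Vvfrbuk", "LEKJF", "Sheet1"]:
        print(f"{sheet}: ambiguous={looks_like_cell(sheet)}")
```

Under this approximation only Fe1 and Fe2 are flagged, which matches the observation that the parser stops right after the Fe1 token while references such as Vvfrbuk!L24 are not the point of failure.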

DissectMalware commented 2 years ago

Fixed by wrapping the sheet name in pyxlsb2 v0.0.9:

https://github.com/DissectMalware/pyxlsb2/commit/0a1ff1be329aa282ecbc347ff44fc6c07351685b

It also needed another fix in xlmdeobfuscator.

Please update pyxlsb2 from PyPI and xlmdeobfuscator from the repo.
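
For readers who want a picture of what wrapping a sheet name can look like, here is a rough sketch. It is not the code from the linked commit. It assumes that wrapping means Excel-style single quotes, and that only names which look like cell references or contain unusual characters get quoted; quote_sheet_name and make_ref are hypothetical names used for illustration only:

```python
import re

# Assumption: same rough A1-style check as a cell name (1-3 letters + digits).
CELL_LIKE = re.compile(r"^[A-Za-z]{1,3}[0-9]{1,7}$")
# Assumption: names made only of these characters are safe to leave bare.
SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def quote_sheet_name(name: str) -> str:
    """Hypothetical helper: quote a sheet name when it could be misparsed."""
    if CELL_LIKE.match(name) or not SIMPLE_NAME.match(name):
        # Excel convention: wrap in single quotes, double any embedded quote.
        return "'" + name.replace("'", "''") + "'"
    return name


def make_ref(sheet: str, cell: str) -> str:
    """Build a sheet-qualified reference such as 'Fe1'!E14."""
    return f"{quote_sheet_name(sheet)}!{cell}"


if __name__ == "__main__":
    print(make_ref("Fe1", "E14"))      # quoted, because Fe1 looks like a cell
    print(make_ref("Vvfrbuk", "L24"))  # left bare
```

With the sheet name quoted, a reference like 'Fe1'!E14 no longer starts with something that reads as a cell name, which is the ambiguity described earlier in this thread.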
