DissectMalware / XLMMacroDeobfuscator

Extract and Deobfuscate XLM macros (a.k.a Excel 4.0 Macros)
Apache License 2.0

<> Token Not Parsed #8

Status: Closed (closed by michaelweber 4 years ago)

michaelweber commented 4 years ago

Example crash file: not-equals-parser-bug.xls.zip

When a document uses a macro containing the not-equals token (<>), the parser fails to tokenize it and the tool crashes. For example, in a sheet that eventually generates the expression:

=WHILE(ACTIVE.CELL()<>"END")

The following crash will occur:

Traceback (most recent call last):
  File "path\to\python\python37\lib\site-packages\lark\lexer.py", line 376, in lex
    for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
  File "path\to\python\python37\lib\site-packages\lark\lexer.py", line 182, in lex
    raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
lark.exceptions.UnexpectedCharacters: No terminal defined for '>' at line 1 col 22

=WHILE(ACTIVE.CELL()<>"END")
                     ^

Expecting: {'NAME', '__ANON_0', 'L_PRA', 'BOOLEAN', 'STRING', 'NUMBER'}

Previous tokens: Token(CMPOP, '<')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "path\to\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "path\to\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "path\to\Python\Python37\Scripts\xlmdeobfuscator.exe\__main__.py", line 9, in <module>
  File "path\to\python\python37\lib\site-packages\XLMMacroDeobfuscator\deobfuscator.py", line 552, in main
    for step in interpreter.deobfuscate_macro(not args[0].noninteractive):
  File "path\to\python\python37\lib\site-packages\XLMMacroDeobfuscator\deobfuscator.py", line 425, in deobfuscate_macro
    parse_tree = self.xlm_parser.parse(current_cell.formula)
  File "path\to\python\python37\lib\site-packages\lark\lark.py", line 333, in parse
    return self.parser.parse(text, start=start)
  File "path\to\python\python37\lib\site-packages\lark\parser_frontends.py", line 125, in parse
    return self._parse(token_stream, start, set_parser_state)
  File "path\to\python\python37\lib\site-packages\lark\parser_frontends.py", line 54, in _parse
    return self.parser.parse(input, start, *args)
  File "path\to\python\python37\lib\site-packages\lark\parsers\lalr_parser.py", line 35, in parse
    return self.parser.parse(*args)
  File "path\to\python\python37\lib\site-packages\lark\parsers\lalr_parser.py", line 83, in parse
    for token in stream:
  File "path\to\python\python37\lib\site-packages\lark\lexer.py", line 391, in lex
    raise UnexpectedToken(t, e.allowed, state=e.state)
lark.exceptions.UnexpectedToken: Unexpected token Token(CMPOP, '>') at line 1, column 22.
Expected one of:
        * NAME
        * __ANON_0
        * L_PRA
        * BOOLEAN
        * STRING
        * NUMBER
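
The traceback above suggests that the lexer emitted `<` as a CMPOP token on its own and then had no way to continue with `>`. The sketch below reproduces that behaviour with a small regex tokenizer from the Python standard library. It is an illustration only: the token patterns and the `tokenize` helper are hypothetical and are not the project's lark grammar or its actual fix.

```python
import re

# Hypothetical operator sets, not the project's actual grammar.
# Without "<>" the lexer can only split it into "<" followed by ">".
CMPOP_WITHOUT_NE = r"<=|>=|<|>|="
# Listing "<>" before "<" lets the regex alternation try the longer operator first.
CMPOP_WITH_NE = r"<>|<=|>=|<|>|="


def tokenize(expression, cmpop_pattern):
    """Split an expression into (kind, text) pairs with a simple regex lexer."""
    token_re = re.compile(
        r'(?P<STRING>"[^"]*")'
        r"|(?P<NAME>[A-Za-z_][A-Za-z0-9_.]*)"
        r"|(?P<CMPOP>" + cmpop_pattern + r")"
        r"|(?P<PUNCT>[()])"
    )
    tokens = []
    pos = 0
    while pos < len(expression):
        match = token_re.match(expression, pos)
        if match is None:
            raise ValueError(f"No token matches at position {pos}: {expression[pos]!r}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The expression from the report, without the leading "=" of the formula.
expr = 'WHILE(ACTIVE.CELL()<>"END")'

split_tokens = tokenize(expr, CMPOP_WITHOUT_NE)   # "<" and ">" come out separately
joined_tokens = tokenize(expr, CMPOP_WITH_NE)     # "<>" comes out as one CMPOP token
```

With the first pattern, a parser that expects an operand after a comparison operator receives a second operator instead, which matches the "Unexpected token Token(CMPOP, '>')" message in the report.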

DissectMalware commented 4 years ago

The issue is fixed. Please check the latest commit. By the way, the provided xls file doesn't seem to contain <>. I tested the fix in the interactive shell.

michaelweber commented 4 years ago

The <> only appears after the sheet goes through a few stages: it is generated by the first round of FORMULA expressions. I should have submitted a simplified sheet containing only the <> expression. My mistake!