decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

extract_macros (of VBA_PARSER) doesn't extract macrosheet code anymore #728

Open eyaltemps opened 2 years ago

eyaltemps commented 2 years ago

I'm not sure whether this is a bug or whether I'm missing a new feature or a specific step I should take, so I'll open it as a bug.

Affected tool: olevba

Bug description: oletools 0.56.2 extracts macrosheet macro code by default when calling extract_macros(), but oletools 0.60 does not.

File to reproduce the bug (password: Password1): food1.zip

How To Reproduce the bug: Python:

from oletools.olevba import VBA_Parser

# file_path is the path to the sample (food1.xlsm)
vbaparser = VBA_Parser(file_path)
for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():
    print(vba_code)

CLI: The issue can also be seen with the command: olevba -jc {FILE_PATH}

Expected behavior: The extracted macro code should contain the macrosheet XML, "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n<xm:macrosheet..............", as it does in 0.56.2.

Screenshots: [olevba 0.56 output] [olevba 0.60 output]

How can I make vbaparser.extract_macros() also extract the macrosheet code, as it did in 0.56.2?
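One way to compare the two versions is to check the tuples yielded by extract_macros() for macrosheet XML. This is only a minimal sketch: it assumes the raw XML contains the "<xm:macrosheet" marker quoted above, and find_macrosheet_entries is a hypothetical helper, not part of oletools. It accepts any iterable of 4-tuples, so it can be tried without a sample file.

```python
from typing import Iterable, List, Tuple

# (filename, stream_path, vba_filename, vba_code), as yielded by extract_macros()
MacroEntry = Tuple[str, str, str, str]


def find_macrosheet_entries(macros: Iterable[MacroEntry],
                            marker: str = "<xm:macrosheet") -> List[MacroEntry]:
    """Return the entries whose code contains the macrosheet marker.

    Hypothetical helper: the marker is an assumption taken from the
    expected output quoted in this issue.
    """
    hits = []
    for entry in macros:
        code = entry[3] or ""
        # Be tolerant of bytes, in case a caller passes undecoded content.
        if isinstance(code, bytes):
            code = code.decode("utf-8", errors="replace")
        if marker in code:
            hits.append(entry)
    return hits
```

With a real file this would be called as find_macrosheet_entries(vbaparser.extract_macros()); an empty list would match the behavior described for 0.60.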

eyaltemps commented 2 years ago

Any advice on handling this issue with the latest olevba would be appreciated.

decalage2 commented 2 years ago

Hi @eyaltemps, indeed in 0.60 I made significant changes to how XLM macros are extracted. I integrated XLMMacroDeobfuscator because it gives much better results than plugin_biff (thanks to emulation), and it supports more file formats. I removed the old code that extracted XLM macros from xlsm files because it only returned raw XML. This is why you see no macro in your case. Installing XLMMacroDeobfuscator should fix your issue.

However, by default XLMMacroDeobfuscator is not installed by pip. You can either install it separately (see https://github.com/DissectMalware/XLMMacroDeobfuscator) or simply update oletools with this command:

pip install -U oletools[full]

decalage2 commented 2 years ago

I need to check if olevba could fall back to the old code for XLSM if XLMMacroDeobfuscator is not available.
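A fallback like this needs a way to tell whether XLMMacroDeobfuscator is importable. The following is a minimal sketch of such a check, not olevba's actual logic: xlm_deobfuscator_available and choose_xlm_strategy are hypothetical names, and the two strategy labels are placeholders for whatever code paths would be used.

```python
import importlib.util


def xlm_deobfuscator_available(module_name: str = "XLMMacroDeobfuscator") -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def choose_xlm_strategy(module_name: str = "XLMMacroDeobfuscator") -> str:
    """Pick a (hypothetical) strategy label: deobfuscate if possible, else raw XML."""
    if xlm_deobfuscator_available(module_name):
        return "deobfuscate"
    return "raw_xml"


if __name__ == "__main__":
    print(choose_xlm_strategy())
```

Using find_spec avoids importing the package just to test for its presence.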

eyaltemps commented 2 years ago

Hi @decalage2 ,

Thank you for your response. I tried to extract those macros with the new XLMMacroDeobfuscator, but I couldn't get the expected results. Can you share your advice on that?

I ran the following code:

from XLMMacroDeobfuscator.deobfuscator import process_file
result = process_file(file=r"C:\shared\food1.xlsm",  # raw string: "\f" would otherwise be a form feed
                      noninteractive=True,
                      noindent=True,
                      output_formula_format='[[CELL-ADDR]], [[INT-FORMULA]]',
                      return_deobfuscated=True,
                      timeout=30)
print("result is: ", result)

The printed results in the console are:

File: C:\shared\food1.xlsm

Unencrypted document or unsupported file format
Unencrypted xlsm file

[Loading Cells]
[Starting Deobfuscation]
[END of Deobfuscation]
time elapsed: 0.1169731616973877
result is:  []

I.e., I got an empty result.

The file is attached: food1.zip (password: Password1). The expected behavior for me would be to detect the XLM macro from a Python project, as I could with olevba 0.56.2.

EDIT: Although I ran pip install -U oletools[full] and had XLMMacroDeobfuscator installed, olevba still didn't extract the XLM macro as it did in 0.56.2 (which is why I tried the process_file call shown above). I used the following code:

from oletools.olevba import VBA_Parser

vbaparser = VBA_Parser(file_path)
for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():
    print(vba_code)

Thanks!

decalage2 commented 2 years ago

The issue is actually due to a Unicode error when running XLMMacroDeobfuscator:

xlmdeobfuscator -c food1.xlsm
XLMMacroDeobfuscator: defusedxml is not installed (required to securely parse XLSM files)

XLMMacroDeobfuscator(v0.2.5) - https://github.com/DissectMalware/XLMMacroDeobfuscator

Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\xlmdeobfuscator.exe\__main__.py", line 7, in <module>
  File "C:\Users\xxx\AppData\Roaming\Python\Python39\site-packages\XLMMacroDeobfuscator\deobfuscator.py", line 3125, in main
    defaults = json.load(config_file)
  File "c:\program files\python39\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "c:\program files\python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 14: invalid continuation byte

I will ask @DissectMalware to check this sample.
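The traceback above fails inside json.load(config_file), that is, while a file is being decoded as UTF-8 text for JSON parsing. The following minimal sketch reproduces that class of error with made-up bytes. load_json_config is a hypothetical helper and not part of XLMMacroDeobfuscator, and the sample bytes are illustrative only.

```python
import io
import json


def load_json_config(raw: bytes):
    """Parse bytes as JSON through a UTF-8 text stream.

    Hypothetical helper that mimics json.load() on a file opened in
    text mode with UTF-8 encoding.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
    return json.load(stream)


if __name__ == "__main__":
    # Valid UTF-8 JSON parses normally.
    print(load_json_config(b'{"timeout": 30}'))

    # A 0xf3 lead byte followed by a non-continuation byte is not valid UTF-8,
    # so decoding fails before any JSON parsing takes place.
    try:
        load_json_config(b'{"timeout": 30\xf3A}')
    except UnicodeDecodeError as exc:
        print(type(exc).__name__, exc.reason)
```

Any binary content read this way, rather than JSON text, would be expected to fail in a similar manner.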

eyaltemps commented 2 years ago

@decalage2 Thank you. Regardless of the deobfuscation, is there any way to output the raw XML as in 0.56? If not with olevba, would you advise any other tool?

decalage2 commented 2 years ago

I plan to reintegrate the old code that was extracting the raw xml as fallback, but I will need some time to do it. In the meantime, you can still use olevba 0.56 if it works better for you.
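Until then, the raw XML can also be read directly with the standard library, since an .xlsm file is a ZIP archive. This is a minimal sketch and not the old olevba code: read_macrosheet_xml is a hypothetical helper, and the "xl/macrosheets/" prefix is an assumption about where the macrosheet parts are stored in the archive.

```python
import zipfile
from typing import Dict


def read_macrosheet_xml(xlsm_path, prefix: str = "xl/macrosheets/") -> Dict[str, str]:
    """Return {archive member name: XML text} for macrosheet parts.

    Hypothetical helper; the prefix is an assumed location and can be
    changed if a file stores its macrosheets elsewhere. xlsm_path may be
    a path or a binary file-like object.
    """
    sheets = {}
    with zipfile.ZipFile(xlsm_path) as archive:
        for name in archive.namelist():
            # Skip relationship files such as _rels/sheet1.xml.rels.
            if name.startswith(prefix) and name.endswith(".xml"):
                sheets[name] = archive.read(name).decode("utf-8", errors="replace")
    return sheets
```

Calling read_macrosheet_xml("food1.xlsm") and printing the values would show the raw XML without any deobfuscation.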

DissectMalware commented 2 years ago

The macrosheet seems to have only two formulas:

[Two screenshots of the macrosheet formulas]