decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

oleobj: PPT triggers exceptions #784

Open decalage2 opened 2 years ago

decalage2 commented 2 years ago

When running oleobj on a PPT 97-2003 file (e.g. https://www.hybrid-analysis.com/sample/d1bceccf5d2b900a6b601c612346fdb3fa5bb0e2faeefcac3f9c29dc1d74838d/631b2c1d8501f5745e1ca88d), oleobj tries to parse it as an OpenXML file and triggers exceptions:

oleobj 0.60.1.dev5 - http://decalage.info/oletools
THIS IS WORK IN PROGRESS - Check updates regularly!
Please report any issue at https://github.com/decalage2/oletools/issues

-------------------------------------------------------------------------------
File: 'd1bceccf5d2b900a6b601c612346fdb3fa5bb0e2faeefcac3f9c29dc1d74838d.bin.sample'
Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\oletools\ooxml.py", line 503, in iter_files
    with zipper.open(subfile, 'r') as handle:
  File "c:\program files\python39\lib\zipfile.py", line 1523, in open
    raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\oleobj.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\oletools\oleobj.py", line 1044, in main
    process_file(filename, data, options.output_dir)
  File "c:\program files\python39\lib\site-packages\oletools\oleobj.py", line 873, in process_file
    for relationship, target in find_external_relationships(xml_parser):
  File "c:\program files\python39\lib\site-packages\oletools\oleobj.py", line 809, in find_external_relationships
    for _, elem, _ in xml_parser.iter_xml(None, False, OOXML_RELATIONSHIP_TAG):
  File "c:\program files\python39\lib\site-packages\oletools\ooxml.py", line 551, in iter_xml
    for subfile, handle in self.iter_files(subfiles):
  File "c:\program files\python39\lib\site-packages\oletools\ooxml.py", line 513, in iter_files
    raise BadOOXML(self.filename, 'not in zip format')
oletools.ooxml.BadOOXML: d1bceccf5d2b900a6b601c612346fdb3fa5bb0e2faeefcac3f9c29dc1d74838d.bin.sample is not an Office XML file: not in zip format

christian-intra2net commented 2 years ago

olevba (i.e. ppt_parser) also does not handle this file correctly; the structure of the streams is not as expected:

File appears not to be a ppt file (In stream "root" for field "listdir" found value "[['\x05DocumentSummaryInformation'], ['\x05SummaryInformation'], ['Current User'], ['MsoDataStore', 'HFOOÚAINÛÔ0AIÉÊÂCÂÎJKÐ==', 'Item'], ['MsoDataStore', 'HFOOÚAINÛÔ0AIÉÊÂCÂÎJKÐ==', 'Properties'], ['Pictures'], ['PowerPoint Document']]" but expected len = 1!)

However, replacing that error with a warning does not lead to detection of the payload. I will investigate this as well, but I have to finish something else first.

christian-intra2net commented 2 years ago

Work in progress: https://github.com/christian-intra2net/oletools/tree/detect-interactive-ppt-features

christian-intra2net commented 2 years ago

Found the problem: zipfile.is_zipfile returns True for this sample, although it clearly is not a zip file. Fixed it in the above branch, which also detects the actual malware content inside the sample.

christian-intra2net commented 2 years ago

Fix in #786