decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

olevba+msodde: avoid triggering an exception for incorrect file formats #427

Open decalage2 opened 5 years ago

decalage2 commented 5 years ago

Raised by oppimaniac on Twitter: olevba and msodde now throw an exception if a non-OLE file (like RTF) is being processed. Before, msodde just returned blank and olevba returned TXT: in the flags. I liked that old behavior more. (Applies to v0.54, probably due to the new handling of encrypted files.)

christian-intra2net commented 5 years ago

We should create a unit test for this. I will have a look.

christian-intra2net commented 5 years ago

This is indeed partially due to the encryption check, which raises errors before other parts of the code are reached. I have fixed this; the fix will be included in my next PR.

However, for olevba and RTF, this behavior has been there for a while. See PR #254.

decalage2 commented 5 years ago

After merging #441, olevba still displays two warnings and raises an exception when opening an RTF file:

WARNING  msoffcrypto failed to interpret file 72b14306c9f95536d03d88cf63204f70630dd9cd00664ad7f86c1d774c8508e9.rtf or determine whether it is encrypted: Unsupported file format
WARNING  Failed to check 72b14306c9f95536d03d88cf63204f70630dd9cd00664ad7f86c1d774c8508e9.rtf for encryption (not an OLE2 structured storage file); assume it is not encrypted.
[...]
FileOpenError: Failed to open file 72b14306c9f95536d03d88cf63204f70630dd9cd00664ad7f86c1d774c8508e9.rtf is RTF, need to run rtfobj.py and find VBA Macros in its output.

To address this issue, I think olevba should just detect that the file is RTF and report that it contains no VBA macros, skipping the msoffcrypto test. A file starting with the RTF magic cannot be OLE or ZIP. However, we should also check that it is NOT an MHTML or XML file, as a leading RTF magic could otherwise be used to evade detection (MHTML and XML do not have a mandatory magic at the beginning of the file). I'll think about the best way to do that.
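To make the proposal concrete, here is a minimal sketch of such a pre-check. It is not olevba's actual code: the function name `classify_header`, the marker lists, and the lenient `{\rt` prefix are all assumptions chosen for illustration, and the MHTML/XML markers are deliberately non-exhaustive.

```python
# Hypothetical pre-check, NOT part of oletools: decide whether a file can be
# safely treated as plain RTF before any OLE/ZIP/encryption handling runs.

# Assumption: a lenient prefix is used here; the RTF specification header
# is "{\rtf1", but a shorter prefix catches slightly malformed headers too.
RTF_MAGIC = b"{\\rt"

# Assumption: illustrative, non-exhaustive markers. MHTML and XML have no
# mandatory magic at offset 0, so they are searched for anywhere in the data.
MHTML_MARKERS = (b"mime-version:", b"content-type:")
XML_MARKERS = (b"<?xml", b"<?mso-application", b"<w:worddocument")


def classify_header(data: bytes) -> str:
    """Return 'rtf', 'ambiguous', or 'other' for the given file content."""
    if not data.startswith(RTF_MAGIC):
        # No RTF magic: leave the decision to the regular format detection.
        return "other"
    lowered = data.lower()
    if any(marker in lowered for marker in MHTML_MARKERS + XML_MARKERS):
        # RTF magic followed by MHTML/XML content: possibly an evasion
        # attempt, so the file should still go through full parsing.
        return "ambiguous"
    return "rtf"
```

Under this sketch, only the `'rtf'` result would let the caller skip the msoffcrypto test and report that no VBA macros are present; `'ambiguous'` and `'other'` would fall through to the existing parsing path.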