decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

rtfobj - unexpected result when parsing obfuscated documents (CVE-2017-0199) #198

Open whisperzzzz opened 7 years ago

whisperzzzz commented 7 years ago

Documents can be downloaded here: https://www.dropbox.com/sh/xaxxlw7casedms3/AADY_8f9Pk4xv1_bTCpmEhU3a?dl=0

I generated this RTF file with the script here (https://github.com/bhdresh/CVE-2017-0199).

rtfobj can handle "CVE-2017-0199-removed-ignorable-destinations.rtf" correctly


It's about how to parse {\*\f133fee815154c0c76922489cc3a83}.

Is "\*" processed correctly here? I'm quite confused about it (RTF Spec 1.9.1, pages 10 to 11).

whisperzzzz commented 7 years ago

Besides, rtfobj may not process this case correctly.

Obfuscated -> {\object\objocx\objdata 341\'112345 }
Clear -> {\object\objocx\objdata 342345}

It's described here, along with some other cases: https://www.fireeye.com/blog/threat-research/2016/05/how_rtf_malware_evad.html

samiraguiar commented 7 years ago

From what I've understood, when a writer uses a newer control word and needs to keep backwards compatibility, the format specifies that the control word should be preceded by \*, e.g. {\*\somenewcontrolword...}. Readers should then check whether they know that control word. If they do, they should process it; otherwise they should skip everything up to the closing brace of that group.

From what I've seen, rtfobj identifies \* as a control symbol and just goes on to the next characters. In your example, the next character is \, so it tries to parse it as a control word. Since it doesn't know that word, it ignores it and keeps parsing.

Maybe what needs to change is this: once rtfobj has identified that an unknown control word was preceded by \*, it should ignore everything up to the closing brace of that group instead of continuing to read. What do you think, @decalage2?
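To make the proposal concrete, here is a minimal sketch of that skipping rule. It is not rtfobj's code: the function names and the `KNOWN_DESTINATIONS` set are hypothetical, and the sketch simply treats any control word outside that set as unknown, which may not match what Word does for every control word.

```python
import re

# Hypothetical list of destinations this reader understands (assumption).
KNOWN_DESTINATIONS = {"objdata", "objclass"}

# Matches the start of an ignorable destination: {\*\word
_IGNORABLE = re.compile(r"\{\\\*\\([a-zA-Z]+)")


def _find_group_end(rtf, start):
    """Return the index just past the brace closing the group opened at start."""
    depth = 0
    i = start
    while i < len(rtf):
        c = rtf[i]
        if c == "\\" and i + 1 < len(rtf):
            i += 2  # skip the escaped character, e.g. \{ \} \\ or \*
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(rtf)  # unbalanced input: skip to the end


def skip_unknown_ignorable(rtf, known=KNOWN_DESTINATIONS):
    """Drop every {\\*\\word ...} group whose control word is not in `known`."""
    out = []
    i = 0
    while i < len(rtf):
        if rtf[i] == "\\" and i + 1 < len(rtf):
            out.append(rtf[i:i + 2])  # copy escapes verbatim
            i += 2
            continue
        m = _IGNORABLE.match(rtf, i)
        if m and m.group(1) not in known:
            i = _find_group_end(rtf, i)  # skip the whole group, nested groups included
            continue
        out.append(rtf[i])
        i += 1
    return "".join(out)


# Illustrative input only, not the real sample from this issue.
sample = r"{\object\objdata 0102{\*\f133fee815154c0c76922489cc3a83}0304}"
print(skip_unknown_ignorable(sample))
```

With this rule the hex digits inside the ignorable group never reach the object data, so the example prints `{\object\objdata 01020304}`.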

samiraguiar commented 7 years ago

The second example turns out to be similar to the first. rtfobj identifies \' as a control symbol, and 341 and 112345 as text. However, it should also flag \'hh inside object control groups as possible obfuscation.
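A rough sketch of such a check, assuming it is enough to scan from each \objdata control word to the end of its enclosing group. The function name is hypothetical and this is not how rtfobj is implemented.

```python
import re

# \'hh : a control symbol followed by two hex digits
_HEX_ESCAPE = re.compile(r"\\'[0-9A-Fa-f]{2}")
# \objdata not followed by another letter (so \objdatax would not match)
_OBJDATA = re.compile(r"\\objdata(?![a-zA-Z])")


def find_hex_escapes_in_objdata(rtf):
    """Return (offset, escape) pairs for every \\'hh found inside object data."""
    hits = []
    for m in _OBJDATA.finditer(rtf):
        i, depth = m.end(), 0
        while i < len(rtf):
            esc = _HEX_ESCAPE.match(rtf, i)
            if esc:
                hits.append((i, esc.group(0)))
                i = esc.end()
            elif rtf[i] == "\\":
                i += 2  # some other control word or symbol: step over its start
            elif rtf[i] == "{":
                depth += 1
                i += 1
            elif rtf[i] == "}":
                if depth == 0:
                    break  # end of the group that contains \objdata
                depth -= 1
                i += 1
            else:
                i += 1
    return hits


obfuscated = r"{\object\objocx\objdata 341\'112345 }"
clear = r"{\object\objocx\objdata 342345}"
print(find_hex_escapes_in_objdata(obfuscated))  # one hit: the \'11 escape
print(find_hex_escapes_in_objdata(clear))       # no hits
```

A non-empty result does not prove the document is malicious. It only marks a construct that has no obvious reason to appear in the middle of hex-encoded object data.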

decalage2 commented 6 years ago

I made some improvements to rtfobj, which might handle the issues you described. But I cannot access your sample file anymore. Could you please share it again?