decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

Comments are causing false positives #817

Open DecimalTurn opened 1 year ago

DecimalTurn commented 1 year ago

This issue was already mentioned in https://github.com/decalage2/oletools/issues/90, but I think the problem deserves a specific issue.

Currently, for matching suspicious keywords, there is no attempt to distinguish a regular line of code from a comment:

e.g.: https://github.com/decalage2/oletools/blob/168a92d7c53d972f499356bda7d3335c61710eec/oletools/olevba.py#L2201

I think there are a few ways we could avoid false positives related to comments. One of them would be to edit the pattern to look like this: r"(?i)^(?:[^']|\b).*\b" + re.escape(keyword) + r"\b"

The key here is that ^(?:[^']|\b).* will not match if the line starts with an apostrophe ('). The |\b is necessary; otherwise the pattern would not match when the keyword is at the start of the line: https://regex101.com/r/CUI2V3/1
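
A minimal sketch of how the proposed pattern behaves, for illustration only. The helper name keyword_pattern is hypothetical, and the (?m) flag is an assumption added here so that ^ anchors at the start of every line when the whole module source is searched at once. The last case suggests that a comment preceded by indentation would still match, because the leading space satisfies [^'].

```python
import re

def keyword_pattern(keyword):
    # Proposed pattern from the issue, written with double-quoted raw strings
    # so the apostrophe inside the character class needs no escaping.
    # (?m) is an assumption of this sketch, not part of the original proposal.
    return re.compile(r"(?im)^(?:[^']|\b).*\b" + re.escape(keyword) + r"\b")

pattern = keyword_pattern("create")

# Comment starting at column 0: not matched.
print(bool(pattern.search("'I love to create")))      # False

# Keyword at the very start of a line: matched thanks to the |\b branch.
print(bool(pattern.search("create = 1")))             # True

# Indented comment: still matched, since the leading space is not an apostrophe.
print(bool(pattern.search("    'I love to create")))  # True
```

If indented comments need to be covered too, the prefix would probably have to skip leading whitespace before testing for the apostrophe.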

Alternatively, another option to solve the issue would be to remove all comment lines from vba_code before running the regex.
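
A rough sketch of the comment-stripping option, under stated assumptions. The function strip_vba_comments is hypothetical and not part of oletools, and the plain word-boundary search below only stands in for the existing keyword check. The sketch also trims trailing comments, skips apostrophes inside string literals, and drops whole-line Rem comments. It does not handle comments continued with a trailing underscore or Rem after a colon.

```python
import re

def strip_vba_comments(vba_code):
    """Return vba_code with apostrophe and whole-line Rem comments removed."""
    cleaned = []
    for line in vba_code.splitlines():
        in_string = False
        cut = len(line)
        for i, ch in enumerate(line):
            if ch == '"':
                # A doubled quote ("") inside a string toggles the state twice,
                # so it is still correct after both characters.
                in_string = not in_string
            elif ch == "'" and not in_string:
                cut = i  # comment starts here
                break
        line = line[:cut]
        # Whole-line Rem comment (simplified check).
        if re.match(r"(?i)\s*rem(\s|$)", line):
            line = ""
        cleaned.append(line)
    return "\n".join(cleaned)

sample = '''Sub test()
    'I love to create
    MsgBox "Hello world"
End Sub'''

cleaned = strip_vba_comments(sample)

# Stand-in for the keyword check: a case-insensitive word-boundary search.
print(re.search(r"(?i)\bcreate\b", sample) is not None)   # True
print(re.search(r"(?i)\bcreate\b", cleaned) is not None)  # False
```

Keeping the emptied lines in place, rather than deleting them, preserves line numbers in case they are reported later.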


Affected tool: olevba and mraptor (maybe others as well that I haven't used)

Describe the bug: Suspicious keywords (e.g. "create") in comments are causing false positives.

File/Malware sample to reproduce the bug

Sub test()
    'I love to create
    MsgBox "Hello world"
End Sub

How To Reproduce the bug: run olevba on the sample

Expected behavior: No threat detected