decalage2 / oletools

oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
http://www.decalage.info/python/oletools

Extract HTTP(S) URLs in macros and perform lookups against (configurable) DNSWLs and DNSBLs #510

Open robert-scheck opened 4 years ago

robert-scheck commented 4 years ago

I would like to raise a feature request to extract HTTP and HTTPS URLs (and ideally IP addresses) in macros and perform lookups against (configurable) DNSWLs and DNSBLs.

At the time of writing, https://github.com/decalage2/oletools/blob/master/oletools/olevba.py#L831 already seems to contain some interesting extractions; however, they are partially disabled and do not perform any DNSBL lookups.

To start with, an easy way could be to extract only something like https?://([a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\-]{2,})(?:[/,\s]|\. ) (sloppy PCRE) and use only $1 for a DNS lookup against $1.dbl.spamhaus.org. Note that the DNSBL usage itself should be configurable, because there are different DNSBLs, they have different policies for listing domains, and they also have different usage policies (some are non-commercial or need a paid subscription). To allow whitelisting of company-internal domains such as intranet.example.local, which could lead to false positives with DNSBLs due to a reserved TLD, an optional but configurable DNSWL lookup performed before the DNSBL lookup would be great.
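
A minimal Python sketch of that idea, under stated assumptions: the regex is the sloppy PCRE above in Python form, the helper names (extract_hosts, is_listed, classify) are hypothetical and not part of oletools, and a name is treated as listed when it resolves to an address in 127.0.0.0/8. The 127.255.255.x range is skipped because some lists use it for error codes rather than listings; check the policy of the list you actually configure.

```python
import re
import socket

# Python form of the sloppy PCRE above; group 1 is the host name.
URL_HOST_RE = re.compile(
    r'https?://([a-zA-Z0-9._-]+\.[a-zA-Z0-9-]{2,})(?:[/,\s]|\. )')


def extract_hosts(text):
    """Return unique, lower-cased host names from http(s) URLs, in order."""
    hosts = []
    for host in URL_HOST_RE.findall(text):
        host = host.lower()
        if host not in hosts:
            hosts.append(host)
    return hosts


def is_listed(host, zone, resolve=socket.gethostbyname):
    """Return True if '<host>.<zone>' resolves to a listing address."""
    try:
        answer = resolve("{}.{}".format(host, zone))
    except OSError:
        return False  # NXDOMAIN or any lookup failure: treated as not listed
    # Assumption: listings are answered from 127.0.0.0/8, and 127.255.255.x
    # is an error code on some lists, so it is not counted as a hit.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")


def classify(host, blacklists, whitelists=(), resolve=socket.gethostbyname):
    """Check the configurable whitelists first, then the blacklists."""
    if any(is_listed(host, zone, resolve) for zone in whitelists):
        return "whitelisted"
    if any(is_listed(host, zone, resolve) for zone in blacklists):
        return "blacklisted"
    return "unknown"
```

The zone lists are plain arguments so that which DNSBLs and DNSWLs are queried stays a configuration decision, and the resolve parameter lets the logic be exercised without any network access.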

Note that extracted IPv4 addresses (rather than domains/URLs) usually have a different lookup scheme and use different DNSBLs.
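
To illustrate the difference: IP-based DNSBLs are conventionally queried with the octets of the address reversed and the zone appended, whereas a domain list takes the host name as-is. A small sketch, where dnsbl_query_name is a hypothetical helper and zen.spamhaus.org is only an example zone:

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address on an IP-based DNSBL."""
    # IPv4Address validates the input and raises ValueError on bad addresses.
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return "{}.{}".format(reversed_octets, zone)


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
```

Only the query name is built here; the actual lookup and the choice of zone would stay configurable as described above.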

decalage2 commented 4 years ago

Well, normally olevba should detect and report all URLs. If some are not detected, maybe there's an issue with the current regex (URL_RE defined at https://github.com/decalage2/oletools/blob/master/oletools/olevba.py#L807). Do you have specific samples or examples of URLs that are not matched by olevba?

As for doing online checks, so far I did not implement any in oletools because that might end up in a lot of code to maintain + config files to store API keys, and break the current model that oletools are "simple" CLI tools that don't do network connections. And there are tons of other tools that do it very well already. So far the model is that oletools do static analysis, and the output can then be processed in other tools for various tasks like CTI collection, IOC matching, online checks and enrichment, etc. However if you have some sample code to do the lookups that you propose and it's simple enough, then I might add an option for it.

robert-scheck commented 4 years ago

Could the olevba command line tool be extended to provide a list of all discovered URLs? Or did I overlook this somewhere?

r3comp1le commented 4 years ago

> Could the olevba command line tool be extended to provide a list of all discovered URLs? Or did I overlook this somewhere?

It's already done, via regex.

decalage2 commented 4 years ago

Yes, normally all the URLs that appear in clear text in the VBA code are shown by the command line tool in the results table, tagged as IOCs. They are also returned by the API when using VBA_Parser.analyze_macros(): https://github.com/decalage2/oletools/wiki/olevba#analyze-vba-source-code Do you have a sample where this does not work as expected?
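
For anyone who wants just the list of URLs from the API, a short sketch along the lines of the wiki page linked above. It assumes that analyze_macros() returns (type, keyword, description) tuples and that URLs are reported with type 'IOC' and description 'URL'; urls_from_results and urls_in_file are hypothetical helper names, not part of oletools.

```python
def urls_from_results(results):
    """Keep only the URL IOCs from analyze_macros()-style result tuples."""
    # Assumption: each result is a (type, keyword, description) tuple and
    # URLs are tagged with type 'IOC' and description 'URL'.
    return [keyword for kw_type, keyword, description in results
            if kw_type == "IOC" and description == "URL"]


def urls_in_file(path):
    """Return the clear-text URLs olevba finds in the macros of one file."""
    # Imported here so the filter above can be used without oletools installed.
    from oletools.olevba import VBA_Parser

    parser = VBA_Parser(path)
    try:
        if not parser.detect_vba_macros():
            return []
        return urls_from_results(parser.analyze_macros())
    finally:
        parser.close()
```

The resulting list can then be handed to whatever external tool does the online checks, which keeps the network lookups outside oletools.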