robert-scheck opened 4 years ago
Well, normally olevba should detect and report all URLs. If some are not detected, maybe there's an issue with the current regex (URL_RE defined at https://github.com/decalage2/oletools/blob/master/oletools/olevba.py#L807). Do you have specific samples or examples of URLs that are not matched by olevba?
As for online checks, so far I have not implemented any in oletools, because that could mean a lot of code to maintain, plus config files to store API keys, and it would break the current model where oletools are "simple" CLI tools that make no network connections. There are also plenty of other tools that already do this very well. So far the model is that oletools perform static analysis, and the output can then be processed by other tools for tasks such as CTI collection, IOC matching, online checks and enrichment. However, if you have some sample code for the lookups you propose and it is simple enough, I might add an option for it.
Could the olevba command line tool be extended to provide a list of all discovered URLs? Or did I overlook this somewhere?
It's already done, via regex.
Yes, normally all the URLs that appear in clear text in the VBA code are shown by the command line tool in the results table, tagged as IOCs. They are also returned by the API when using VBA_Parser.analyze_macros(): https://github.com/decalage2/oletools/wiki/olevba#analyze-vba-source-code Do you have a sample where this does not work as expected?
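As a minimal sketch of consuming that API output, the helpers below collect the discovered URLs without duplicates. They assume that analyze_macros() returns (type, keyword, description) tuples and that IOC rows carry the type string "IOC", as in the results table. extract_iocs and extract_urls are hypothetical names, not part of oletools, and treating an IOC as a URL when it starts with http:// or https:// is also an assumption.

```python
from typing import Iterable, List, Tuple

# Assumed shape of each analyze_macros() result: (type, keyword, description).
AnalysisResult = Tuple[str, str, str]


def extract_iocs(results: Iterable[AnalysisResult]) -> List[str]:
    """Return the keywords of all results tagged 'IOC', deduplicated, in original order."""
    seen = set()
    iocs: List[str] = []
    for kw_type, keyword, _description in results:
        if kw_type == "IOC" and keyword not in seen:
            seen.add(keyword)
            iocs.append(keyword)
    return iocs


def extract_urls(results: Iterable[AnalysisResult]) -> List[str]:
    """Keep only the IOCs that look like HTTP or HTTPS URLs (case-insensitive scheme)."""
    return [ioc for ioc in extract_iocs(results)
            if ioc.lower().startswith(("http://", "https://"))]
```

With oletools installed, this would be called as extract_urls(VBA_Parser("sample.doc").analyze_macros()) after from oletools.olevba import VBA_Parser, where sample.doc stands for the file to analyze. Filtering on the URL scheme rather than on the description text avoids depending on olevba's exact description strings.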
I would like to raise a feature request to extract HTTP and HTTPS URLs (and ideally IP addresses) in macros and perform lookups against (configurable) DNSWLs and DNSBLs.
At the time of writing, https://github.com/decalage2/oletools/blob/master/oletools/olevba.py#L831 already seems to contain some interesting extractions; however, they are partially disabled and don't perform any DNSBL lookups.
As a start, an easy approach could be to extract only something like
https?://([a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\-]{2,})(?:[/,\s]|\. )
(sloppy PCRE) and use only $1 for a DNS lookup such as $1.dbl.spamhaus.org. Note that the DNSBL usage itself should be configurable, because there are different DNSBLs: they have different policies for listing domains and also different usage policies (some are for non-commercial use only or require a paid subscription). To be able to whitelist, e.g., company-internal domains such as intranet.example.local, which could cause false positives with DNSBLs because of the reserved TLD, an optional but configurable DNSWL lookup beforehand would be great.
Note that extracted IPv4 addresses (rather than domains/URLs) usually have a different lookup schema and different DNSBLs.
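The proposal could be sketched in Python as follows. It assumes the system resolver via socket.gethostbyname and the common DNSxL convention (RFC 5782): a listed name resolves to an A record in 127.0.0.0/8, an unlisted name fails to resolve, and IPv4 addresses are queried with their octets reversed. The function names, the whitelist-before-blacklist order and the zones used in testing are illustrative assumptions. Return codes, including error codes such as the 127.255.255.x answers that Spamhaus uses, differ between lists, so each list's documentation and usage policy should be checked before relying on this.

```python
import ipaddress
import re
import socket
from typing import Callable, Iterable, List, Optional

# The extraction regex proposed in this issue; group 1 captures the host name ($1).
HOST_RE = re.compile(r"https?://([a-zA-Z0-9\.\-_]+\.[a-zA-Z0-9\-]{2,})(?:[/,\s]|\. )")

Resolver = Callable[[str], str]


def extract_hosts(vba_code: str) -> List[str]:
    """Return the distinct, lower-cased host names of http(s) URLs, in order of appearance."""
    hosts: List[str] = []
    for host in HOST_RE.findall(vba_code):
        host = host.lower()
        if host not in hosts:
            hosts.append(host)
    return hosts


def query_name(indicator: str, zone: str) -> str:
    """Build the DNSxL query name: reversed octets for IPv4 addresses, the domain as-is otherwise."""
    try:
        ip = ipaddress.IPv4Address(indicator)
    except ipaddress.AddressValueError:
        return f"{indicator}.{zone}"
    return ".".join(reversed(str(ip).split("."))) + f".{zone}"


def listed_in(indicator: str, zones: Iterable[str],
              resolve: Resolver = socket.gethostbyname) -> Optional[str]:
    """Return the first zone that lists the indicator, or None if no zone lists it."""
    for zone in zones:
        try:
            answer = resolve(query_name(indicator, zone))
        except OSError:
            # socket.gaierror (e.g. NXDOMAIN) subclasses OSError: treat as "not listed".
            continue
        # Assumption: 127.0.0.0/8 answers mean "listed", except 127.255.255.x error codes.
        if answer.startswith("127.") and not answer.startswith("127.255.255."):
            return zone
    return None


def classify(indicator: str, dnswl_zones: Iterable[str], dnsbl_zones: Iterable[str],
             resolve: Resolver = socket.gethostbyname) -> str:
    """Check the optional DNSWLs first, so whitelisted names never reach the DNSBLs."""
    if listed_in(indicator, dnswl_zones, resolve):
        return "whitelisted"
    if listed_in(indicator, dnsbl_zones, resolve):
        return "blacklisted"
    return "unlisted"
```

For example, calling classify(host, [], ["dbl.spamhaus.org"]) for each host returned by extract_hosts(vba_code) would perform the lookup suggested above. Passing resolve as a parameter lets the logic be tested without network access. Note that the sloppy regex only yields host names, so IPv4 addresses would have to come from a separate extraction before being passed to classify with IP-specific zones.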