InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Extracting URLs that have been base64 encoded #47

Closed: k-karakatsanis closed this issue 1 year ago

k-karakatsanis commented 3 years ago

Currently, iocextract appears to extract only the first URL found in a base64-encoded string.

For example, take the following original string: 'https://google.com https://amazon.com https://microsoft.com http://google.com http://amazon.com http://microsoft.com'. Its base64 encoding is 'aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20gaHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g', and iocextract returns only the first URL it contains.

If I reorder the URLs in the original string and re-encode it in base64, iocextract returns whichever URL now comes first.
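To make the report easy to reproduce, this snippet uses only Python's standard base64 module (not iocextract) to re-encode the original string and confirm that the encoded value really contains all six URLs. Note that the encoded payload includes a trailing space after the last URL.

```python
import base64

# The original string from the report. The trailing space is part of
# what was encoded (the base64 value ends in "b20g", i.e. "om ").
original = (
    "https://google.com https://amazon.com https://microsoft.com "
    "http://google.com http://amazon.com http://microsoft.com "
)

encoded = base64.b64encode(original.encode("ascii")).decode("ascii")
print(encoded)

# Decoding and splitting on whitespace recovers all six URLs.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded.split())
```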

Could you please fix this so that all URLs contained in a base64-encoded string are returned?

battleoverflow commented 1 year ago

Hi @k-karakatsanis,

This issue should now be resolved. To get every URL out of a base64-encoded string, call extract_urls with the new delimiter parameter, for example:

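import iocextract

def extract_base64_urls():
    # Base64 encoding of the six space-separated URLs from the original report
    encoded_msg = "aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20gaHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g"

    # delimiter="space" makes extract_urls treat whitespace as the separator
    # inside the decoded base64 content, so every URL is returned, not just the first
    for url in iocextract.extract_urls(encoded_msg, refang=True, delimiter="space"):
        print(url)

extract_base64_urls()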

Output:

https://google.com https://amazon.com https://microsoft.com http://google.com http://amazon.com http://microsoft.com 

Note the new delimiter parameter: setting it to "space" switches extract_urls to parsing the URLs inside base64-encoded strings using whitespace as the delimiter, so all of them are returned. The fix is on the development branch, but I haven't published a new package to PyPI yet; I'll leave a final comment on this issue once that package is available.
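For readers curious how whitespace-delimited extraction can work in principle, here is a minimal standard-library sketch. It is not iocextract's implementation: the helper name extract_b64_urls, the 16-character minimum for a base64 candidate, and the simple http(s) prefix check are all assumptions made for illustration. It finds base64-looking runs, decodes each one, splits the decoded text on whitespace, and keeps every token that looks like a URL.

```python
import base64
import binascii
import re

# Candidate base64 runs: stretches of the base64 alphabet plus optional padding.
# The 16-character minimum is an arbitrary assumption for this sketch.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
URL_PREFIX = re.compile(r"^https?://", re.IGNORECASE)


def extract_b64_urls(text):
    """Hypothetical helper: yield every http(s) URL found inside base64 runs.

    Each decoded payload is split on whitespace, so several URLs in one
    encoded string are all yielded, not just the first.
    """
    for match in B64_CANDIDATE.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # bad padding/alphabet or not UTF-8 text; skip this run
        for token in decoded.split():
            if URL_PREFIX.match(token):
                yield token


encoded_msg = "aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20gaHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g"
for url in extract_b64_urls(encoded_msg):
    print(url)
```

Splitting on whitespace before matching is what lets every URL through; a matcher that stopped at the first hit in each decoded payload would produce the symptom described in the original report.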

battleoverflow commented 1 year ago

The new PyPI package is now available!

PyPI: https://pypi.org/project/iocextract/1.13.8/
GitHub Releases: https://github.com/InQuest/python-iocextract/releases/tag/v1.13.8
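Since, per this thread, the delimiter parameter first shipped in the 1.13.8 release, a script that relies on it may want to check the installed version before calling it. This is a minimal sketch using importlib.metadata from the standard library; the helper names parse_version and has_delimiter_support are hypothetical, and the plain dotted-number parsing is an assumption (pre-release tags such as rc1 are not handled; the packaging library is more robust for that).

```python
from importlib.metadata import PackageNotFoundError, version

# Per this issue thread, delimiter="space" is available from iocextract 1.13.8.
MIN_VERSION = (1, 13, 8)


def parse_version(text):
    """Turn '1.13.8' into (1, 13, 8); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_delimiter_support(installed):
    """Compare numerically, so '1.9.20' correctly sorts below '1.13.8'."""
    return parse_version(installed) >= MIN_VERSION


try:
    installed = version("iocextract")
except PackageNotFoundError:
    print("iocextract is not installed; try: pip install 'iocextract>=1.13.8'")
else:
    print(f"iocextract {installed} supports delimiter: {has_delimiter_support(installed)}")
```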