k-karakatsanis closed this issue 1 year ago
Hi @k-karakatsanis,
This issue should now be resolved. You can update this functionality in your script by doing something similar to this:
import iocextract

def extract_base64_urls():
    encoded_msg = "aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20gaHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g"

    for url in iocextract.extract_urls(encoded_msg, refang=True, delimiter="space"):
        print(url)

extract_base64_urls()
Output:
https://google.com https://amazon.com https://microsoft.com http://google.com http://amazon.com http://microsoft.com
You may notice the new parameter, delimiter. Setting delimiter="space" is what switches extract_urls to parsing URLs inside base64-encoded strings with a whitespace delimiter. While this fix is present on the development branch, I haven't pushed a new package to PyPI yet, so I'll post a final comment on this issue once that package is available for you to use.
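For anyone who wants to check what that string contains independently of iocextract, here is a minimal standard-library sketch. It is not the library's implementation, only an illustration of the idea of decoding the base64 payload and splitting it on whitespace. The names ENCODED_MSG and decode_and_split are hypothetical and exist only for this example.

```python
import base64

# The base64 string from this issue (it encodes six space-separated URLs).
ENCODED_MSG = (
    "aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20g"
    "aHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g"
)


def decode_and_split(encoded):
    """Hypothetical helper: decode a base64 string and split it on whitespace.

    This only illustrates the concept; it does not validate that the
    resulting tokens are URLs, and it is not how iocextract works internally.
    """
    decoded = base64.b64decode(encoded).decode("utf-8")
    # str.split() with no arguments splits on any run of whitespace and
    # drops empty tokens, so a trailing space in the payload is harmless.
    return decoded.split()


for url in decode_and_split(ENCODED_MSG):
    print(url)
```

Splitting with no arguments was chosen here because the encoded payload in this issue ends with a trailing space, which would otherwise produce an empty last token.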
The new PyPI package is now available!
PyPI: https://pypi.org/project/iocextract/1.13.8/
GitHub Releases: https://github.com/InQuest/python-iocextract/releases/tag/v1.13.8
Currently, it seems like iocextract extracts only the first URL found in a base64 encoded string.
For example, for the following original string:
'https://google.com https://amazon.com https://microsoft.com http://google.com http://amazon.com http://microsoft.com'
the base64 encoded string is: 'aHR0cHM6Ly9nb29nbGUuY29tIGh0dHBzOi8vYW1hem9uLmNvbSBodHRwczovL21pY3Jvc29mdC5jb20gaHR0cDovL2dvb2dsZS5jb20gaHR0cDovL2FtYXpvbi5jb20gaHR0cDovL21pY3Jvc29mdC5jb20g'
and only the first URL found is returned. If I change the order of the URLs in the original string and then encode it with base64, iocextract returns whichever URL now occurs first.
Can you please fix this so that all the URLs in a base64 encoded string are returned?