InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Getting Error: binascii.Error: Incorrect padding #78

Status: Open. s4ksh1 opened this issue 9 months ago.

s4ksh1 commented 9 months ago

My IOC is https://example[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y

import iocextract
IOC = "https://example[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y"
urls = list(iocextract.extract_urls(IOC, refang=True))

Getting error:

  File "/usr/local/lib/python3.11/dist-packages/iocextract.py", line 522, in extract_encoded_urls
    url = base64.b64decode(url).decode("utf-8", "replace")
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Incorrect padding
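A minimal sketch of why this error appears, using only the standard library: base64.b64decode() needs correctly padded input (a multiple of 4 characters once any "=" padding is counted), so a path segment that merely looks like base64 can raise binascii.Error: Incorrect padding. The 23-character segment below is a hypothetical truncation of the one in the IOC above, and lenient_b64decode() is a hypothetical helper, not part of iocextract; it re-adds missing "=" padding and returns None instead of raising.

```python
import base64
import binascii
from typing import Optional


def lenient_b64decode(candidate: str) -> Optional[str]:
    """Hypothetical helper (not part of iocextract).

    Decode a base64 candidate after re-adding any missing '=' padding,
    and return None instead of raising when the text is not valid base64.
    """
    # Pad up to the next multiple of 4 characters.
    padded = candidate + "=" * (-len(candidate) % 4)
    try:
        return base64.b64decode(padded).decode("utf-8", "replace")
    except binascii.Error:
        return None


# Hypothetical truncated segment: 23 characters, one short of a multiple of 4.
segment = "aHR0cHM6Ly91NzAwNy5zY29"

try:
    base64.b64decode(segment)
except binascii.Error as exc:
    print("b64decode failed:", exc)  # b64decode failed: Incorrect padding

print(lenient_b64decode(segment))  # https://u7007.sco
```

Whether a tolerant decode is desirable inside an IOC extractor is a separate design question; this only shows where the exception comes from.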

How can I ignore base64 strings when extracting URLs from content with iocextract.extract_urls()?


How can I skip extraction of encoded strings that appear inside a URI? For example, in the IOC above I want iocextract to ignore 'aHR0cHM6Ly91NzAwNy5zY29y'.

Synse commented 3 months ago

@s4ksh1 iocextract.extract_urls() calls both iocextract.extract_unencoded_urls() and iocextract.extract_encoded_urls(). In this case, you can call iocextract.extract_unencoded_urls() directly, so the base64 decoding done in extract_encoded_urls() is never attempted:

>>> import iocextract
>>> # this extracts all urls, encoded and unencoded
>>> list(iocextract.extract_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor']
>>> # this extracts just unencoded urls
>>> list(iocextract.extract_unencoded_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y']
>>> # this extracts just encoded urls
>>> list(iocextract.extract_encoded_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor']
>>>
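The 'https://u7007.scor' result comes from the last path segment, which is itself base64 text; that is the part extract_encoded_urls() decodes. Decoding the segment with the standard library reproduces that value. This sketch only mirrors the decode step, not how iocextract locates candidate strings in the input:

```python
import base64

# Last path segment of the URL used in the session above.
segment = "aHR0cHM6Ly91NzAwNy5zY29y"

# 24 characters, a multiple of 4, so no padding is missing and it decodes cleanly.
decoded = base64.b64decode(segment).decode("utf-8", "replace")
print(decoded)  # https://u7007.scor
```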