Open s4ksh1 opened 9 months ago
@s4ksh1 iocextract.extract_urls() calls iocextract.extract_unencoded_urls() and iocextract.extract_encoded_urls(). In this case you can just use iocextract.extract_unencoded_urls() directly:
>>> import iocextract
>>> # this extracts all urls, encoded and unencoded
>>> list(iocextract.extract_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor']
>>> # this extracts just unencoded urls
>>> list(iocextract.extract_unencoded_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y', 'https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y']
>>> # this extracts just encoded urls
>>> list(iocextract.extract_encoded_urls("ioc is hxxps://u7007.scor1[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y", refang=True))
['https://u7007.scor']
>>>
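The unencoded result above lists the same URL twice. If that is unwanted, a minimal sketch for dropping repeats while keeping first-seen order could look like the following. The helper name `dedupe_preserving_order` is hypothetical and not part of iocextract; the input list is copied from the output shown above.

```python
def dedupe_preserving_order(urls):
    """Drop repeated URLs while keeping the order in which they first appear."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(urls))


# Output copied from the extract_unencoded_urls() call in the session above.
extracted = [
    "https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y",
    "https://u7007.scor1.com/k265/aHR0cHM6Ly91NzAwNy5zY29y",
]

unique_urls = dedupe_preserving_order(extracted)
print(unique_urls)
```

The same helper should work on any iterable of strings, so it can wrap the generator returned by the extractor directly.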
My IOC is https://example[.]com/k265/aHR0cHM6Ly91NzAwNy5zY29y
iocextract.extract_urls(IOC, refang=True)
Getting error:
  File "/usr/local/lib/python3.11/dist-packages/iocextract.py", line 522, in extract_encoded_urls
    url = base64.b64decode(url).decode("utf-8", "replace")
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Incorrect padding
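For context, here is a small standard-library sketch of how `binascii.Error: Incorrect padding` can arise from `base64.b64decode` and how it can be caught. It does not reproduce the exact input that failed inside iocextract, which the traceback does not show. `safe_b64decode` is a hypothetical helper, and the truncated string is an assumption made only to trigger the error.

```python
import base64
import binascii


def safe_b64decode(candidate):
    """Return the decoded text, or None if `candidate` is not valid base64."""
    try:
        return base64.b64decode(candidate).decode("utf-8", "replace")
    except binascii.Error:
        # Raised, for example, when the length is not a multiple of 4
        # and the missing characters are not made up with '=' padding.
        return None


complete = "aHR0cHM6Ly91NzAwNy5zY29y"  # 24 characters, a multiple of 4
truncated = complete[:-1]              # 23 characters, so padding is wrong

print(safe_b64decode(complete))
print(safe_b64decode(truncated))
```

The first call decodes cleanly, while the second hits the same exception class seen in the traceback and returns None instead of raising.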
How can I ignore base64 strings while extracting URLs from content with iocextract.extract_urls?
How can I skip extraction of encoded strings present in the URI? For instance, in the example above I want to ignore 'aHR0cHM6Ly91NzAwNy5zY29y'.