InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Exception with some unicode in URLs #8

Closed: rshipp closed this issue 6 years ago

rshipp commented 6 years ago
Traceback (most recent call last):
  File "iocextract", line 11, in <module>
    sys.exit(main())
  File "local/lib/python2.7/site-packages/iocextract.py", line 433, in main
    for ioc in extract_urls(args.input.read(), refang=args.refang, strip=args.strip_urls):
  File "local/lib/python2.7/site-packages/iocextract.py", line 155, in extract_urls
    url = refang_url(url.group(1))
  File "local/lib/python2.7/site-packages/iocextract.py", line 395, in refang_url
    return parsed.geturl()
  File "/usr/lib64/python2.7/urlparse.py", line 134, in geturl
    return urlunparse(self)
  File "/usr/lib64/python2.7/urlparse.py", line 231, in urlunparse
    return urlunsplit((scheme, netloc, url, query, fragment))
  File "/usr/lib64/python2.7/urlparse.py", line 242, in urlunsplit
    url = '//' + (netloc or '') + url
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 17: ordinal not in range(128)

Example URL that triggers the exception:

https://secure.comodo.net/CPS0CU<0:08�6�4�2http://crl.comodoca.com/COMODORSACodeSigningCA.crl0t+h0f0>+0�2http://crt.comodoca.com/COMODORSACodeSigningCA.crt0$+0�http://ocsp.comodoca.com0U0�info@all-media.site0
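
For anyone reproducing this: in Python 2, a UnicodeDecodeError with the 'ascii' codec raised from a plain string concatenation, as in urlunsplit above, most likely means a byte string containing non-ASCII bytes was joined with a unicode value and implicitly decoded as ASCII. The snippet below is a minimal sketch of one possible workaround, written for Python 3: decode the input to text up front, replacing undecodable bytes, before the URL is split and reassembled. The helper names (ascii_error_position, to_text, safe_geturl) and the sample input are hypothetical and are not part of iocextract, and this is not necessarily how the issue was fixed.

```python
from urllib.parse import urlsplit, urlunsplit


def ascii_error_position(data):
    """Return the index of the first byte that fails ASCII decoding, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def to_text(data, encoding="utf-8"):
    """Decode bytes to text, replacing undecodable bytes with U+FFFD.

    Text input is returned unchanged. The default encoding is an assumption.
    """
    if isinstance(data, bytes):
        return data.decode(encoding, errors="replace")
    return data


def safe_geturl(url):
    """Split and reassemble a URL after normalizing it to text."""
    return urlunsplit(urlsplit(to_text(url)))


# Made-up input with a stray 0xa0 byte; not the URL from the report.
raw = b"http://example.com\xa0/path"
print(ascii_error_position(raw))
print(repr(safe_geturl(raw)))
```

Replacing bad bytes keeps extraction running on binary-looking input, at the cost of altering the extracted URL. Skipping such matches entirely would be another reasonable choice.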