InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Can't decode url throw an error #39

Closed: myugan closed this issue 4 years ago

myugan commented 4 years ago
Traceback (most recent call last):
  File "extract.py", line 18, in <module>
    for i in iocextract.extract_encoded_urls(f.read(), refang=True):
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 174: invalid start byte

I wrote a simple Python script that uses iocextract to find URLs in the files in the current directory, but it throws the error above when calling extract_encoded_urls.

myugan commented 4 years ago

Solved by opening the file with these options:

with open(path, "r", encoding="utf8", errors="ignore") as f:

References: https://stackoverflow.com/questions/42339876/error-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-position-0-in
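
For anyone hitting the same traceback: it shows the exception being raised in codecs.py while the file is read, so the decoding step is what fails. Below is a minimal sketch of the fix above using only the standard library. The helper name read_text_lenient and the sample file contents are made up for illustration; they are not part of iocextract.

```python
import os
import tempfile


def read_text_lenient(path):
    """Read a file as UTF-8, silently dropping bytes that are not valid UTF-8."""
    with open(path, "r", encoding="utf8", errors="ignore") as f:
        return f.read()


# Hypothetical sample file containing 0xc1, a byte that is never valid in UTF-8.
sample = b"see hxxp://example[.]com \xc1 for details\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    # With the default strict error handler, reading raises UnicodeDecodeError.
    try:
        with open(sample_path, "r", encoding="utf8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # With errors="ignore", the invalid byte is dropped and reading succeeds.
    text = read_text_lenient(sample_path)
finally:
    os.remove(sample_path)

# `text` can now be passed on, e.g. to
# iocextract.extract_encoded_urls(text, refang=True) as in the original script.
print(strict_failed, repr(text))
```

Note that errors="ignore" discards the undecodable bytes, so the text may differ slightly from the file on disk. If it helps to see where bytes were lost, errors="replace" inserts U+FFFD at those positions instead.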