armbues / ioc_parser

Tool to extract indicators of compromise from security reports in PDF format
MIT License

error 'ascii' codec can't encode character u'\u0160' in position 24: ordinal not in range(128) #4

Closed r3comp1le closed 9 years ago

r3comp1le commented 9 years ago

http://www.trendmicro.co.uk/media/wp/operation-arid-viper-whitepaper-en.pdf

buffer commented 9 years ago

I can confirm the issue, but only with Python 2. I spent some time porting the code to Python 3 a few days ago, and the issue does not occur under Python 3.

armbues commented 9 years ago

This error comes from PDF parsing with the PyPDF2 library under Python 2. Now that pdfminer is supported, there is an alternative that seems to be more robust at parsing PDF reports; for example, the report mentioned above is parsed without errors.
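
For readers who hit the same message, here is a minimal sketch of the class of error in the issue title. It is not code from ioc_parser or PyPDF2. Under Python 2 this error typically appears when a unicode string is implicitly encoded with the default ASCII codec; the sketch makes that encoding explicit so it also runs under Python 3. The helper name `to_ascii_safe` and the sample string are hypothetical and used only for illustration.

```python
# Illustrative sketch only; to_ascii_safe is a hypothetical helper,
# not part of ioc_parser, PyPDF2, or pdfminer.

def to_ascii_safe(text, errors="replace"):
    """Encode text to ASCII bytes, substituting characters that do not fit."""
    return text.encode("ascii", errors=errors)


# Sample text containing U+0160, the character named in the issue title.
sample = u"Operation \u0160"

# Strict ASCII encoding fails on any code point above 127.
try:
    sample.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding as UTF-8 keeps the character intact.
utf8_bytes = sample.encode("utf-8")

# Falling back to a replacement character loses information but never raises.
ascii_bytes = to_ascii_safe(sample)
```

Encoding to UTF-8 preserves the extracted text, while the `errors="replace"` fallback trades fidelity for robustness. Which one fits depends on how the output is consumed.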