armbues / ioc_parser

Tool to extract indicators of compromise from security reports in PDF format
MIT License

Python 3 compatible #21

Open threatlead opened 8 years ago

threatlead commented 8 years ago

Suggestions:

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

def __init__(self, patterns_ini=None, ..., library='pypdf2', ...):

armbues commented 8 years ago

The default PDF library was switched to pdfminer because of its better parsing performance. In a head-to-head test it extracted considerably more text from a set of reports than pypdf2, and therefore also produced more IOCs.

One option would be to check the Python version at runtime and change the default PDF library accordingly.
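
A minimal sketch of that runtime check, for illustration only. The helper `default_pdf_library` and the class `ParserSketch` are hypothetical names, not part of ioc_parser, and the choice of pypdf2 as the Python 3 fallback is an assumption taken from the suggestion at the top of this thread:

```python
import sys


def default_pdf_library(version_info=sys.version_info):
    """Return a default PDF backend name for the given interpreter version."""
    # Assumption: pdfminer stays the default on Python 2, and pypdf2 is
    # used on Python 3, where the original pdfminer does not run.
    if version_info[0] >= 3:
        return 'pypdf2'
    return 'pdfminer'


class ParserSketch(object):
    """Hypothetical stand-in for the real parser class."""

    def __init__(self, library=None):
        # An explicit argument always wins; otherwise pick by Python version.
        if library is None:
            library = default_pdf_library()
        self.library = library
```

Using `None` as the sentinel keeps the version check out of the signature, so callers who pass `library=` explicitly are unaffected.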

bernardyim commented 7 years ago

For anyone having issues with pdfminer on Python 3, consider using pdfminer.six, a fork that is compatible with Python 3: https://github.com/pdfminer/pdfminer.six

Also, as a totally unrelated side note (no idea where to put this), you might want to pass the IGNORECASE flag to re.compile at parser.py line 133, so that indicators typed in all caps are also caught: ind_regex = re.compile(ind_pattern, flags=re.IGNORECASE)
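
To show the effect of the flag, here is a small self-contained sketch. The MD5-style pattern and the sample text are made up for illustration and are not taken from the project's patterns file:

```python
import re

# Hypothetical indicator pattern: a 32-character lowercase hex string.
ind_pattern = r"\b[a-f0-9]{32}\b"

# Sample report text with the hash typed in all caps.
text = "Dropper MD5: D41D8CD98F00B204E9800998ECF8427E"

# Without the flag, the uppercase hash is not matched.
case_sensitive = re.compile(ind_pattern)
strict_matches = case_sensitive.findall(text)

# With re.IGNORECASE, the same pattern also matches uppercase input.
ind_regex = re.compile(ind_pattern, flags=re.IGNORECASE)
relaxed_matches = ind_regex.findall(text)

print(strict_matches)
print(relaxed_matches)
```

Matches keep the casing found in the text, so lowercasing them afterwards may be worthwhile if the extracted indicators are going to be deduplicated.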

fhightower commented 6 years ago

As far as IGNORECASE support is concerned, this is handled with #34.