I wrote a pdfparser in Go that does everything the existing pdfparser does and a good deal more, and it's roughly 30x faster. Details on it can be found here
Usage:
pdfparser -f input.pdf output/
The above command creates the following files in the output dir:
commands.txt - list of commands run by launch actions
contents.txt - the text content of the PDF (may include scripts, URLs, etc.)
errors.txt - list of format errors and abnormalities that we might be able to write detections for
files.txt - list of the MD5 hash and path of each referenced embedded or external file. Embedded files are extracted to the output dir using the MD5 as the file name.
javascript.js - JavaScript from all actions in the PDF
raw.pdf - a decrypted and decoded version of the PDF
urls.txt - list of URLs referenced by actions
We should create an ACE module that scans all of the above files with appropriate YARA rules. We may also want to add some of the information in these files as observables, such as embedded files, file paths, and URLs.