IV1T3 closed this 2 years ago.
Interesting PDF analysis tool: https://blog.didierstevens.com/programs/pdf-tools/
It is also shipped as a forensics package in Kali: https://tools.kali.org/forensics/pdfid
PyPI package usable as a library: https://github.com/mlodic/pdfid
Example output of pdfid.py
Release 1.0.4 (2021/03/25):
{
'/AA': 0,
'/AcroForm': 0,
'/Colors > 2^24': 0,
'/EmbeddedFile': 0,
'/Encrypt': 0,
'/JBIG2Decode': 0,
'/JS': 0,
'/JavaScript': 0,
'/Launch': 0,
'/ObjStm': 0,
'/OpenAction': 0,
'/Page': 36,
'/RichMedia': 0,
'/XFA': 0,
'endobj': 469,
'endstream': 186,
'filename': 'analyzing.pdf',
'header': '%PDF-1.4',
'obj': 469,
'startxref': 1,
'stream': 186,
'trailer': 1,
'version': '0.2.7',
'xref': 1
}
The output is described as follows:
Almost every PDF document will contain the seven keywords obj, endobj, stream, endstream, xref, trailer and startxref (the first seven counters in pdfid's plain-text output), although stream and endstream are somewhat less universal than the others. I’ve found a couple of PDF documents without xref or trailer, but these are rare (BTW, this is not an indication of a malicious PDF document).
/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.
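A minimal sketch of how the structural counters and /Page could be checked on a dictionary like the example output. The helper name `structure_notes` and the note strings are hypothetical; the code only encodes the two observations above and assumes integer counts keyed as in the example.

```python
# Hypothetical helper: flag unusual structural counters in a pdfid-style dict.
STRUCTURE_KEYWORDS = ("obj", "endobj", "stream", "endstream", "xref", "trailer", "startxref")


def structure_notes(counts: dict) -> list[str]:
    """Return human-readable notes about the header, structure keywords and page count."""
    notes = []
    if not str(counts.get("header", "")).startswith("%PDF-"):
        notes.append("no %PDF- header")
    missing = [kw for kw in STRUCTURE_KEYWORDS if counts.get(kw, 0) == 0]
    if missing:
        # Rare, but on its own not an indication of a malicious document.
        notes.append("missing keywords: " + ", ".join(missing))
    if counts.get("/Page", 0) == 1:
        # Most malicious PDF documents have a single page.
        notes.append("single page")
    return notes
```

For the example output above (36 pages, all structure keywords present) this returns an empty list.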
/Encrypt indicates that the PDF document has DRM or needs a password to be read.
/ObjStm counts the number of object streams. An object stream is a stream object that can contain other objects, and can therefore be used to obfuscate objects (by using different filters).
/JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents that I’ve found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent.
/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. All malicious PDF documents with JavaScript I’ve seen in the wild had an automatic action to launch the JavaScript without user interaction.
The combination of automatic action and JavaScript makes a PDF document very suspicious.
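The rule above can be written as a small predicate over the counts. `is_very_suspicious` is a hypothetical name, and the sketch assumes the plain integer counts shown in the example output.

```python
def is_very_suspicious(counts: dict) -> bool:
    """True when the document has both JavaScript and an automatic action."""
    has_js = counts.get("/JS", 0) > 0 or counts.get("/JavaScript", 0) > 0
    has_auto_action = counts.get("/AA", 0) > 0 or counts.get("/OpenAction", 0) > 0
    return has_js and has_auto_action
```

JavaScript alone (e.g. in a form) does not trigger it; only the combination does.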
/JBIG2Decode indicates if the PDF document uses JBIG2 compression. This is not necessarily an indication of a malicious PDF document, but requires further investigation.
/RichMedia is for embedded Flash.
/Launch counts launch actions.
/XFA is for XML Forms Architecture.
A number that appears between parentheses after the counter represents the number of obfuscated occurrences. For example, /JBIG2Decode 1(1) tells you that the PDF document contains the name /JBIG2Decode and that it was obfuscated (using hexcodes, e.g. /JBIG#32Decode).
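To illustrate how obfuscated names can still be counted, here is a simplified sketch that decodes `#xx` escapes before comparing names and renders the result in the `total(obfuscated)` form. The regex-based tokenizer and the function names are assumptions for illustration only; PDFiD's own parser is more involved.

```python
import re

# Simplified PDF name token: '/' followed by characters that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> bytes:
    """Replace #xx hex escapes, e.g. b'/JBIG#32Decode' -> b'/JBIG2Decode'."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_name(data: bytes, name: bytes) -> tuple[int, int]:
    """Return (total, obfuscated) occurrences of a PDF name in raw bytes."""
    total = obfuscated = 0
    for match in NAME_RE.finditer(data):
        raw = match.group()
        if decode_name(raw) == name:
            total += 1
            if raw != name:  # matched only after decoding, so it was obfuscated
                obfuscated += 1
    return total, obfuscated


def format_count(total: int, obfuscated: int) -> str:
    """Render a counter as described above, e.g. '1(1)'."""
    return f"{total}({obfuscated})" if obfuscated else str(total)
```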
BTW, all the counters can be skewed if the PDF document is saved with incremental updates.
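A rough hint that incremental updates may be skewing the counters is the startxref counter, since each appended update section normally ends with its own startxref. This is only a heuristic sketch with a hypothetical helper name; linearized files may also contain more than one.

```python
def may_have_incremental_updates(counts: dict) -> bool:
    """Heuristic: more than one startxref suggests appended update sections.

    Linearized ("fast web view") files may also trigger this, so treat it
    only as a hint that other counters may include superseded objects.
    """
    return counts.get("startxref", 0) > 1
```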
Because PDFiD is just a string scanner (supporting name obfuscation), it will also generate false positives. For example, a simple text file starting with %PDF-1.1 and containing words from the list will also be identified as a PDF document.
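The false positive is easy to reproduce with a toy string scanner. This is not PDFiD's implementation, only a sketch assuming whole-word matching over a reduced, hypothetical keyword list, but it shows that a plain text file with a %PDF header gets "recognized" just the same.

```python
import re

# Reduced keyword list for illustration; PDFiD checks more names.
KEYWORDS = ("obj", "endobj", "stream", "endstream", "xref", "trailer",
            "startxref", "/Page", "/JS", "/JavaScript", "/OpenAction")


def naive_scan(data: bytes) -> dict:
    """Toy PDFiD-like scan: header check plus whole-word keyword counts."""
    result = {"header": None}
    if data.startswith(b"%PDF-"):
        result["header"] = data.splitlines()[0].strip().decode("latin-1")
    for kw in KEYWORDS:
        # Word boundaries keep 'obj' from matching inside 'endobj', '/Page' inside '/Pages', etc.
        pattern = rb"(?<![A-Za-z0-9])" + re.escape(kw.encode()) + rb"(?![A-Za-z0-9])"
        result[kw] = len(re.findall(pattern, data))
    return result
```

Usage: `naive_scan(b"%PDF-1.1\nThis is just a note mentioning obj, endobj and /JavaScript.\n")` reports a `%PDF-1.1` header and non-zero counts for obj, endobj and /JavaScript, although the input is just text.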
Submitted a PR to extend the functionality for DMF: https://github.com/mlodic/pdfid/pull/3
Add more to the PDF M score:
/JS 0 # This indicates the presence of JavaScript
/JavaScript 0 # This indicates the presence of JavaScript
/AA 0 # This indicates the presence of automatic action on opening
/OpenAction 0 # This indicates the presence of automatic action on opening
/AcroForm 0 # This indicates the presence of AcroForm which could contain JavaScript
/JBIG2Decode 0 # This indicates the use of JBIG2 compression which could be used for obfuscating content
/RichMedia 0 # This indicates the presence of rich media within the PDF such as Flash
/Launch 0 # This counts the launch actions
/EmbeddedFile 0 # This indicates there are embedded files within the PDF
/XFA 0 # This indicates the presence of XML Forms within the PDF
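One possible way to turn these counters into a single score is a weighted sum over the keywords that are present. The weights below are purely hypothetical placeholders, not values from pdfid or from the PR above; they would need tuning against real samples.

```python
# Hypothetical weights: illustrative only, not taken from pdfid or any PR.
M_SCORE_WEIGHTS = {
    "/JS": 2, "/JavaScript": 2, "/AA": 1, "/OpenAction": 1, "/AcroForm": 1,
    "/JBIG2Decode": 1, "/RichMedia": 1, "/Launch": 2, "/EmbeddedFile": 1, "/XFA": 1,
}


def m_score(counts: dict, weights: dict = M_SCORE_WEIGHTS) -> int:
    """Add each keyword's weight once if it occurs at all (presence, not frequency)."""
    return sum(w for kw, w in weights.items() if counts.get(kw, 0) > 0)
```

Scoring on presence rather than raw counts keeps one heavily repeated name (e.g. many /XFA entries) from dominating the total.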
TODO:
Just created a new PR: https://github.com/mlodic/pdfid/pull/4
This feature will make it possible to sanitize given PDFs in memory.
O-checker could provide a good starting point. In particular, pdfanalysis.py illustrates how to detect specific malware inside a PDF.
GitHub: https://github.com/yotsubo/o-checker
Presentation: https://www.blackhat.com/docs/us-16/materials/us-16-Otsubo-O-checker-Detection-of-Malicious-Documents-through-Deviation-from-File-Format-Specifications.pdf
Whitepaper: https://www.blackhat.com/docs/us-16/materials/us-16-Otsubo-O-checker-Detection-of-Malicious-Documents-through-Deviation-from-File-Format-Specifications-wp.pdf
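As an example of a check based on deviation from the file format specification, the sketch below measures data appended after the last %%EOF marker. The choice of check and the function name are my own for illustration and are not taken from O-checker's pdfanalysis.py.

```python
def trailing_bytes_after_eof(data: bytes) -> int:
    """Count non-whitespace bytes after the last %%EOF marker (-1 if there is none)."""
    pos = data.rfind(b"%%EOF")
    if pos == -1:
        return -1
    # bytes.strip() removes ASCII whitespace only, so appended payload bytes remain.
    return len(data[pos + len(b"%%EOF"):].strip())
```

A non-zero result does not prove malice, but appended data is a common place to hide a payload and is worth a closer look.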