PDF2JSON is a conversion library based on XPDF (3.02) that performs high-performance, page-by-page conversion of PDF files to JSON and XML. It can also compress the output to minimize its size. PDF2JSON is available for Windows, OSX, and Linux. Please see https://flowpaper.com for more information.
We fuzzed pdf2json with AFL and found crashes in ObjectStream::getObject.
gdb output:
Error (35014): Dictionary key must be a name object
Error (35021): Dictionary key must be a name object
Program received signal SIGSEGV, Segmentation fault.
0x0000555555665ccc in ObjectStream::getObject (obj=0x555555aacfe8, objNum=67,
objIdx=432, this=0x555555ac9ee0) at XRef.cc:183
183 if (objIdx < 0 || objIdx >= nObjects || objNum != objNums[objIdx]) {
gdb backtrace:
(gdb) bt
#0 0x0000555555665ccc in ObjectStream::getObject (obj=0x555555aacfe8, objNum=67,
objIdx=432, this=0x555555ac9ee0) at XRef.cc:183
#1 XRef::fetch (this=0x555555a9c500, num=67, gen=<optimized out>,
obj=obj@entry=0x555555aacfe8) at XRef.cc:841
#2 0x00005555556092a5 in Object::fetch (this=<optimized out>, xref=<optimized out>,
obj=obj@entry=0x555555aacfe8) at Object.cc:105
#3 0x000055555559831e in Dict::lookup (this=<optimized out>,
key=key@entry=0x55555582a68f "StructTreeRoot", obj=obj@entry=0x555555aacfe8)
at Dict.cc:76
#4 0x00005555555941f6 in Object::dictLookup (this=0x7fffffffe1e0, obj=0x555555aacfe8,
key=0x55555582a68f "StructTreeRoot") at Object.h:253
#5 Catalog::Catalog (this=0x555555aacf90, xrefA=<optimized out>) at Catalog.cc:113
#6 0x0000555555614fb9 in PDFDoc::setup (userPassword=0x0, ownerPassword=<optimized out>,
this=0x555555aa7ba0) at PDFDoc.cc:201
#7 PDFDoc::PDFDoc (this=0x555555aa7ba0, fileNameA=<optimized out>,
ownerPassword=<optimized out>, userPassword=0x0, guiDataA=<optimized out>)
at PDFDoc.cc:101
#8 0x000055555558bdba in main (argc=<optimized out>, argv=0x7fffffffe4b8)
at pdf2json.cc:159
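The faulting line compares objIdx against nObjects and then reads objNums[objIdx]. A segfault there suggests that either the ObjectStream instance itself or its objNums array was not in a usable state when the lookup ran. The trace alone does not establish which. As a minimal sketch of a lookup whose bound cannot drift away from the table it guards, the following uses a hypothetical stand-in type (ObjStreamModel and contains are illustrative names, not pdf2json or XPDF code) and takes the bound from the container itself:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object stream's index table.
// This is NOT the pdf2json/XPDF ObjectStream class.
struct ObjStreamModel {
    std::vector<int> objNums;  // object number stored at each index

    // Returns true only if objIdx is in range and the entry at that
    // index matches objNum. The bound is objNums.size(), so an empty
    // or partially built table can never be indexed out of range.
    bool contains(int objIdx, int objNum) const {
        if (objIdx < 0) {
            return false;
        }
        const std::size_t idx = static_cast<std::size_t>(objIdx);
        if (idx >= objNums.size()) {
            return false;
        }
        return objNums[idx] == objNum;
    }
};
```

With an empty table, the values from the crash (objIdx=432, objNum=67) are rejected instead of dereferenced. This only illustrates the defensive pattern. An actual fix would need to address why the object stream was left in that state.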