flexpaper / pdf2json

PDF2JSON is a conversion library based on XPDF 3.02 that performs high-performance, page-by-page conversion of PDF files to JSON and XML. It can also compress its output to minimize size. PDF2JSON is available for Windows, OSX and Linux. See https://flowpaper.com for more information.

Segmentation fault from Object.h #42

Open · sihyungyou opened this issue 4 years ago

sihyungyou commented 4 years ago

Hi, I found that pdf2json (commit b671b64) crashes on the attached file (pdf2json_crash.pdf), which contains syntax errors. The crash was observed on Ubuntu 18.04.3 (kernel 4.15.0-72-generic, x86_64) and can be reproduced with the following command:

$ pdf2json pdf2json_crash.pdf

Here is the crash stack trace, captured with GDB:

#0  0x00007ffff6e6930e in _int_malloc (av=av@entry=0x7ffff71c0c40 <main_arena>, bytes=bytes@entry=4) at malloc.c:3557
#1  0x00007ffff6e6c0fc in __GI___libc_malloc (bytes=4) at malloc.c:3057
#2  0x0000555555849e67 in gmalloc (size=4) at gmem.cc:97
#3  0x000055555584b009 in copyString (s=0x555556d6e0d4 "obj") at gmem.cc:261
#4  0x000055555582751d in Lexer::getObj (this=0x555556d6e0b0, obj=0x555556d6e088) at ./Object.h:103
#5  0x000055555582d8f7 in Parser::shift (this=0x555556d6e060) at Parser.cc:226
#6  0x000055555582bfa2 in Parser::getObj (this=0x555556d6e060, obj=0x7fffff7ff2a8, fileKey=0x0, encAlgorithm=cryptRC4, keyLength=0,
    objNum=0, objGen=0) at Parser.cc:108
#7  0x00005555556b6a99 in XRef::fetch (this=<optimized out>, num=5, gen=0, obj=0x7fffff7ff400) at XRef.cc:811
#8  0x0000555555653927 in Object::fetch (this=<optimized out>, xref=0x555555c708f0, obj=0x7fffff7ff400) at Object.cc:106
#9  0x00005555555c7eaa in Dict::lookup (this=0x555556d6dee0, key=0x5555558f21c1 "Length", obj=0x7fffff7ff400) at Dict.cc:76
#10 0x000055555582db60 in Object::dictLookup (this=<optimized out>, key=0x4 <error: Cannot access memory at address 0x4>,
    obj=0x55550000000d) at ./Object.h:253
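Only the innermost eleven frames are shown, and the fault surfaces inside `malloc`, which suggests `malloc` is where the crash lands rather than what causes it. The trace looks consistent with unbounded recursion exhausting the stack: the visible chain `Object::dictLookup` → `Dict::lookup("Length")` → `XRef::fetch` (object 5) → `Parser::getObj` → `Lexer::getObj` is the kind of path that can loop back on itself, and frame addresses such as `0x7fffff7ff2a8` sit just above `0x7fffff7ff000`, which is 8 MiB below `0x7ffffffff000`, the usual stack top under GDB on x86_64 Linux with the default 8 MiB stack limit. One plausible trigger, not confirmed against the attached file, is a stream whose `/Length` is an indirect reference that leads back into the object being parsed. The C++ sketch below is a toy model of such a cycle, not pdf2json's code: `LengthEntry`, `XrefTable`, `resolveLength` and `kMaxFetchDepth` are hypothetical names, and the bound of 32 is arbitrary. It illustrates how a depth counter turns the cycle into a recoverable error instead of a stack overflow.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Toy model only (hypothetical names, NOT pdf2json's API): each entry stands
// for an object whose /Length is either a direct integer or an indirect
// reference "N 0 R" to another object number.
struct LengthEntry {
    bool isRef;  // true: value is a referenced object number
    int value;   // direct length, or the object number being referenced
};

using XrefTable = std::map<int, LengthEntry>;

// Arbitrary illustrative bound on nested fetches.
constexpr int kMaxFetchDepth = 32;

// Follows indirect /Length references starting at object `num`.
// Without the depth check, a self-reference such as 5 -> "5 0 R" would
// recurse until the call stack is exhausted.
std::optional<int> resolveLength(const XrefTable& xref, int num, int depth = 0) {
    assert(depth >= 0);  // callers start at depth 0
    if (depth > kMaxFetchDepth) {
        return std::nullopt;  // report a broken file instead of crashing
    }
    auto it = xref.find(num);
    if (it == xref.end()) {
        return std::nullopt;  // dangling reference
    }
    const LengthEntry& entry = it->second;
    if (!entry.isRef) {
        return entry.value;  // direct integer: done
    }
    return resolveLength(xref, entry.value, depth + 1);
}
```

For a table where object 5's `/Length` is `5 0 R`, `resolveLength` returns `std::nullopt` after a bounded number of nested calls, while a well-formed chain such as 5 → 6 → `1234` resolves normally.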

This crash was found with the Angora fuzzer; pdf2json_crash.pdf was derived from the sample PDF file dummy.pdf, which is also attached.

Hope this helps.

Attachments: pdf2json_crash.pdf, dummy.pdf