flexpaper / pdf2json

PDF2JSON is a conversion library based on XPDF (3.02) which can be used for high-performance, page-by-page conversion of PDF to JSON and XML format. It also supports compressing data to minimize size. PDF2JSON is available for Windows, OSX and Linux. Please see https://flowpaper.com for more information.

memory leaks in GString::copy()/Page::getLinks/GString::resize #52

Open guangbuming opened 7 months ago

guangbuming commented 7 months ago

project

https://github.com/flexpaper/pdf2json version: 0.70

os info

Ubuntu 20.04 LTS

poc

poc.zip

build

git clone https://github.com/flexpaper/pdf2json.git
cd pdf2json
./configure

Edit pdf2json/src/Makefile as follows (CXXFLAGS adds -fsanitize=address and -fno-omit-frame-pointer):

SHELL = /bin/sh

SRCDIR = .
XPDFSRCDIR = ../xpdf
XPDFLIBDIR = ../xpdf
GOOSRCDIR = ../goo
GOOLIBDIR = ../goo
FOFISRCDIR = ../fofi
FOFILIBDIR = ../fofi
SPLASHSRCDIR = ../splash
SPLASHLIBDIR = ../splash

CXXFLAGS = -I/usr/local/include -g -O2 -fsanitize=address -fno-omit-frame-pointer -DHAVE_CONFIG_H -DHAVE_DIRENT_H=1  -I.. -DHAVE_REWINDDIR=1 -DHAVE_POPEN=1 -I.. -I$(GOOSRCDIR) -I$(XPDFSRCDIR) -I$(FOFISRCDIR) -I$(SPLASHSRCDIR) -I$(srcdir)           -I/usr/X11R6/include

LDFLAGS =
FTLIBS =

OTHERLIBS =

CXX ?= c++

LIBPREFIX = lib
EXE =

#------------------------------------------------------------------------

.SUFFIXES: .cc

.cc.o:
    $(CXX) $(CXXFLAGS) -c $<

#------------------------------------------------------------------------

CXX_SRC = \
    $(SRCDIR)/pdf2json.cc \
    $(SRCDIR)/ImgOutputDev.cc \
    $(SRCDIR)/XmlFonts.cc \
    $(SRCDIR)/XmlLinks.cc

#------------------------------------------------------------------------

all: pdf2json$(EXE)

#-------------------------------------------------------------------------

PDF2JSON_OBJS = ImgOutputDev.o XmlFonts.o XmlLinks.o \
    pdf2json.o
PDF2JSON_LIBS = -L$(GOOLIBDIR) -L$(FOFILIBDIR) -L$(SPLASHLIBDIR) $(FTLIBS) -L$(XPDFLIBDIR) $(OTHERLIBS) -lXpdf -lGoo -lfofi -lsplash -lm

pdf2json$(EXE): $(PDF2JSON_OBJS) $(GOOLIBDIR)/$(LIBPREFIX)Goo.a
    $(CXX) $(CXXFLAGS) $(LDFLAGS) -o pdf2json$(EXE) $(PDF2JSON_OBJS) \
        $(PDF2JSON_LIBS)

#-------------------------------------------------------------------------
PDF2JSON_WINOBJS = pdf2json.exe ImgOutPutDev.obj  pdf2json.obj  XmlFonts.obj  XmlLinks.obj

clean:
    rm -f $(PDF2JSON_OBJS) pdf2json$(EXE)
    rm -f $(PDF2JSON_WINOBJS)

#------------------------------------------------------------------------

distdepend:
    cp Makefile.in Makefile.in.bak
    sed '/^#----- dependences -----/q' Makefile.in.bak >Makefile.in
    $(CXX) $(CXXFLAGS) -MM $(CXX_SRC) >>Makefile.in

Then go back to the pdf2json directory and build:

make

Info

Error: PDF file is damaged - attempting to reconstruct xref table...
Error (5457): Dictionary key must be a name object
Error (5459): Dictionary key must be a name object
Error (5463): Dictionary key must be a name object
Error (5466): Dictionary key must be a name object
Error (5472): Dictionary key must be a name object
Error (4932): Dictionary key must be a name object
Error (4934): Dictionary key must be a name object
Error (4938): Dictionary key must be a name object
Error (4942): Dictionary key must be a name object
Error (4943): Dictionary key must be a name object
Error (4950): Dictionary key must be a name object
Page-1

=================================================================
==2270915==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x4c736d in operator new(unsigned long) (/home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json+0x4c736d)
    #1 0x53b9b4 in GString::copy() /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/./../goo/GString.h:41:28
    #2 0x53b9b4 in GlobalParams::getTextEncodingName() /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/GlobalParams.cc:2256:21
    #3 0x4d0b77 in ImgOutputDev::ImgOutputDev(char*, char*, char*, char*, char*, char*, char*, int, int, int, int, int, int, int) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/ImgOutputDev.cc:864:52
    #4 0x4dd020 in main /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json.cc:241:17
    #5 0x7f63af21d082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x4c736d in operator new(unsigned long) (/home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json+0x4c736d)
    #1 0x54c541 in Page::getLinks(Catalog*) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/Page.cc:254:11
    #2 0x550190 in PDFDoc::getLinks(int) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/PDFDoc.cc:351:34
    #3 0x550190 in PDFDoc::displayPage(OutputDev*, int, double, double, int, int, int, int, int (*)(void*), void*) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/PDFDoc.cc:320:34
    #4 0x550190 in PDFDoc::displayPages(OutputDev*, int, int, double, double, int, int, int, int, int (*)(void*), void*) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/xpdf/PDFDoc.cc:332:5
    #5 0x4dd214 in main /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json.cc:275:10
    #6 0x7f63af21d082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x4c747d in operator new[](unsigned long) (/home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json+0x4c747d)
    #1 0x69f3f2 in GString::resize(int) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/goo/GString.cc:87:9
    #2 0x69f3f2 in GString::GString(GString*) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/goo/GString.cc:131:3
    #3 0x4d0b77 in ImgOutputDev::ImgOutputDev(char*, char*, char*, char*, char*, char*, char*, int, int, int, int, int, int, int) /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/ImgOutputDev.cc:864:52
    #4 0x4dd020 in main /home/ubuntu/fuzz/pdf2json_fuzz/pdf2json/src/pdf2json.cc:241:17
    #5 0x7f63af21d082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
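
Reading the traces: the first and third reports both pass through ImgOutputDev.cc:864:52, so they look like one leaked object, namely the GString returned by GlobalParams::getTextEncodingName() (16 bytes, direct) and its character buffer (8 bytes, indirect). The getter returns the result of GString::copy(), which means the caller has to delete it. The sketch below illustrates that ownership pattern and one possible way to release the copy. It is only an illustration under assumptions: `Str`, `getEncodingName` and `isUtf8` are hypothetical stand-ins, not the real xpdf/pdf2json code, and the actual fix in ImgOutputDev.cc may need to look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for xpdf's GString (NOT the real class).
class Str {
public:
  static int live;  // number of Str objects currently alive

  explicit Str(const char *s)
      : len_(std::strlen(s)), buf_(new char[len_ + 1]) {
    std::memcpy(buf_, s, len_ + 1);  // copies the terminating NUL too
    ++live;
  }
  Str(const Str &) = delete;
  Str &operator=(const Str &) = delete;
  ~Str() {
    delete[] buf_;
    --live;
  }

  // Like GString::copy() in the trace: returns a new heap object
  // that the caller owns and must delete.
  Str *copy() const { return new Str(buf_); }
  const char *c_str() const { return buf_; }

private:
  std::size_t len_;
  char *buf_;
};
int Str::live = 0;

// Hypothetical getter that hands out an owned copy, mirroring the
// copy() call seen under GlobalParams::getTextEncodingName().
Str *getEncodingName(const Str &stored) { return stored.copy(); }

// Caller that takes ownership with unique_ptr, so the copy (object
// and buffer) is freed when the function returns.
bool isUtf8(const Str &stored) {
  std::unique_ptr<Str> name(getEncodingName(stored));
  assert(name != nullptr);
  return std::strcmp(name->c_str(), "UTF-8") == 0;
}
```

Calling `isUtf8` leaves `Str::live` unchanged. Calling `getEncodingName(stored)` and discarding the pointer instead raises it by one, which is the kind of leak LeakSanitizer reports above. The second report (Page::getLinks) appears to follow the same shape: an object allocated with new in Page.cc:254 whose owner never deletes it.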