SimpleApp / PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser.
37 stars 10 forks

Crashes when reading some PDF documents #1

Closed. KiwiWilkinson closed this issue 6 years ago

KiwiWilkinson commented 6 years ago

This looks like a great project, thanks.

The attached PDF page causes this issue: Astro Boy-1-1.pdf

When initializing PDFFontDescriptor, the variable numberOfBytesLeft in TrueTypeFontFile.readCMapSubTableFormat4 is a large negative number. This then terminates the app with

Fatal error: Can't form Range with upperBound < lowerBound

in TrueTypeFontFileReader.getArray()
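
For anyone hitting the same trap: Swift aborts whenever a Range is built with an upper bound below its lower bound, which is what a negative byte count leads to. Below is a minimal sketch of a reader that throws instead of trapping. `ByteReader`, `ReadError`, and this `getArray(count:)` signature are hypothetical stand-ins for illustration, not the project's actual TrueTypeFontFileReader API.

```swift
// Hypothetical stand-in for a byte reader; NOT the project's TrueTypeFontFileReader.
struct ByteReader {
    let bytes: [UInt8]
    var offset = 0

    enum ReadError: Error, Equatable {
        case invalidCount(Int)
        case outOfBounds(requested: Int, available: Int)
    }

    // Reads `count` bytes from the current offset.
    // Throws instead of trapping when `count` is negative or exceeds the remaining data.
    mutating func getArray(count: Int) throws -> [UInt8] {
        guard count >= 0 else { throw ReadError.invalidCount(count) }
        let available = bytes.count - offset
        guard count <= available else {
            throw ReadError.outOfBounds(requested: count, available: available)
        }
        // Safe: count >= 0, so offset <= offset + count and the range is well formed.
        let slice = Array(bytes[offset..<(offset + count)])
        offset += count
        return slice
    }
}

var reader = ByteReader(bytes: [1, 2, 3, 4])
let firstTwo = try? reader.getArray(count: 2)     // succeeds
let negative = try? reader.getArray(count: -100)  // nil rather than a fatal error
```

A guard like this only turns the crash into a recoverable error; the negative count itself still has to be fixed at its source.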

KiwiWilkinson commented 6 years ago

This is caused by a bug in TrueTypeFontFile.swift. readCMAPTable calls readCMapSubTableFormat4, whose tableBeginOffset: parameter is intended to be the offset of the CMAP subtable within the stream, because inside readCMapSubTableFormat4 it is compared with the stream offset returned by r.tell().

However, the value passed is only the offset of the subtable within the CMAP table (not within the stream), which causes the crash.
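
To illustrate the mismatch, here is a rough sketch with made-up numbers. In a TrueType file, subtable offsets in the cmap encoding records are measured from the start of the cmap table, so they need the table's own stream offset added before they can be compared with an absolute position. The helper names and the exact arithmetic below are assumptions for illustration, not the code in TrueTypeFontFile.swift.

```swift
// Hypothetical helper: converts a subtable offset stored relative to the start
// of the cmap table into an absolute offset within the font stream.
func absoluteOffset(cmapTableOffset: Int, subtableOffsetInCmap: Int) -> Int {
    return cmapTableOffset + subtableOffsetInCmap
}

// Hypothetical helper: bytes remaining in a subtable, given where it begins,
// its declared length, and the reader's current absolute position
// (the kind of value r.tell() returns).
func bytesLeft(tableBeginOffset: Int, length: Int, currentPosition: Int) -> Int {
    return tableBeginOffset + length - currentPosition
}

// All values below are made up for the example.
let cmapTableOffset = 5_000      // where the cmap table starts in the stream
let subtableOffsetInCmap = 20    // offset taken from the encoding record
let subtableLength = 256         // length field of the format 4 subtable
// Absolute position after reading the seven 16-bit header fields (14 bytes).
let position = cmapTableOffset + subtableOffsetInCmap + 14

// Passing the table-relative offset: the result goes far below zero.
let wrong = bytesLeft(tableBeginOffset: subtableOffsetInCmap,
                      length: subtableLength,
                      currentPosition: position)    // -4758

// Passing the stream-relative offset: the result is what is actually left.
let right = bytesLeft(tableBeginOffset: absoluteOffset(cmapTableOffset: cmapTableOffset,
                                                       subtableOffsetInCmap: subtableOffsetInCmap),
                      length: subtableLength,
                      currentPosition: position)    // 242
```

The further into the file the cmap table sits, the more negative the wrong value becomes, which matches the "large negative number" seen in numberOfBytesLeft.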

To fix, amend the code:

image

to be:

image

and it will be sweet as