StackOverflowError for invalid Font entry in Font Dictionary

moritzfl commented 1 year ago

Describe the bug

OpenPDF goes into a loop when reading from a corrupt document and the execution ends in a StackOverflowError. The behaviour was observed with 1.3.30.

The erroneous PDF file has an incorrect entry in the Font Dictionary (Reference to Object 15 which is not a valid font).

Erroneous file: wrong-object-type-for-font.pdf

java.lang.StackOverflowError: null
    java.lang.AbstractStringBuilder.append(Unknown Source)
    java.lang.StringBuilder.append(Unknown Source)
    java.lang.StringBuilder.append(Unknown Source)
    java.util.AbstractCollection.toString(Unknown Source)
    com.lowagie.text.pdf.PdfArray.toString(PdfArray.java:208)
    java.lang.String.valueOf(Unknown Source)
    java.lang.StringBuilder.append(Unknown Source)
    com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfContentReaderTool.java:104)
    com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfContentReaderTool.java:119)
    com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfContentReaderTool.java:119)
    com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfContentReaderTool.java:119)
    com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfContentReaderTool.java:119)
    ...

To Reproduce

Parse the file's content through OpenPDF, for example with PdfContentReaderTool:

            final InputStream in = // File content
            final PdfReader pdfreader = new PdfReader(in);
            PrintWriter writer = // anything
            for (int pageNum = 1; pageNum <= pdfreader.getNumberOfPages(); pageNum++) {
                PdfContentReaderTool.listContentStreamForPage(pdfreader, pageNum, writer);
            }

Expected behavior

An Exception is fine, but OpenPDF shouldn't go into a loop and throw an Error such as StackOverflowError. Crafted documents could affect the stability of any software that uses OpenPDF to modify or process PDF documents, and the error gives no information about which part of the document caused it.

Screenshots

Screenshot of the faulty Font object in iText RUPS: RUPS-screenshot

System (please complete the following information):

OS: Windows, occurs on Linux as well

Additional context

File for reproduction and testing: wrong-object-type-for-font.pdf

balogun14 commented 8 months ago

@moritzfl I would like to work on this issue.

andreasrosdal commented 5 months ago

Pull requests for this are welcome.

asturio commented 3 months ago

@balogun14, @moritzfl Have you found a solution for this problem? Do you have an idea how the problem can be avoided?

moritzfl commented 3 months ago

We mainly test compatibility with OpenPDF as part of a preflighting solution, and we now also catch StackOverflowErrors (and others) when trying to read with OpenPDF, instead of handling only Exceptions.

So that has fixed the problem for us in production but it does not fix the issue in OpenPDF. I can look into it however - it does not seem to be too complicated.
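A minimal sketch of that kind of caller-side workaround, in plain Java. The class `GuardedParse`, the exception `UnreadableDocumentException`, and the `recurseForever` stand-in are hypothetical names chosen for illustration; none of them is part of OpenPDF, and the real parsing call would go where the stand-in is used.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of OpenPDF: turns a stack overflow into a checked exception. */
class GuardedParse {

    /** Illustrative exception type for "this document could not be read". */
    static class UnreadableDocumentException extends Exception {
        UnreadableDocumentException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs the task and reports both Exceptions and StackOverflowError the same way. */
    static <T> T run(Callable<T> task) throws UnreadableDocumentException {
        try {
            return task.call();
        } catch (StackOverflowError e) {
            // The deep frames have been unwound by the time this handler runs.
            throw new UnreadableDocumentException("parser recursed too deeply", e);
        } catch (Exception e) {
            throw new UnreadableDocumentException("parser failed", e);
        }
    }

    /** Stand-in for a library call whose recursion never terminates. */
    static int recurseForever(int n) {
        return recurseForever(n + 1) + 1;
    }

    public static void main(String[] args) {
        try {
            run(() -> recurseForever(0));
        } catch (UnreadableDocumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This only contains the failure at the call site; the recursion itself still happens inside the library.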

Without looking at the code, I imagine an easy fix could also be to just check for the expected types in a font dictionary instead of trying to parse "Annot" in a font dictionary. However, I have not verified that this would prevent recursion from occurring when a "Font" object with the same structure as the "Annot" object in the example is present.

If @balogun14 is not interested in looking at this issue anymore, I can give it a try. I am familiar enough with the PDF standard(s) and writing code for PDF but I am unfamiliar with OpenPDF code.

mkl-public commented 3 months ago

As a hint: com.lowagie.text.pdf.parser.PdfContentReaderTool.getDictionaryDetail(PdfDictionary, int) recursively builds a String representation of a PdfDictionary. Unfortunately it does not check for circular references (like in the case at hand between the Popup annotation and its parent Stamp annotation), so if there are any, the code recurses infinitely (or until the stack overflows).

That method is only called by PdfContentReaderTool.listContentStreamForPage, though, and the dictionaries it usually comes across contain no circular references. Thus, the issue usually doesn't pop up.

One option to fix this is limiting recursion depth. Fortunately, the int parameter already is the recursion depth (used for indentation), so you can simply check its value.
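The depth check could look roughly like the sketch below. It is not OpenPDF code: nested `Map<String, Object>` instances stand in for PdfDictionary, and `DepthLimitedDump`, `MAX_DEPTH`, and the limit of 16 are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model only: nested Maps stand in for PDF dictionaries. */
class DepthLimitedDump {

    /** Arbitrary limit chosen for this sketch. */
    static final int MAX_DEPTH = 16;

    static String detail(Map<String, Object> dict, int depth) {
        if (depth > MAX_DEPTH) {
            // Stop expanding instead of recursing until the stack overflows.
            return "(... nesting deeper than " + MAX_DEPTH + " levels, not expanded)";
        }
        // Only keys are printed: calling toString() on circular values would recurse too.
        StringBuilder sb = new StringBuilder("(");
        sb.append(String.join(", ", dict.keySet())).append(')');
        for (Map.Entry<String, Object> e : dict.entrySet()) {
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) e.getValue();
                sb.append('\n');
                for (int i = 0; i <= depth; i++) {
                    sb.append('\t'); // depth doubles as the indentation level
                }
                sb.append("Subdictionary ").append(e.getKey()).append(" = ")
                  .append(detail(sub, depth + 1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Circular pair modelled on the Stamp annotation and its Popup.
        Map<String, Object> stamp = new LinkedHashMap<>();
        Map<String, Object> popup = new LinkedHashMap<>();
        stamp.put("Popup", popup);
        popup.put("Parent", stamp);
        System.out.println(detail(stamp, 0));
    }
}
```

With a depth limit the cycle is still walked repeatedly until the limit is reached, so the output is bounded but repetitive.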

Alternatively, you can add a parameter holding the set of dictionaries and arrays you have already visited, and skip any that are in it.
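A sketch of the visited-set variant, again using nested `Map<String, Object>` instances as a stand-in for PdfDictionary rather than OpenPDF's own classes. `CycleSafeDump` and the "(already visited)" marker are hypothetical. The set compares by identity, on the assumption that calling equals or hashCode on circular structures could itself recurse.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative model only: nested Maps stand in for PDF dictionaries. */
class CycleSafeDump {

    static String detail(Map<String, Object> dict) {
        // Identity-based set: membership never calls equals/hashCode on the dictionaries.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        return detail(dict, 0, visited);
    }

    static String detail(Map<String, Object> dict, int depth, Set<Object> visited) {
        if (!visited.add(dict)) {
            // Seen before: report it and do not descend again.
            return "(already visited)";
        }
        StringBuilder sb = new StringBuilder("(");
        sb.append(String.join(", ", dict.keySet())).append(')');
        for (Map.Entry<String, Object> e : dict.entrySet()) {
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) e.getValue();
                sb.append('\n');
                for (int i = 0; i <= depth; i++) {
                    sb.append('\t');
                }
                sb.append("Subdictionary ").append(e.getKey()).append(" = ")
                  .append(detail(sub, depth + 1, visited));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> stamp = new LinkedHashMap<>();
        Map<String, Object> popup = new LinkedHashMap<>();
        stamp.put("Popup", popup);
        popup.put("Parent", stamp);
        System.out.println(detail(stamp));
    }
}
```

One consequence of this design: a dictionary that is shared by two parents without forming a cycle is also expanded only once.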

LibrePDF / OpenPDF

StackOverflowError for invalid Font entry in Font Dictionary #893