empira / PDFsharp

PDFsharp and MigraDoc Foundation for .NET 6 and .NET Framework
https://docs.pdfsharp.net/

Use lazy loading for object-streams and their objects #85

Open packdat opened 5 months ago

packdat commented 5 months ago

This PR attempts to resolve the issues described in #73 and #46 in a more generic way. It also supersedes #53 by removing the need to handle objects stored in object-streams in a special way.

The lazy loading is handled by a new class, PdfReferenceToCompressedObject, which is a subclass of PdfReference. While the document's xref-streams are processed, references to objects stored in object-streams are collected as instances of this class. When the Value of such a reference is accessed (which may happen while parsing another object that refers to the compressed object), the object-stream is loaded and decrypted, if that has not already been done, and the actual object is read from it.
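To illustrate the idea, here is a minimal, language-neutral sketch in Python. It is not the code from this PR and not PDFsharp's API. Only the names PdfReference, PdfReferenceToCompressedObject and the value accessor mirror the description above. The ObjectStream class, its load_count counter, the read_and_decrypt callback and the sample objects are hypothetical stand-ins made up for the example.

```python
_UNSET = object()  # sentinel, so that a legitimately null object is still cached


class PdfReference:
    """Simplified stand-in for an indirect reference (assumption, not PDFsharp code)."""

    def __init__(self, object_number, value=_UNSET):
        self.object_number = object_number
        self._value = value

    @property
    def value(self):
        return None if self._value is _UNSET else self._value


class ObjectStream:
    """Hypothetical object-stream that decodes its content at most once."""

    def __init__(self, read_and_decrypt):
        self._read_and_decrypt = read_and_decrypt  # callback returning the contained objects
        self._objects = None
        self.load_count = 0  # only here to make the laziness observable

    def get(self, index):
        if self._objects is None:  # load and decrypt on first use only
            self._objects = list(self._read_and_decrypt())
            self.load_count += 1
        return self._objects[index]


class PdfReferenceToCompressedObject(PdfReference):
    """Reference whose target is read from its object-stream on first access."""

    def __init__(self, object_number, stream, index):
        super().__init__(object_number)
        self._stream = stream
        self._index = index

    @property
    def value(self):
        if self._value is _UNSET:
            self._value = self._stream.get(self._index)
        return self._value


def read_and_decrypt():
    # Hypothetical content of one object-stream.
    return ["<< /Type /Font >>", "<< /Type /Page >>"]


# Collecting the references (as done while processing xref-streams) loads nothing.
stream = ObjectStream(read_and_decrypt)
refs = [PdfReferenceToCompressedObject(10 + i, stream, i) for i in range(2)]
loads_before_access = stream.load_count

first = refs[1].value   # triggers the one and only load of the stream
second = refs[0].value  # served from the already loaded stream
print(loads_before_access, stream.load_count, first)  # prints: 0 1 << /Type /Page >>
```

The point of the sketch is that the order in which references are resolved no longer matters: whichever reference is dereferenced first pays for loading the object-stream, and every later access reuses the result.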

I have not found any issues so far when running automated tests with these changes against ~1000 PDF files (testing page import).

Note: The PR also includes some minor tweaks that are not directly related to object loading but that I think are helpful, such as reporting the position within the document at which an unexpected token was encountered during parsing.