Closed asidorowicz closed 10 months ago
Enhancements to the `PdfSmartCopyTests.cs` test class
Testing for this feature has been improved with new and modified test methods. A new test method, `Verify_Remove_Duplicate_Dictionaries_Works`, ensures that dictionaries in PDFs are not duplicated. The method `Verify_Remove_Duplicate_Objects_Works` has been renamed to `Verify_Remove_Duplicate_Streams_Works`, so its name better reflects its primary function. The `CompressMultiplePdfFilesRemoveDuplicateObjects()` method now compresses many PDF files while avoiding duplication of objects.
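As a rough illustration (not the actual test code), merging several PDFs produced from the same template with `PdfSmartCopy` typically looks like this; the file names are hypothetical:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class MergeSketch
{
    // Hypothetical sketch: append several template-based PDFs into one output.
    // PdfSmartCopy detects identical streams/dictionaries and writes them once,
    // keeping the merged file small.
    public static void MergeFiles(string[] inputFiles, string outputFile)
    {
        using (var outputStream = new FileStream(outputFile, FileMode.Create))
        {
            var document = new Document();
            var copy = new PdfSmartCopy(document, outputStream);
            document.Open();
            foreach (var file in inputFiles)
            {
                var reader = new PdfReader(file);
                for (var page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
                reader.Close();
            }
            document.Close();
        }
    }
}
```

With duplicate removal working, appending thousands of documents based on the same template should grow the output roughly by the unique content only, not by a full copy of the template each time.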
Improvements in `PdfCopy.cs` and `PdfSmartCopy.cs`
In the `PdfCopy.cs` file, a minor amendment was made to the `Equals()` method to improve object comparison. Meanwhile, in `PdfSmartCopy.cs`, several changes enhance duplicate removal and tidy up the code; some unnecessary stream-handling code in the `CopyIndirect()` method was removed for cleanliness.
Introduction of the `ByteStore` class in `PdfSmartCopy.cs`
A new `ByteStore` class has been added to help handle the serialization of indirect references. It contains a private `List`, `_references`, that keeps track of references that have already been seen. This streamlines dealing with references and contributes to the overall reduction of duplicates.
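The idea can be sketched independently of the library (the names and shapes below are illustrative assumptions, not the actual implementation): a list of already-seen reference keys lets the serializer emit a short marker instead of recursing into a node it has already visited, which both deduplicates content and avoids looping on circular references.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch only: track indirect references that have already been
// serialized so repeated or circular nodes are not re-visited.
public class ByteStoreSketch
{
    private readonly List<string> _references = new List<string>();
    private readonly StringBuilder _buffer = new StringBuilder();

    // referenceKey identifies an indirect reference (e.g. "5 0");
    // serializeBody produces the serialized content on first visit.
    public string Serialize(string referenceKey, Func<string> serializeBody)
    {
        if (_references.Contains(referenceKey))
        {
            // Already seen: append a marker instead of recursing again.
            _buffer.Append("$R" + referenceKey);
        }
        else
        {
            _references.Add(referenceKey);
            _buffer.Append(serializeBody());
        }
        return _buffer.ToString();
    }
}
```

Because a repeated reference only ever contributes a fixed-size marker, two objects that point at the same subtree serialize to the same bytes without the serializer descending into that subtree twice.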
Modification of `serObject()` in the `ByteStore` class
To provide better handling of indirect references and improve the computation of the MD5 hash, the `serObject()` method in the `ByteStore` class was modified. It now handles indirect reference serialization and wraps the `MD5BouncyCastle` object in a `using` statement for better resource management.
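The resource-management point can be shown with the standard `System.Security.Cryptography.MD5` type standing in for `MD5BouncyCastle` (whose exact API is not shown here): wrapping the hash object in a `using` statement guarantees it is disposed even if serialization throws partway through.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashSketch
{
    // Illustrative sketch: hash the serialized bytes of a PDF object,
    // disposing the hash implementation deterministically via `using`.
    public static string HashSerializedBytes(byte[] serialized)
    {
        using (var md5 = MD5.Create()) // stand-in for MD5BouncyCastle
        {
            var hash = md5.ComputeHash(serialized);
            // Uppercase hex digest; two objects with identical serialized
            // bytes produce the same digest and can be treated as duplicates.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Comparing digests rather than full byte arrays keeps the duplicate check cheap when many streams and dictionaries are involved.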
While correcting a bug that caused an infinite loop in `PdfSmartCopy` when handling certain documents (https://github.com/VahidN/iTextSharp.LGPLv2.Core/issues/124), we lost the ability to remove duplicate dictionaries.
This caused documents to become unnecessarily large when appending multiple documents based on the same template (e.g. thousands of documents prepared for a print job).
These changes restore that capability and prevent re-visiting the same nodes when detecting identical content or already-processed nodes, improving performance when handling many streams/dictionaries (especially recursive ones).