empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License
1.28k stars 588 forks

Optimize resources while importing pages #105

Open pchinery opened 4 years ago

pchinery commented 4 years ago

We came across a PDF file in which every page references a single shared resource dictionary that contains all fonts and images. As a result, extracting a single page produces a very large file, because every font and image is embedded along with it. We can provide this file for tests, if desired.

The code changes now treat cloning the resource dictionary differently from cloning other objects: the resources are reduced to those actually used in the page content.
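
To illustrate the idea (this is not the code from this change, and it does not use the PDFsharp API), here is a minimal Python sketch that scans a page's content stream for resource names and keeps only the matching entries of a shared resource dictionary. The function names `used_resources` and `prune_resources`, the plain-dict representation of resources, and the naive tokenizer are all assumptions made for the example. It only tracks the `Tf`, `Do`, `gs`, and `sh` operators, leaves any other category untouched, and ignores nested form XObjects, inline images, and string escaping.

```python
import re

# Content stream operators that consume a name operand, mapped to the
# resource category that name is looked up in (illustrative subset only).
OPERATOR_CATEGORY = {
    "Tf": "Font",       # /F1 12 Tf
    "Do": "XObject",    # /Im1 Do
    "gs": "ExtGState",  # /GS1 gs
    "sh": "Shading",    # /Sh1 sh
}

# Naive tokenizer: names start with '/', everything else is split on
# whitespace and PDF delimiters. Real content streams need a proper lexer.
TOKEN = re.compile(r"/[^\s/\[\]()<>]+|[^\s/\[\]()<>]+")
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def used_resources(content):
    """Return {category: {names}} for resources referenced in the content."""
    used = {}
    last_name = None
    for token in TOKEN.findall(content):
        if token.startswith("/"):
            last_name = token[1:]          # remember the most recent name operand
        elif NUMBER.fullmatch(token):
            continue                       # numeric operands do not reset the name
        else:
            category = OPERATOR_CATEGORY.get(token)
            if category is not None and last_name is not None:
                used.setdefault(category, set()).add(last_name)
            last_name = None               # any operator ends the operand list
    return used


def prune_resources(resources, used):
    """Copy `resources`, dropping tracked entries the content never references."""
    tracked = set(OPERATOR_CATEGORY.values())
    pruned = {}
    for category, entries in resources.items():
        if category not in tracked:
            pruned[category] = entries     # untracked categories are kept as-is
            continue
        kept = {n: o for n, o in entries.items() if n in used.get(category, set())}
        if kept:
            pruned[category] = kept
    return pruned
```

For a page whose content is `BT /F1 12 Tf (Hello) Tj ET q 100 0 0 100 0 0 cm /Im1 Do Q`, only the `F1` font and the `Im1` image would survive pruning, so the unused fonts and images from the shared dictionary would not need to be cloned into the extracted file.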

There are a few questions open:

Any feedback is greatly appreciated, and we'd love to see this ability in the main branch at some point.