We came across a PDF file in which every page references the same resource dictionary, and that dictionary contains all fonts and images of the document. Extracting a single page therefore produces a very large file, because all fonts and images are embedded as well. We can provide this file for tests, if desired.
The code changes now treat cloning the resource dictionary differently from cloning other objects: the cloned resources are reduced to those actually used in the page content.
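To make the idea concrete, here is a minimal, library-independent sketch in Python. It is not the code from this change. The function names (`used_resource_names`, `prune_resources`) and the plain-dict model of a resource dictionary are assumptions made for illustration only. The sketch scans a decoded content stream, collects the name operands of operators that refer to resources (`Tf`, `Do`, `gs`, `cs`/`CS`, `scn`/`SCN`, `sh`, `BDC`/`DP`), and keeps only the matching entries.

```python
import re

# Operators whose name operands refer to a resource category.
OPERATOR_CATEGORY = {
    b"Tf": "Font", b"Do": "XObject", b"gs": "ExtGState",
    b"cs": "ColorSpace", b"CS": "ColorSpace",
    b"scn": "Pattern", b"SCN": "Pattern",
    b"sh": "Shading", b"BDC": "Properties", b"DP": "Properties",
}
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"
KEYWORD_OPERANDS = (b"true", b"false", b"null")


def _decode_name(raw: bytes) -> str:
    # Resolve #xx escapes inside a name token.
    decoded = re.sub(rb"#([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def used_resource_names(content: bytes) -> dict[str, set[str]]:
    """Collect resource names per category (hypothetical helper).

    Over-approximates on purpose: every name operand of a listed
    operator is recorded. Inline image data (BI ... ID ... EI) is
    NOT handled in this sketch.
    """
    used: dict[str, set[str]] = {}
    pending: list[str] = []  # name operands seen since the last operator
    i, n = 0, len(content)
    while i < n:
        c = content[i:i + 1]
        if c in WHITESPACE or c in b">)[]{}":
            i += 1
        elif c == b"%":  # comment: skip to end of line
            while i < n and content[i] not in b"\r\n":
                i += 1
        elif c == b"(":  # literal string with nesting and escapes
            depth, i = 1, i + 1
            while i < n and depth:
                ch = content[i:i + 1]
                if ch == b"\\":
                    i += 1  # skip the escaped byte as well
                elif ch == b"(":
                    depth += 1
                elif ch == b")":
                    depth -= 1
                i += 1
        elif c == b"<":
            if content[i + 1:i + 2] == b"<":  # inline dictionary start
                i += 2
            else:  # hex string: skip to the closing '>'
                j = content.find(b">", i)
                i = n if j < 0 else j + 1
        else:
            start = i + 1 if c == b"/" else i
            j = start
            while j < n and content[j] not in WHITESPACE + DELIMITERS:
                j += 1
            token, i = content[start:j], j
            if c == b"/":
                pending.append(_decode_name(token))
            elif token[:1] in b"0123456789+-." or token in KEYWORD_OPERANDS:
                pass  # number or keyword operand, keep pending names
            else:  # an operator ends the current operand list
                category = OPERATOR_CATEGORY.get(token)
                if category and pending:
                    used.setdefault(category, set()).update(pending)
                pending = []
    return used


def prune_resources(resources: dict, used: dict[str, set[str]]) -> dict:
    """Keep only used entries; untracked categories are copied as-is."""
    tracked = set(OPERATOR_CATEGORY.values())
    pruned = {}
    for category, entries in resources.items():
        if category in tracked and isinstance(entries, dict):
            keep = used.get(category, set())
            pruned[category] = {k: v for k, v in entries.items() if k in keep}
        else:
            pruned[category] = entries  # e.g. ProcSet
    return pruned
```

A sketch like this only looks at one content stream. Form XObjects, tiling patterns and Type 3 font glyph procedures carry their own content streams, so a complete solution would presumably have to follow those as well.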
There are a few questions open:
Are there (maybe indirect) ways to reference a resource from the content that are not considered here?
Is there a way to re-use the lexer/parser to identify used resources? (Currently, this is a rather hacky implementation.)
Are there any points that we have not considered properly here?
Any feedback is greatly appreciated and we'd love to see this ability in the main branch at some point.