Hyperlinks from merged PDFs are lost

galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams

http://www.pdfhummus.com

Hyperlinks from merged PDFs are lost #193

Closed by astefanutti 7 years ago

astefanutti commented 7 years ago

While merging PDF documents that contain hyperlinks with the appendPDFPagesFromPDF API, it seems these hyperlinks are not kept in the merged PDF document. Am I missing some configuration?

galkahana commented 7 years ago

Right. Merging imports only the graphics of a page, so all interactive content is lost. You will need to extend its capability to handle that. Follow this recipe (just use the code as is: comments and links are both "annotations", and the code imports all annotations): https://github.com/galkahana/HummusJSSamples/blob/master/appending-pages-with-comments/appendWithComments.js

astefanutti commented 7 years ago

Thanks for the quick reply! Let me try that.

astefanutti commented 7 years ago

I've managed to get it working in astefanutti/decktape@2478f83798769dab66b0b245a2675a61607af228. Hopefully I've done it right 😉.

I need to dig into the HummusJS API to take full advantage of it. In DeckTape, I'd like to shrink the size of the merged PDF by reusing references to repeated background images instead of duplicating the graphics for each slide. It'd be awesome if I could implement it using HummusJS.

Anyway, thanks for setting me on the right track 👍

galkahana commented 7 years ago

you wanna look into this then.

astefanutti commented 7 years ago

@galkahana thanks a lot for the direction.

I've managed to implement factorization of duplicated images while merging PDFs here: https://github.com/astefanutti/decktape/blob/f9d521536e2273e5a696de0db5ed4383d115b375/decktape.js#L346.
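A minimal sketch of the idea, independent of the DeckTape code linked above: key each image stream by a digest of its raw bytes and point every duplicate at the first stream seen with that digest. The `dedupeByDigest` helper and the `{ id, bytes }` record shape are hypothetical, made up for illustration. The sketch uses only Node's `crypto` module and no HummusJS API.

```javascript
const crypto = require('crypto');

// Hypothetical helper: `streams` is an array of { id, bytes } records, where
// `bytes` is a Buffer holding the raw (still encoded) stream content.
// Returns a Map from each stream ID to the ID that should be referenced instead.
function dedupeByDigest(streams) {
  const firstIdByDigest = new Map(); // digest -> ID of the first stream seen
  const canonicalId = new Map();     // stream ID -> ID to reference
  for (const { id, bytes } of streams) {
    const digest = crypto.createHash('sha1').update(bytes).digest('hex');
    if (!firstIdByDigest.has(digest)) {
      firstIdByDigest.set(digest, id);
    }
    canonicalId.set(id, firstIdByDigest.get(digest));
  }
  return canonicalId;
}
```

In a real merge, the returned map would decide whether a stream is copied into the target document or replaced by a reference to an object that was already copied.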

It turns out that merging PDFs that embed duplicated font definitions faces the same issue. So I'd like to implement the same sort of strategy, though it seems it may be trickier. I've detected duplicated images by comparing digests of the undecoded streams. But for font definitions, I fear that would be too simplistic, as they may contain glyph subsets.
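To illustrate why a digest alone may not be enough for fonts: in PDF, a subset font's name starts with a tag of six uppercase letters followed by `+`, so two slides can each embed a different subset of the same font. The sketch below only groups font names by their untagged base name to surface such candidates. The `groupFontsByBaseName` helper and the sample names are hypothetical, and actually merging subsets would mean rebuilding the font program, which this does not attempt.

```javascript
// A subset font name carries a six-uppercase-letter tag and a plus sign,
// e.g. "ABCDEF+Lato-Bold".
const SUBSET_TAG = /^[A-Z]{6}\+/;

// Hypothetical helper: groups font names by base name with the subset tag removed.
function groupFontsByBaseName(baseFontNames) {
  const groups = new Map(); // base name -> list of original font names
  for (const name of baseFontNames) {
    const key = name.replace(SUBSET_TAG, '');
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(name);
  }
  return groups;
}

// Two differently tagged subsets of the same font land in one group,
// even though their embedded font streams would hash differently.
const fontGroups = groupFontsByBaseName([
  'ABCDEF+Lato-Bold',
  'QWERTY+Lato-Bold',
  'Courier',
]);
console.log(fontGroups);
```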

Would you have any pointers to share on factorising embedded font definitions?