galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com
Other
1.14k stars 169 forks source link

Relative links #337

Open larssvensby opened 5 years ago

larssvensby commented 5 years ago

Hi,

I followed #329 and #193 in order to merge multiple PDF files with relative links.

It works so far as the relative hyper-links becomes clickable in the output PDF (in Acrobat Reader), but clicking the link doesn't navigate to the target.

Is it possible to keep/preserve the relative hyper-links and their targets during the merging process?

Any help appreciated. Thanks!

galkahana commented 5 years ago

You need to figure out the relevant pdf constructs for targets and make sure they get copied to.

larssvensby commented 5 years ago

Thanks for the reply! Any hints how to do that?

The targets are targets (clicking a hyper-link navigates to the target) in the original pdf files but I can't find anything in the documentation how to extract them or copy them over.

galkahana commented 5 years ago

if you got a sample pdf that you want to copy and retain its targets (and you can specify which...so im positive that you understand what im looking at), i may be able to find the time to figure this out.

larssvensby commented 5 years ago

Thank you for looking into this!

I attached a simple pdf with 3 relative links from the Table of contents, if I click on them in Acrobat Reader it navigates correctly. After I merge with (any) other pdf the targets are lost.

Test case Report.pdf

galkahana commented 5 years ago

OK. so read a bit about linked destinations in the PDF ref (8.4.5 about link annotations, which leads to 8.2.1 about destinations...if you follow should give you the relevant theory to build to code).

Turns out the extra needed, to make this work, is to copy the Dests dictionary from the catalog. Also, since this PDF has the annotations embedded and not reffed, the original code fails. I corrected the code and simplified it.

The following is a script that you can use to solve your problem:

var hummus = require('hummus');

var resultPath = './Test.case.Report.out.pdf';
var sourcePath = './Test.case.Report.pdf';

var pdfWriter = hummus.createWriter(resultPath);

// original method, append with regulat method. which should copy without comments
// pdfWriter.appendPDFPagesFromPDF(sourcePath);

// second, with the special method. this will copy the pages with the comments
appendPDFPageFromPDFWithAnnotations(pdfWriter,sourcePath);

pdfWriter.end();

function appendPDFPageFromPDFWithAnnotations(pdfWriter,sourcePDFPath) {
    var objCxt = pdfWriter.getObjectsContext();
    var cpyCxt = pdfWriter.createPDFCopyingContext(sourcePDFPath);
    var cpyCxtParser = cpyCxt.getSourceDocumentParser();

    // for each page
    for(var i=0;i<cpyCxtParser.getPagesCount();++i) {
        // grab page dictionary
        var pageDictionary = cpyCxtParser.parsePageDictionary(i);
        if(!pageDictionary.exists('Annots')) {
            // no annotation. append as is
            cpyCxt.appendPDFPageFromPDF(i);            
        }
        else {
            // New: i'm making this code simpler and also something that will work with embedded annotations
            var reffedObjects;

            pdfWriter.getEvents().once('OnPageWrite',function(params) {
                // using the page write event, write the new annotations
                params.pageDictionaryContext.writeKey('Annots');
                reffedObjects = cpyCxt.copyDirectObjectWithDeepCopy(pageDictionary.queryObject('Annots'))
            })   

            // write page. this will trigger the event  
            cpyCxt.appendPDFPageFromPDF(i);

            // now write the reffed object (should be populated)
            if(reffedObjects && reffedObjects.length > 0)
                cpyCxt.copyNewObjectsForDirectObject(reffedObjects)
        }

    }

    // New, for linked destinations: for links that point to destinations also copy the original document dests catalog object.
    // Per definition of the catalog, this has to be an indirect object. Since we ref it from the catalog which will be written only
    // when the document is ended, we need to copy the dictionary in advance, and then just refer to it.
    var catalogDict = cpyCxtParser.queryDictionaryObject(cpyCxtParser.getTrailer(),'Root').toPDFDictionary();

    if(catalogDict.exists('Dests')) {
       // original document has a dests dict. so copy it.
       var destsReference = catalogDict.queryObject('Dests').toPDFIndirectObjectReference()
       var targetDestsId = cpyCxt.copyObject(destsReference.getObjectID())

       // now add handler for catalog writing, which will launch when 'pdfWriter.end()' is called
       // to add a reference to the copy
       pdfWriter.getEvents().once('OnCatalogWrite',function(params) {
            var newCatalogDict = params.catalogDictionaryContext;
            newCatalogDict.writeKey('Dests')
            newCatalogDict.writeObjectReferenceValue(targetDestsId)
       })

    }

}