galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Text and images not copied #236

Open maodes opened 6 years ago

maodes commented 6 years ago

Hi Gal,

I am trying to copy a PDF and upload it. The code works for some PDFs, but for others it copies only the lines (horizontal and column lines); the text and images are not included. I'm not sure whether the problem is in appendPDFPageFromPDF, in the way we implement the copy, or in the PDF being copied.

The function below copies the PDF. I omitted the code that processes the PDF before and after the copy; this is the part needed to describe the issue.

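const streams = require("memory-streams");
const hummus = require("hummus");
// Assumed path: the read stream module shown further down
const PDFRStreamForBuffer = require("./PDFRStreamForBuffer");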

function preparement (payload) {
    const writeStream = new streams.WritableStream();
    const writer = hummus.createWriter(new hummus.PDFStreamForResponse(writeStream));
    const copyingContext = writer.createPDFCopyingContext(new PDFRStreamForBuffer(payload.buffer));

    for (let i=0; i < copyingContext.getSourceDocumentParser().getPagesCount(); i++) {
        copyingContext.appendPDFPageFromPDF(i);
    }

    writer.end();
    writeStream.end();
    return writeStream.toBuffer();
}

This is the read stream passed to createPDFCopyingContext:

function PDFRStreamForBuffer (buffer){
    this.innerArray = Array.prototype.slice.call(buffer, 0)
    this.rposition = 0;
    this.fileSize = buffer.byteLength;
}

PDFRStreamForBuffer.prototype.read = function(inAmount){
    var arr = this.innerArray.slice(this.rposition,this.rposition+inAmount);
    this.rposition += inAmount;
    return arr;
}

PDFRStreamForBuffer.prototype.notEnded = function(){
    return this.rposition < this.fileSize;
}

PDFRStreamForBuffer.prototype.setPosition = function(inPosition){
    this.rposition = inPosition;
}

PDFRStreamForBuffer.prototype.setPositionFromEnd = function(inPosition){
    this.rposition = this.fileSize-inPosition;
}

PDFRStreamForBuffer.prototype.skip = function(inAmount){
    this.rposition += inAmount;
}

PDFRStreamForBuffer.prototype.getCurrentPosition = function(){
    return this.rposition;
}

module.exports = PDFRStreamForBuffer;
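
For comparison, here is a minimal sketch of the same reader interface (read, notEnded, setPosition, setPositionFromEnd, skip, getCurrentPosition) with every read and seek clamped to the buffer bounds. The name ClampedBufferReadStream is hypothetical, and nothing in this thread establishes that out-of-range positions cause the missing text; the sketch only rules that possibility out.

```javascript
// Hypothetical variant of the reader above: same method names,
// but the position can never leave the range [0, fileSize].
function ClampedBufferReadStream(buffer) {
    this.buffer = buffer;
    this.rposition = 0;
    this.fileSize = buffer.length;
}

// Keep a requested position inside the buffer.
ClampedBufferReadStream.prototype.clamp = function (inPosition) {
    return Math.max(0, Math.min(inPosition, this.fileSize));
};

// Return up to inAmount bytes as a plain array of numbers.
ClampedBufferReadStream.prototype.read = function (inAmount) {
    const end = this.clamp(this.rposition + inAmount);
    const arr = Array.prototype.slice.call(this.buffer, this.rposition, end);
    this.rposition = end;
    return arr;
};

ClampedBufferReadStream.prototype.notEnded = function () {
    return this.rposition < this.fileSize;
};

ClampedBufferReadStream.prototype.setPosition = function (inPosition) {
    this.rposition = this.clamp(inPosition);
};

ClampedBufferReadStream.prototype.setPositionFromEnd = function (inPosition) {
    this.rposition = this.clamp(this.fileSize - inPosition);
};

ClampedBufferReadStream.prototype.skip = function (inAmount) {
    this.rposition = this.clamp(this.rposition + inAmount);
};

ClampedBufferReadStream.prototype.getCurrentPosition = function () {
    return this.rposition;
};
```

If the output is unchanged with a clamped reader, the stream implementation is unlikely to be the source of the problem.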

chunyenHuang commented 6 years ago

Your code works as expected. The only thing is that ...

Gal's reply in #187

The regular page append and merging functions of hummus copy only the graphic content of the page, without any interactive fields. There are some considerations as to why it's done that way. Let's just assume that this is a given.

So you will not see any annotations copied when using appendPDFPageFromPDF.
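
One way to check whether this explanation applies to a given file is to list the pages that carry an Annots entry in their page dictionary. The sketch below is a rough illustration: pagesWithAnnotations is a hypothetical helper, and it assumes the reader object exposes getPagesCount() and parsePageDictionary(index), and that the returned dictionary exposes exists(key), as the hummus parser is understood to do.

```javascript
// Hypothetical helper: return the zero-based indexes of pages whose
// page dictionary has an /Annots entry. The reader is passed in, so
// the function makes no assumption about how the PDF was opened.
function pagesWithAnnotations(pdfReader) {
    const pages = [];
    const count = pdfReader.getPagesCount();
    for (let i = 0; i < count; i++) {
        // Assumed API: parsePageDictionary returns an object with exists(key)
        if (pdfReader.parsePageDictionary(i).exists("Annots")) {
            pages.push(i);
        }
    }
    return pages;
}
```

With hummus this would be called as pagesWithAnnotations(hummus.createReader("input.pdf")), where input.pdf is a placeholder path. An empty result means the missing content cannot be explained by skipped annotations.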

maodes commented 6 years ago

Hi @chunyenHuang, thanks for the response, but unfortunately it is the content itself, the text and images, that is missing, and there are no annotations in the PDF. Can I email you an image of the result? The output is not totally blank; it only has lines.

chunyenHuang commented 6 years ago

little78926@gmail.com I will see what I can do.