galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Write a new blank page in the BEGINNING when using hummus.createWriterToModify. #211

Open peter-borgstedt opened 6 years ago

peter-borgstedt commented 6 years ago

Is it possible to write the new page I've created at the beginning? When I write it, it is added at the end of the PDF.

Must I really create a new writer with its own stream to create a new blank page, and then copy the rest from the other writer and its stream? It's super strange that this is not possible.

Has anyone succeeded in "INSERTING" content into an existing PDF?

galkahana commented 6 years ago

Well, naturally hummus writes PDF content at the end. In the same way that, with a regular write, you can't easily write a page and then decide you want to prepend another page before it, you can't prepend in this scenario.

Tell you what, though. The PDF pages array is available for modification (grab the catalog and the pages array is there), so you could use the regular write, then reopen the resulting PDF and reorder the pages array. (You'll need to read a little about pages arrays in the spec to know how to do that.)
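A minimal sketch of the reordering step in that first approach, assuming the page tree is flat (a single Kids array under the root Pages node; real files may nest Pages nodes, which this does not handle). The function name and the object IDs are hypothetical, and writing the reordered array back into the file through hummus is deliberately left out.

```javascript
// Given the Kids of a flat page tree (as object IDs, in document order),
// return a new order with the page that was appended last moved to the front.
// Hypothetical helper: it only computes the order and does not touch the PDF.
function moveLastPageToFront(kidIds) {
  if (kidIds.length < 2) {
    return kidIds.slice();
  }
  const last = kidIds[kidIds.length - 1];
  return [last, ...kidIds.slice(0, -1)];
}

// Example: pages 4, 7 and 9 existed; page 12 was appended by the modifying writer.
console.log(moveLastPageToFront([4, 7, 9, 12])); // [ 12, 4, 7, 9 ]
```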

An alternative to this (since it's annoying to have to do the writing and then the reading) is to avoid using the page addition ability of hummus and instead add the page object yourself. Then you can add the page object and recreate the pages array to point to it first and then the rest. (It's actually quite easy: the new pages array is an array which has your page as the first element and the old pages array as the second.) The annoying part is that you have to build the page object yourself. However, if you build all the graphics into a new form XObject, and then just create a page with a MediaBox, a resources dict that points to this object, and a content stream (which you can write yourself) that places this form, then at least you can use the regular content context for writing the graphics with the XObject.
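The page-tree shape described here can be sketched as plain PDF dictionary text. This is only an illustration of the structure, assuming generation number 0 for every object and hypothetical object IDs and helper names; it does not call hummus. Per the PDF specification, Count on a Pages node is the number of leaf pages below it, and every non-root node needs a Parent entry, so the old root and the new page would both need /Parent pointing at the new root, and the catalog's /Pages entry would need to point at the new root.

```javascript
// Render an indirect reference, assuming generation number 0.
const ref = (id) => `${id} 0 R`;

// New root Pages node: the new page first, then the old page-tree root.
function newPagesRoot(newPageId, oldRootId, oldPageCount) {
  return [
    '<<',
    '/Type /Pages',
    `/Kids [ ${ref(newPageId)} ${ref(oldRootId)} ]`,
    `/Count ${oldPageCount + 1}`, // leaf pages under this node
    '>>',
  ].join('\n');
}

// Hand-built page that only places a form XObject (/Fm1 is an arbitrary resource name).
function prependedPage(newRootId, formId, contentId, [width, height]) {
  return [
    '<<',
    '/Type /Page',
    `/Parent ${ref(newRootId)}`,
    `/MediaBox [ 0 0 ${width} ${height} ]`,
    `/Resources << /XObject << /Fm1 ${ref(formId)} >> >>`,
    `/Contents ${ref(contentId)}`,
    '>>',
  ].join('\n');
}

// Content stream body for that page: save state, draw the form, restore state.
const pageContent = 'q /Fm1 Do Q';

// Hypothetical IDs: new page 20, new root 21, old root 2, form 19, content stream 22.
console.log(newPagesRoot(20, 2, 3));
console.log(prependedPage(21, 19, 22, [595, 842]));
```

Keeping the old root as the second kid means none of the existing page objects have to be rewritten; only the old root gains a Parent entry.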

That's the theory, at least.

peter-borgstedt commented 6 years ago

I would really like some examples here. What approach would you go for? At the moment I have a modifying writer (only). According to the documentation I can get the dictionary for a specific page by using the method parsePageDictionary (with a hummus reader). But I don't know which dictionary you are talking about.

Could you give me an example of how to go about this? And maybe add it to the examples so others may make use of it as well.

This is my initial setup, which works:

    // This is an npm package called "memory-streams"
    const writeStream = new streams.WritableStream();
    const writer = hummus.createWriterToModify(
      new PDFRStreamForBuffer(content.buffer),
      new hummus.PDFStreamForResponse(writeStream)
    );

I would like to have only one writer doing this. At the moment I have to rearrange in two steps. First I have to create a pure writer, and to rearrange pages I do something like this, but using an array of indexes ordered the way I want (note that with a modifying writer I would append pages to the content being modified, so this has to be done using a "pure writer" writing to new content):

    for (let i = 0; i < copyingContext.getSourceDocumentParser().getPagesCount(); ++i) {
      copyingContext.appendPDFPageFromPDF(i);
    }

Then I write the end result to a new buffer, read that buffer again in a modifying writer, and do the final modifications on the pages. Doing it in two steps rather than one increases the processing time a lot.
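The index-array variant mentioned here might look like the following sketch. `appendPagesInOrder`, `orderWithPageFirst` and the `order` parameter are hypothetical names; the only hummus calls used are the ones already shown in the loop above, and `copyingContext` is assumed to be a copying context created by a regular (non-modifying) writer.

```javascript
// Append source pages in the order given by `order` (zero-based page indexes).
// `copyingContext` is assumed to be a hummus copying context for the source PDF.
function appendPagesInOrder(copyingContext, order) {
  const pageCount = copyingContext.getSourceDocumentParser().getPagesCount();
  for (const index of order) {
    if (!Number.isInteger(index) || index < 0 || index >= pageCount) {
      throw new RangeError(`Page index ${index} is outside 0..${pageCount - 1}`);
    }
    copyingContext.appendPDFPageFromPDF(index);
  }
}

// Build an order that moves one page to the front and keeps the rest in place.
function orderWithPageFirst(pageCount, firstIndex) {
  const rest = [];
  for (let i = 0; i < pageCount; ++i) {
    if (i !== firstIndex) rest.push(i);
  }
  return [firstIndex, ...rest];
}
```

For example, `appendPagesInOrder(copyingContext, orderWithPageFirst(count, count - 1))` would copy the last source page first and then the others in their original order.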

So I would really appreciate a solution for this that I and others can benefit from. I mean, rearranging pages must be a common thing?

peter-borgstedt commented 6 years ago

I tried this:

    const writeStream = new streams.WritableStream();
    const writer = hummus.createWriterToModify(
      new PDFRStreamForBuffer(content.buffer),
      new hummus.PDFStreamForResponse(writeStream)
    );

    const parser = writer.getModifiedFileParser();
    const trailer = parser.getTrailer().toPDFDictionary().toJSObject();
    const pageArr = trailer.ID.toPDFArray().toJSArray();

    console.log(pageArr);
    pageArr.forEach(function (element, index, array) {
      console.log(element.toString());
    });

I get an array of PDFHexString using trailer.ID:

[ PDFHexString { value: '�^P��\u000eU�\u0004~Iw+ϞF' }, PDFHexString { value: '�^P��\u000eU�\u0004~Iw+ϞF' } ]

I can use getLength and queryObject(inIndex) to iterate through the PDFArray, but how do I create a new PDFArray, convert it back to a PDFObject(?), and override the ID array in the dictionary? Is this the correct array?

And are the strings references to the pages? I'm lost.
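For context, the trailer's ID entry in a PDF is the file identifier (a pair of byte strings), which would explain the hex strings; it is not a list of pages. The pages are reached from the trailer's Root entry (the catalog), then the catalog's Pages entry, then that node's Kids array. Below is a sketch of that walk. It assumes the parser's `queryDictionaryObject(dictionary, name)` resolves indirect references, which is an assumption about the HummusJS parsing API; `topLevelKidIds` is a hypothetical helper, and the Kids it returns may be nested Pages nodes rather than pages.

```javascript
// Walk trailer -> Root (catalog) -> Pages -> Kids and return the object IDs
// referenced by the top-level Kids array.
// `parser` is assumed to be the object returned by writer.getModifiedFileParser().
function topLevelKidIds(parser) {
  const trailer = parser.getTrailer().toPDFDictionary();
  const catalog = parser.queryDictionaryObject(trailer, 'Root').toPDFDictionary();
  const pagesRoot = parser.queryDictionaryObject(catalog, 'Pages').toPDFDictionary();
  const kids = parser.queryDictionaryObject(pagesRoot, 'Kids').toPDFArray().toJSArray();
  // Each Kids entry is expected to be an indirect reference to a page or Pages node.
  return kids.map((kid) => kid.toPDFIndirectObjectReference().getObjectID());
}
```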

tlrjs commented 5 years ago

I'm lost trying to do this as well