galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Question: Is object ID of page currently being written available to 'OnPageWrite' listener? #360

Open shaehn opened 5 years ago

shaehn commented 5 years ago

I am trying to use the page write listener to capture page identity information, but so far have been unable to figure out if there is any current interface that will get me that ID. If this is not available through any of the current interfaces, could it be added to the page context argument passed in to the OnPageWrite listener?

galkahana commented 5 years ago

Could pdfWriter.writePageAndReturnID() be useful to you?

for ref - https://github.com/galkahana/HummusJS/blob/8f153e0c52f846a79d1a04560a8293105431236a/src/PDFWriterDriver.cpp#L79
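A minimal sketch of how that call could be wrapped so the page ID is captured at write time, without touching the listener. The helper name `writePageAndRecordId` and the `onPageId` callback are hypothetical, and the stub writer only stands in for a real hummus `pdfWriter` so the snippet runs without the native module. The one assumption about hummus itself is that `writePageAndReturnID(page)` returns the object ID assigned to the page.

```javascript
// Hypothetical helper: writes a page through any writer-like object that
// exposes writePageAndReturnID() and reports the returned object ID.
function writePageAndRecordId(writer, page, onPageId) {
  const pageId = writer.writePageAndReturnID(page);
  onPageId(pageId, page);
  return pageId;
}

// Stub standing in for a hummus pdfWriter (assumption: the real method
// returns the numeric object ID assigned to the page being written).
function makeStubWriter() {
  let nextId = 1;
  return {
    writePageAndReturnID(page) {
      return nextId++;
    },
  };
}

const recordedIds = [];
const stubWriter = makeStubWriter();
const blankPage = { mediaBox: [0, 0, 612, 792] };

writePageAndRecordId(stubWriter, blankPage, (id) => recordedIds.push(id));
writePageAndRecordId(stubWriter, blankPage, (id) => recordedIds.push(id));
console.log(recordedIds);
```

With the real module, the object returned by `pdf.createWriter(...)` would presumably be passed in place of `stubWriter`; the helper only relies on that one method being present.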

shaehn commented 5 years ago

I have used that function elsewhere for other reasons, but what I would like to do is capture the page object identifier in the event handler. That way it would not matter if I was using straight hummus, or hummus-recipe (I happen to be using both). Once writePage is called with either package, I would be able to get the object identifier of the page.

Does the context being sent into the OnPageWrite handler have any way of getting hold of a pdfReader on the file being written? I could then perhaps finagle a way to get the object ID that way.

shaehn commented 5 years ago

OK, I tried accessing the file via createWriterToModify instead of createWriter so I could use the pdfReader it provides. But before I even went down the road of asking for the page object ID, I wanted to see what kind of file got created without any other changes (I am simply creating a few empty pages). When I examined the file, it looked corrupted: I was seeing doubles of the xref, trailer, startxref, %%EOF, and Catalog entries, and of most everything else I put in. This was totally unexpected on my part. Do I have to do something different when creating a file using createWriterToModify but not actually modifying anything?

galkahana commented 5 years ago

Re the earlier question: you'll have to fork and alter the code to pass you the object ID. The object ID for the page is determined prior to this phase, so it should be easy to fetch, but it does require a code change.

galkahana commented 5 years ago

A modified file is simply the original with the definitions of the changes appended. That's how PDF is. What do you mean by "not actually modify anything" if you said earlier that you are "simply creating a few empty pages"? That means exactly that you did modify it: you appended pages to the end of it.
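To make the "original plus appended changes" layout concrete, here is a rough sketch that counts the revisions in a PDF held as a string. It relies on two facts of the format: each revision ends with its own `%%EOF` marker, and the trailer of an appended revision carries a `/Prev` entry pointing at the previous xref offset. The function name `describeRevisions` and the sample text are made up for illustration; the sample is a skeleton, not a valid PDF, and a plain text scan like this would not cover files that use compressed xref streams.

```javascript
// Counts revisions by their %%EOF markers and collects the /Prev offsets
// that link each appended trailer back to the previous xref table.
function describeRevisions(pdfText) {
  const eofCount = (pdfText.match(/%%EOF/g) || []).length;
  const prevOffsets = [...pdfText.matchAll(/\/Prev\s+(\d+)/g)].map((m) => Number(m[1]));
  return { revisions: eofCount, prevOffsets };
}

// Illustrative skeleton only: an original revision followed by one append.
const sample = [
  '%PDF-1.4',
  '1 0 obj << /Type /Catalog >> endobj',
  'xref',
  'trailer << /Size 2 /Root 1 0 R >>',
  'startxref',
  '50',
  '%%EOF',
  '2 0 obj << /ModDate (D:20190227111011) >> endobj',
  'xref',
  'trailer << /Size 3 /Prev 50 /Root 1 0 R /Info 2 0 R >>',
  'startxref',
  '120',
  '%%EOF',
].join('\n');

const info = describeRevisions(sample);
console.log(info);
```

For the skeleton above this reports two revisions and a single `/Prev` offset of 50, which is the same shape as a file created once and then reopened for modification.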

shaehn commented 5 years ago

Hmm... When I think of modification, as opposed to creation, I think of editing the newly created pages. However, I see your point of view on the word modification. So let me be very specific with an example. Following is the original code using createWriter. I use Node.js to execute it.

const pdf = require( "hummus" );

function makeEmptyUnderlay(count) {
    const empty = 'empty.pdf';
    const pageCount = 1;
    //const pdfWriter = pdf.createWriterToModify(empty);
    const pdfWriter = pdf.createWriter(empty);
    const emptyPage = pdfWriter.createPage();
    const totalPages = pageCount*count;

    emptyPage.mediaBox = [0, 0, 612, 792];

    for (let page = 0; page < totalPages; page++) {
        pdfWriter.writePage(emptyPage);
    }

    pdfWriter.end();

    return empty;
}

const file = makeEmptyUnderlay(1);

This produces a very straightforward PDF document called empty.pdf:

%PDF-1.4
%½¾¼
1 0 obj
<<
    /Type /Page
    /Parent 2 0 R
    /MediaBox [ 0 0 612 792 ]
    /Resources <<
    >>
>>
endobj
2 0 obj
<<
    /Type /Pages
    /Count 1
    /Kids [ 1 0 R ]
>>
endobj
3 0 obj
<<
    /Type /Catalog
    /Pages 2 0 R
>>
endobj
xref
0 4
0000000000 65535 f
0000000016 00000 n
0000000120 00000 n
0000000189 00000 n
trailer
<<
    /Size 4
    /Root 3 0 R
    /ID [ <3B6E1A7D82E5DD88848DAD11E6FB8203> <3B6E1A7D82E5DD88848DAD11E6FB8203> ]
>>
startxref
246
%%EOF

Now when I substitute createWriterToModify for the createWriter call, I get the following PDF:

%PDF-1.4
%½¾¼
1 0 obj
<<
    /Type /Page
    /Parent 2 0 R
    /MediaBox [ 0 0 612 792 ]
    /Resources <<
    >>
>>
endobj
2 0 obj
<<
    /Type /Pages
    /Count 1
    /Kids [ 1 0 R ]
>>
endobj
3 0 obj
<<
    /Type /Catalog
    /Pages 2 0 R
>>
endobj
xref
0 4
0000000000 65535 f
0000000016 00000 n
0000000120 00000 n
0000000189 00000 n
trailer
<<
    /Size 4
    /Root 3 0 R
    /ID [ <3B6E1A7D82E5DD88848DAD11E6FB8203> <3B6E1A7D82E5DD88848DAD11E6FB8203> ]
>>
startxref
246
%%EOF
4 0 obj
<<
    /Type /Page
    /Parent 5 0 R
    /MediaBox [ 0 0 612 792 ]
    /Resources <<
    >>
>>
endobj
5 0 obj
<<
    /Type /Pages
    /Count 1
    /Kids [ 4 0 R ]
    /Parent 6 0 R
>>
endobj
2 0 obj
<<
    /Count 1
    /Kids [ 1 0 R ]
    /Type /Pages
    /Parent 6 0 R
>>
endobj
6 0 obj
<<
    /Type /Pages
    /Count 2
    /Kids [ 2 0 R 5 0 R ]
>>
endobj
7 0 obj
<<
    /Type /Catalog
    /Pages 6 0 R
>>
endobj
8 0 obj
<<
    /ModDate (D:20190227111011-05'00')
>>
endobj
xref
0 1
0000000000 65535 f
2 1
0000000670 00000 n
4 5
0000000481 00000 n
0000000585 00000 n
0000000755 00000 n
0000000830 00000 n
0000000887 00000 n
trailer
<<
    /Size 9
    /Prev 246
    /Root 7 0 R
    /Info 8 0 R
    /ID [ <3B6E1A7D82E5DD88848DAD11E6FB8203> <1BF394D8A6A7FEE9D29AB6B4D10834BE> ]
>>
startxref
949
%%EOF

which looks like it gave me two empty pages. I was thinking that it would just overwrite the original, but clearly that is not happening here. I did not realize that modifying a PDF was just adding to the end of the existing file, so I am learning something new here.
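As a rough illustration of how a reader would see this file, the sketch below lists page object IDs from an uncompressed PDF held as a string. It keeps only the latest definition of each object (so the appended `2 0 obj` replaces the original one), starts from the `/Root` of the last trailer, and walks the `/Kids` arrays. The function names and the condensed sample are hypothetical. It is a text-scan heuristic that ignores generation numbers, xref offsets, and object streams, not a substitute for a real parser such as the pdfReader.

```javascript
// Maps object number -> body, letting later (appended) definitions win.
function latestObjects(pdfText) {
  const objects = new Map();
  const objRe = /(\d+)\s+(\d+)\s+obj\b([\s\S]*?)endobj/g;
  for (const m of pdfText.matchAll(objRe)) {
    objects.set(Number(m[1]), m[3]);
  }
  return objects;
}

// Walks the page tree from the most recent /Root and collects page IDs.
function listPageIds(pdfText) {
  const objects = latestObjects(pdfText);
  const roots = [...pdfText.matchAll(/\/Root\s+(\d+)\s+\d+\s+R/g)];
  if (roots.length === 0) return [];
  const catalog = objects.get(Number(roots[roots.length - 1][1])) || '';
  const pagesRef = catalog.match(/\/Pages\s+(\d+)\s+\d+\s+R/);
  if (!pagesRef) return [];

  const pageIds = [];
  const visited = new Set(); // guards against cycles in a damaged tree
  const walk = (id) => {
    if (visited.has(id)) return;
    visited.add(id);
    const body = objects.get(id) || '';
    if (/\/Type\s*\/Pages\b/.test(body)) {
      const kids = body.match(/\/Kids\s*\[([^\]]*)\]/);
      const refs = kids ? [...kids[1].matchAll(/(\d+)\s+\d+\s+R/g)] : [];
      refs.forEach((ref) => walk(Number(ref[1])));
    } else if (/\/Type\s*\/Page\b/.test(body)) {
      pageIds.push(id);
    }
  };
  walk(Number(pagesRef[1]));
  return pageIds;
}

// Condensed version of the second dump above (xref tables omitted).
const modifiedPdf = [
  '%PDF-1.4',
  '1 0 obj << /Type /Page /Parent 2 0 R >> endobj',
  '2 0 obj << /Type /Pages /Count 1 /Kids [ 1 0 R ] >> endobj',
  '3 0 obj << /Type /Catalog /Pages 2 0 R >> endobj',
  'trailer << /Size 4 /Root 3 0 R >>',
  '%%EOF',
  '4 0 obj << /Type /Page /Parent 5 0 R >> endobj',
  '5 0 obj << /Type /Pages /Count 1 /Kids [ 4 0 R ] /Parent 6 0 R >> endobj',
  '2 0 obj << /Count 1 /Kids [ 1 0 R ] /Type /Pages /Parent 6 0 R >> endobj',
  '6 0 obj << /Type /Pages /Count 2 /Kids [ 2 0 R 5 0 R ] >> endobj',
  '7 0 obj << /Type /Catalog /Pages 6 0 R >> endobj',
  'trailer << /Size 9 /Prev 246 /Root 7 0 R >>',
  '%%EOF',
].join('\n');

const pageIds = listPageIds(modifiedPdf);
console.log(pageIds);
```

On the condensed sample this yields the two page objects, 1 and 4, matching the two empty pages observed above: the original page stays in place and the new one is reached through the appended page tree root.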

shaehn commented 5 years ago

If I create a repository fork to make the desired change, would this be something that you would be willing to pick up in a pull request to include in the main hummus baseline, or will it just become my private version?

I am unfamiliar with the build process for C++ code in a Node.js npm package. Will the binding.gyp and package.json files handle rebuilding the PDFWriter source code, or do I have to do something else?