galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

How to get current transformation context while generating a pdf? #487

Open blairjanis opened 5 months ago

blairjanis commented 5 months ago

I'm using a combination of pdfjs-dist and hummus to create pdfs in a node.js application. Rather than render the pdf to a canvas, which is the common use case for pdfjs-dist, I am getting all of the operations using pdfjs-dist and then generating a new pdf using some of those operations in hummus. Part of the reason for doing this is so that I can get information about each element (text, line, rectangle, etc.) on the pdf such as x,y coordinates, scale, rotation, etc. For example, I may have the following operations (question follows the examples below):

fn: 31 (beginText)
fn: 40 (moveText) args: [10, 10]
fn: 44 (showText) args: "Something I want printed on the pdf"
fn: 40 (moveText) args: [0, 20]
fn: 44 (showText) args: "The next line of text"
fn: 32 (endText)

To accomplish this in hummus I'm using the drawing operators directly. So in this case it would look something like this (I'm using memory-streams):

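var fs = require('fs');
var hummus = require('hummus');
var streams = require('memory-streams');

var outStream = new streams.WritableStream();
var pdfWriter = hummus.createWriter(new hummus.PDFStreamForResponse(outStream));
var pdfPage = pdfWriter.createPage(0, 0, 612, 792);
var context = pdfWriter.startPageContentContext(pdfPage);
context.q();   // save the graphics state
context.BT();  // begin a text object
// Tj expects a font to have been selected with context.Tf(font, size); omitted here for brevity.
context.Td(10, 10);
context.Tj('Something I want printed on the pdf');
context.Td(0, 20);
context.Tj('The next line of text');
context.ET();  // end the text object
context.Q();   // restore the graphics state
pdfWriter.writePage(pdfPage);
pdfWriter.end();
outStream.end();
fs.writeFileSync('filename.pdf', outStream.toBuffer());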

Question: Is there some place in hummus that is tracking the x, y coordinates and other transformation information through this process that I can expose to my code?

Here is what I know:

I have written code to loop through the ops and track the x, y, scale and everything else going on, and most of the time my code works. But it is really tedious and does not work in all cases.

So I'm trying to figure out where this information is being tracked and see if I can get that information. My understanding from the pdf reference is that this is called the current transformation matrix (CTM). Any guidance is appreciated.
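
For reference, a minimal sketch of what tracking the CTM by hand could look like. It is plain JavaScript and is not part of hummus; `multiply` and `createCtmTracker` are hypothetical names made up for this illustration. It assumes the PDF convention that a matrix is written as [a, b, c, d, e, f] and that the cm operator pre-multiplies the current matrix. The idea would be to call the tracker's q, Q and cm next to the matching context.q(), context.Q() and context.cm() calls.

```javascript
// Multiply two PDF matrices given as [a, b, c, d, e, f].
// Returns m x n in the PDF row-vector convention: m is applied first, then n.
function multiply(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5]
  ];
}

// Hypothetical helper (not a hummus API): mirrors q / Q / cm and keeps the CTM.
function createCtmTracker() {
  var ctm = [1, 0, 0, 1, 0, 0]; // identity matrix
  var stack = [];               // saved matrices, one per unmatched q
  return {
    q: function () { stack.push(ctm.slice()); },
    Q: function () { if (stack.length > 0) { ctm = stack.pop(); } },
    cm: function (a, b, c, d, e, f) { ctm = multiply([a, b, c, d, e, f], ctm); },
    current: function () { return ctm.slice(); },
    // Map a point from the current user space to default page space.
    toPageSpace: function (x, y) {
      return [ctm[0] * x + ctm[2] * y + ctm[4], ctm[1] * x + ctm[3] * y + ctm[5]];
    }
  };
}

var tracker = createCtmTracker();
tracker.q();
tracker.cm(1, 0, 0, 1, 100, 50); // translate by (100, 50)
tracker.cm(2, 0, 0, 2, 0, 0);    // then scale by 2 inside the translated space
console.log(tracker.toPageSpace(10, 10)); // [ 120, 70 ]
tracker.Q();
console.log(tracker.toPageSpace(10, 10)); // [ 10, 10 ]
```

This only covers the graphics-state matrix. The text matrix (Tm, Td and friends) is separate state and would need its own bookkeeping on top of this.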

galkahana commented 5 months ago

nothing in hummus tracks transformation matrices or text movements. it's not really aware of the semantic meaning of most of the commands for the context. the only important exception is the collection of glyphs from text commands, for the sake of automating the later embedding of font information (namely - those glyphs). it's otherwise very nimble and focuses only on the info it requires for the tasks it carries out itself.

this is something, however, that you can track yourself (i guess this goes without saying). if you intend to do that hummus might be able to help with those elements that you don't yourself control, like getting measurements of texts and images.

image measurements: https://github.com/galkahana/HummusJS/wiki/Show-images#getting-image-measurements

text measurements: https://github.com/galkahana/HummusJS/wiki/Show-text#measuring-text
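
for the text movements specifically, a do-it-yourself tracker could start out as small as the sketch below. this is an illustration only, not hummus or pdfjs-dist code: `collectTextPositions` and the `{ name, args }` op shape are made up for the example, and it handles only beginText / moveText / showText / endText. it ignores Tm, TD, T*, the glyph advance after each showText, and the CTM, so the coordinates are in text space.

```javascript
// Walk a list of ops and record the text line origin in effect at each showText.
// Assumes only Td-style moves, which translate the line origin by (tx, ty).
function collectTextPositions(ops) {
  var line = null; // current line origin; defined only between beginText and endText
  var found = [];
  ops.forEach(function (op) {
    switch (op.name) {
      case 'beginText':
        line = { x: 0, y: 0 }; // BT resets the text matrices to identity
        break;
      case 'moveText':
        if (line) { line.x += op.args[0]; line.y += op.args[1]; }
        break;
      case 'showText':
        if (line) { found.push({ text: op.args[0], x: line.x, y: line.y }); }
        break;
      case 'endText':
        line = null;
        break;
    }
  });
  return found;
}

var positions = collectTextPositions([
  { name: 'beginText', args: [] },
  { name: 'moveText', args: [10, 10] },
  { name: 'showText', args: ['Something I want printed on the pdf'] },
  { name: 'moveText', args: [0, 20] },
  { name: 'showText', args: ['The next line of text'] },
  { name: 'endText', args: [] }
]);
console.log(positions);
```

for the ops from the original question this records the first string at (10, 10) and the second at (10, 30), since each moveText is relative to the start of the previous line.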

blairjanis commented 5 months ago

Thanks @galkahana. Sure appreciate the information and the quick response.