galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

How to get current transformation context while generating a pdf? #487

Open blairjanis opened 10 months ago

blairjanis commented 10 months ago

I'm using a combination of pdfjs-dist and hummus to create pdfs in a node.js application. Rather than render the pdf to a canvas, which is the common use case for pdfjs-dist, I am getting all of the operations using pdfjs-dist and then generating a new pdf using some of those operations in hummus. Part of the reason for doing this is so that I can get information about each element (text, line, rectangle, etc.) on the pdf such as x,y coordinates, scale, rotation, etc. For example, I may have the following operations (question follows the examples below):

fn: 31 (beginText)
fn: 40 (moveText) args: [10, 10]
fn: 44 (showText) args: "Something I want printed on the pdf"
fn: 40 (moveText) args: [0, 20]
fn: 44 (showText) args: "The next line of text"
fn: 32 (endText)

To accomplish this in hummus I'm using the drawing operators directly. So in this case it would look something like this (I'm using memory-streams):

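var fs = require('fs');
var hummus = require('hummus');
var streams = require('memory-streams');

var outStream = new streams.WritableStream();
var pdfWriter = hummus.createWriter(new hummus.PDFStreamForResponse(outStream));
var pdfPage = pdfWriter.createPage(0, 0, 612, 792);
var context = pdfWriter.startPageContentContext(pdfPage);
// Tj expects a font selected with Tf; the font path is a placeholder, not part of the original post
var font = pdfWriter.getFontForFile('./path/to/font.ttf');
context.q();
context.BT();
context.Tf(font, 12);
context.Td(10, 10);
context.Tj('Something I want printed on the pdf');
context.Td(0, 20);
context.Tj('The next line of text');
context.ET();
context.Q();
pdfWriter.writePage(pdfPage);
pdfWriter.end();
outStream.end();
fs.writeFileSync('filename.pdf', outStream.toBuffer());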
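The hand translation above could also be driven by a lookup table. The following is only a sketch: `OP_TO_METHOD` and `replayOps` are hypothetical names, the `{ fn, args }` record shape mirrors the listing in this post rather than the exact structure pdfjs-dist returns, and only the four op codes quoted above are mapped. It works against any object exposing `BT`, `ET`, `Td` and `Tj` methods, such as the hummus content context.

```javascript
// Hypothetical mapping from the op codes quoted in this post to the
// PDF operator methods called on the content context.
const OP_TO_METHOD = {
  31: 'BT', // beginText
  32: 'ET', // endText
  40: 'Td', // moveText
  44: 'Tj', // showText
};

// Replays a list of { fn, args } records onto a context-like object.
// Returns the op codes that were skipped because no mapping exists.
function replayOps(ops, context) {
  const skipped = [];
  for (const op of ops) {
    const method = OP_TO_METHOD[op.fn];
    if (!method || typeof context[method] !== 'function') {
      skipped.push(op.fn); // unknown op: record it instead of failing
      continue;
    }
    // Normalise args: missing -> [], single value -> [value], array -> as is.
    const args = op.args === undefined
      ? []
      : Array.isArray(op.args) ? op.args : [op.args];
    context[method](...args);
  }
  return skipped;
}
```

Returning the skipped codes makes it easy to see which operations still need a mapping before the output can be trusted.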

Question: Is there some place in hummus that is tracking the x, y coordinates and other transformation information through this process that I can expose to my code?

Here is what I know:

I have written code to loop through the ops and track the x, y, scale and everything else going on, and most of the time my code works. But it is really tedious and does not work in all cases.

So I'm trying to figure out where this information is being tracked and see if I can get that information. My understanding from the pdf reference is that this is called the current transformation matrix (CTM). Any guidance is appreciated.
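For reference, a minimal sketch of what tracking the CTM by hand involves, assuming only the `q`, `Q` and `cm` operators affect it. `GraphicsStateTracker` and `multiply` are hypothetical names, not part of hummus or pdfjs-dist. Matrices use the six-number PDF form `[a, b, c, d, e, f]`, and `cm` premultiplies the new matrix onto the current one, as the PDF reference describes.

```javascript
// Multiplies two PDF matrices [a, b, c, d, e, f] (row-vector convention): m x n.
function multiply(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ];
}

// Hypothetical tracker: call the same-named method whenever the
// corresponding operator is written to the content stream.
class GraphicsStateTracker {
  constructor() {
    this.ctm = [1, 0, 0, 1, 0, 0]; // identity
    this.stack = [];
  }
  q() {
    this.stack.push(this.ctm.slice()); // save a copy of the current CTM
  }
  Q() {
    if (this.stack.length > 0) this.ctm = this.stack.pop(); // restore
  }
  cm(a, b, c, d, e, f) {
    this.ctm = multiply([a, b, c, d, e, f], this.ctm);
  }
  // Maps a point from the current user space through the CTM.
  apply(x, y) {
    const [a, b, c, d, e, f] = this.ctm;
    return [a * x + c * y + e, b * x + d * y + f];
  }
}

const tracker = new GraphicsStateTracker();
tracker.q();
tracker.cm(1, 0, 0, 1, 100, 200); // translate by (100, 200)
tracker.cm(2, 0, 0, 2, 0, 0);     // then scale by 2 inside that space
console.log(tracker.apply(10, 10)); // [ 120, 220 ]
tracker.Q();
console.log(tracker.apply(10, 10)); // [ 10, 10 ]
```

A complete tracker would also need to handle form XObjects and the text matrices, which is where much of the tedium comes from.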

galkahana commented 10 months ago

nothing in hummus tracks transformation matrices or text movements. it's not really aware of the semantic meaning of most of the context commands. the only important exception is the collection of glyphs from text commands, for the sake of automating the later embedding of font information (namely, those glyphs). it's otherwise very nimble and keeps only the info it requires for the tasks it carries out itself.

this is something, however, that you can track yourself (i guess this goes without saying). if you intend to do that, hummus might be able to help with the elements you don't control yourself, like getting measurements of text and images.

image measurements: https://github.com/galkahana/HummusJS/wiki/Show-images#getting-image-measurements

text measurements: https://github.com/galkahana/HummusJS/wiki/Show-text#measuring-text
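As a rough illustration of tracking text positions yourself, here is a sketch that follows the text matrix (Tm) and text line matrix (Tlm) through `BT`, `Td`, `Tj` and `ET`, combined with a CTM set through `cm`. `TextPositionTracker` and `measureWidth` are hypothetical names: the width callback is where a real text measurement would be plugged in, and the sketch ignores font size scaling, character spacing, horizontal scaling and text rise.

```javascript
const IDENTITY = [1, 0, 0, 1, 0, 0];

// Multiplies two PDF matrices [a, b, c, d, e, f] (row-vector convention): m x n.
function multiplyMatrices(m, n) {
  return [
    m[0] * n[0] + m[1] * n[2],
    m[0] * n[1] + m[1] * n[3],
    m[2] * n[0] + m[3] * n[2],
    m[2] * n[1] + m[3] * n[3],
    m[4] * n[0] + m[5] * n[2] + n[4],
    m[4] * n[1] + m[5] * n[3] + n[5],
  ];
}

class TextPositionTracker {
  // measureWidth(text) must return the advance width in text-space units (assumed callback).
  constructor(measureWidth) {
    this.measureWidth = measureWidth;
    this.ctm = IDENTITY.slice();
    this.tm = null;   // text matrix, defined only between BT and ET
    this.tlm = null;  // text line matrix
    this.placed = []; // recorded { text, x, y } entries
  }
  cm(a, b, c, d, e, f) {
    this.ctm = multiplyMatrices([a, b, c, d, e, f], this.ctm);
  }
  BT() {
    this.tm = IDENTITY.slice();
    this.tlm = IDENTITY.slice();
  }
  ET() {
    this.tm = null;
    this.tlm = null;
  }
  Td(tx, ty) {
    // Move to the start of the next line, offset from the current line start.
    this.tlm = multiplyMatrices([1, 0, 0, 1, tx, ty], this.tlm);
    this.tm = this.tlm.slice();
  }
  Tj(text) {
    if (!this.tm) throw new Error('Tj called outside BT/ET');
    // The text origin is (0, 0) in text space, mapped through Tm and then the CTM.
    const m = multiplyMatrices(this.tm, this.ctm);
    this.placed.push({ text, x: m[4], y: m[5] });
    // Advance the text matrix by the width of the text just shown.
    this.tm = multiplyMatrices([1, 0, 0, 1, this.measureWidth(text), 0], this.tm);
  }
}
```

Feeding it the sequence from the question (`BT`, `Td(10, 10)`, `Tj`, `Td(0, 20)`, `Tj`, `ET`) records the two strings at (10, 10) and (10, 30), since each `Td` is relative to the start of the previous line.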

blairjanis commented 10 months ago

Thanks @galkahana. Sure appreciate the information and the quick response