foliojs / pdfkit

A JavaScript PDF generation library for Node and the browser
http://pdfkit.org/
MIT License

Tagged PDF - suggestions for integration #1147

Open hamscher opened 4 years ago

hamscher commented 4 years ago

Question

I needed to generate PDF that included Structure Elements ("Tagged PDF") and related Accessibility features. I modified pdfkit to permit this for documents with text and no images.

But I am new to Git and have never contributed to a project on GitHub.
So I am looking for friendly advice on how to make it available.
Should I simply start a branch of this project on GitHub and make my modifications there?
Should I give it a new version number, starting from the version I based it on? At what point is it sensible to make a pull request into master, since this is a new feature? Can my branch have its own bundle js file so that people can just refer to it there?

Description

My implementation lets the options argument passed to _fragment carry an object with keys for a tag and an MCID; when present, the text is written as a marked-content sequence with that tag. There are new classes PDFStructTreeRoot, PDFStructTreeElem, and NumTree. The Catalog now includes Lang and MarkInfo entries.

Although limited, it is sufficient to produce a PDF that passes all of the Accessibility checks in PDF/UA (ISO 14289).
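For readers unfamiliar with marked content, here is a minimal sketch of what a marked-content sequence looks like in a page's content stream: the tagged operators sit between `BDC` and `EMC`, and the property list carries the MCID. The helper name `markContent` and the sample text operators are assumptions made for illustration; this is not the code from the modified pdfkit described above.

```javascript
// Sketch only: `markContent` is a hypothetical helper, not a pdfkit function.
// It wraps already-built content-stream operators in a marked-content sequence.
function markContent(tag, mcid, operators) {
  if (!Number.isInteger(mcid) || mcid < 0) {
    throw new RangeError('MCID must be a non-negative integer');
  }
  // "/Tag <</MCID n>> BDC" opens the sequence and "EMC" closes it.
  return [`/${tag} <</MCID ${mcid}>> BDC`, operators, 'EMC'].join('\n');
}

// Assumed example operators: draw "Hombres" with a font resource named F1.
const stream = markContent('H1', 0, 'BT /F1 24 Tf 72 700 Td (Hombres) Tj ET');
console.log(stream);
```

The MCID is what a structure element later uses to point back at this piece of page content, which is why the code sample below tracks the last MCID it handed out.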

Code sample

const { info } = doc;
info.Title = 'Sample file title';
const { structTreeRoot } = doc;
var mcid = -1; // track the last MCID marked
var elementStack = [structTreeRoot]; // stack of tags for nesting

// Option "plist" passed down to _fragment, contains a dictionary with the tag and tag properties:
doc.text("Hombres", doc.x, doc.y, {plist: {tag: 'H1', properties: {MCID: ++mcid}}});

// Make a structure element child of the current top of stack:
var newElement = elementStack[0].addTag('H1', mcid, {T: 'men'});

// If this structure element can have children, push it onto the stack:
elementStack.unshift(newElement);
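
The stack-based nesting in the sample can be tried without pdfkit. The following is a rough sketch under stated assumptions: `StructNode` is a hypothetical stand-in for the structure-tree classes named in this issue (their real code is not shown in the thread), and only the `addTag(tag, mcid, attrs)` call shape is taken from the sample above.

```javascript
// Hypothetical stand-in for PDFStructTreeRoot / PDFStructTreeElem.
// It only records the tree shape; it writes no PDF objects.
class StructNode {
  constructor(tag, mcid = null, attrs = {}) {
    this.tag = tag;
    this.mcid = mcid;
    this.attrs = attrs;
    this.children = [];
  }

  // Create a child element under this node and return it.
  addTag(tag, mcid, attrs) {
    const child = new StructNode(tag, mcid, attrs);
    this.children.push(child);
    return child;
  }
}

const root = new StructNode('StructTreeRoot');
const elementStack = [root]; // index 0 is always the current parent
let mcid = -1;               // last MCID handed out

// Open a container, add two marked leaves to it, then close it.
const sect = elementStack[0].addTag('Sect', null, {});
elementStack.unshift(sect);
elementStack[0].addTag('H1', ++mcid, { T: 'men' });
elementStack[0].addTag('P', ++mcid, {});
elementStack.shift(); // leaving the section makes the root the current parent again

// Render the tree as indented lines for inspection.
const outline = (node, depth = 0) => [
  `${'  '.repeat(depth)}${node.tag}`,
  ...node.children.flatMap((child) => outline(child, depth + 1)),
];
console.log(outline(root).join('\n'));
```

Using `unshift` and `shift` keeps the current parent at index 0, matching the sample; a `push`/`pop` stack with the parent at the end would work equally well.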

avioli commented 4 years ago

For a newbie: use GitHub Desktop. It will handle your login credentials correctly, but feel free to connect however you want, as long as you can push to your own repositories.

I assume you either (1) cloned this repo, (2) downloaded its source files as an archive or (3) forked it and finally edited that.

I don't want to assume what knowledge you've got, so some of these might be too basic for you and you may have already done them:

Good luck

PS1. If you want to add a bundle to your own fork, then I would suggest you create a new branch (say feature-tagged-pdf) for all the above changes and create a Pull Request from that branch, not from your master. Then you can alter your fork's master branch however you want. Keep in mind that any changes you push to a branch will automatically be added to its Pull Request, so if you plan to add other features that are not related to the PR, a separate branch is your best choice.

PS2. If people want to use the features you've added they can fork either this repo and merge your changes or clone/fork yours. Up to them.

blikblum commented 4 years ago

Should I simply start a branch of this project in Github and make modifications there?

Yes

Should I give it a new version number starting from the version I based it on?

No

At what point is it sensible to make a pull request into the master, since this is a new feature?

At any time

Can my branch have its own bundle js file so that people can just refer to it there?

No. That is not possible because of how GitHub works (it used to work with the now-dead RawGit).

hamscher commented 4 years ago

THANK YOU!

insightfuls commented 4 years ago

@hamscher my apologies, I actually didn't see this issue you raised until just now. In the last week or so I've made my own implementation of accessibility support and raised a PR. If I'd seen this, I would have reached out to see if I could leverage what you had already done in this space. Here's the comment I wrote on issue #1062, which I did find:

I've raised a pull request (#1177) to add accessibility support. Do feel free to have a look at the PR, or directly at the documentation, demonstration or generated accessible PDF.

Perhaps you'd like to take a look at my implementation and see what you think and offer any suggestions?

hamscher commented 4 years ago

Yes, thanks, I will take a look.


hamscher commented 3 years ago

It’s very nice! Clearly, it is much better thought out, better documented, and better integrated into pdfkit as a whole than I could possibly manage, so good on you for that. I had a very specific use case in mind and wasn’t concerned about images, for example, or general markup. I was very concerned about how to have the pdfmake package (https://github.com/bpampuch/pdfmake), which handles automatic table layouts and so forth, produce HTML-style table tagging, and have whatever that code produces pass the Acrobat accessibility checks. I notice that the generated accessible PDF (https://github.com/foliojs/pdfkit/blob/6d598741c53ac9f454c079c1ce13d2595b31a877/demo/accessible.pdf) does not seem to pass that test, so I think a good exercise for me would be to use your package instead of mine and try to get the same result.

I am fairly new to GitHub, so I am confused about how I would either clone or fork your code, since it seems not to be a branch but only a single commit at https://github.com/foliojs/pdfkit/tree/6d598741c53ac9f454c079c1ce13d2595b31a877. Other than just downloading a zip, how do I use it and track changes? I assume you are continuing to work on it.


insightfuls commented 3 years ago

@hamscher Sorry I didn't see this earlier. I have been busy with a newborn daughter.

My code is on a branch in my fork: https://github.com/insightfuls/pdfkit/tree/pdf-ua (you can see a link to my branch at the top of the PR #1177).

It troubles me that the example isn't passing accessibility checks, as I expected that it would (I think). I don't have access to the full version of Acrobat, but if you do and could provide more details on the failure(s), I'd be more than happy to fix it up, as I would like the example to pass all the checks, and if that requires some changes to the implementation to achieve that, I'd be happy to adjust it. Of course, you're welcome to work on fixing it up yourself, too, which would be a great contribution.

insightfuls commented 3 years ago

P.S. @hamscher I too am very interested in seeing if/when PDFMake is able to be extended to leverage the new features I've built into PDFKit. I think it's entirely possible that PDFMake could be extended to generate accessible PDFs without the end user needing to do anything differently (or very little), which would be a big win. It's something I might look into some day if nobody else takes it up, as a personal project.

hamscher commented 3 years ago

I did. See GitHub.com/sec-gov/pdfmake and GitHub.com/sec-gov/pdfkit. As a newcomer to both JavaScript and GitHub, having only just in November gotten authorization to create the (free) sec-gov account, I'm proceeding slowly. I'd be grateful for any suggestions on how to reduce the overall number of forks, branches, etc.


hamscher commented 3 years ago

PS Congratulations on the newborn!


insightfuls commented 3 years ago

@hamscher I've just updated my branch (and thus the PR) with a bug fix. I found the bug by running the demo you said was failing accessibility checks through a free online validator. Hopefully it now fails your validation less badly. I fully expect it still to fail, as the link annotation is not properly referenced (my changes do not support that, though I may extend them some time soon).