Closed by ggrossetie 4 years ago
Hello @Mogztter! I dug into this and discovered the following:
Setting English as the language for a PDF in Adobe Acrobat DC results in the addition of a /Lang (en) entry in the PDF's catalog, as described here. And, of course, this is exactly what you are doing with the line you shared in the description.
I created a PDF in Adobe Acrobat DC, set its language to English, and changed a couple of other fields in the "Advanced" Document Properties tab. Here is the file I created: en_lang_test.pdf
en_lang_test.pdf in Adobe Acrobat DC (paid version): (screenshot)
en_lang_test.pdf in Adobe Acrobat Reader DC (free version): (screenshot)
All of this leads me to believe that the inability to view a document's language metadata in Adobe Acrobat Reader DC may be a bug in Acrobat Reader itself. Or perhaps it's intended, for some strange reason. But it does not appear to have anything to do with the PDF document itself. Because the issue persists even when viewing the language for a document created by Adobe Acrobat itself, without using any third party libraries.
I hope this helps. If you dig into this any further and discover why Acrobat Reader doesn't render the language field, I'd be interested to know what you find!
Thank you very much for digging into this! Maybe it's a paid feature 😉
But it does not appear to have anything to do with the PDF document itself. Because the issue persists even when viewing the language for a document created by Adobe Acrobat itself, without using any third party libraries.
I'm reassured 👍
One last thing: do you think we should add a tiny function to set the language on the document, similar to setTitle, setAuthor, setSubject... on PDFDocument?
Sure. I'd be willing to accept a PR for a PDFDocument.setLanguage method.
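A minimal sketch of what such a helper could look like, assuming it only wraps the catalog write used in the code shared in this thread. The name setDocumentLanguage and its pdfLib parameter are hypothetical and not part of pdf-lib; PDFName.of, PDFString.of and catalog.set are the calls that appear in that shared code. The pdf-lib module is passed in as an argument so the sketch carries no import of its own.

```javascript
// Hypothetical helper (not part of pdf-lib): writes a /Lang entry to the
// document catalog. `pdfLib` is the imported pdf-lib module.
function setDocumentLanguage(pdfDoc, pdfLib, language) {
  if (typeof language !== 'string' || language.length === 0) {
    throw new TypeError('language must be a non-empty string such as "en" or "fr"')
  }
  const { PDFName, PDFString } = pdfLib
  // Same catalog write as in the thread's snippet, with the tag parameterized.
  pdfDoc.catalog.set(PDFName.of('Lang'), PDFString.of(language))
  return pdfDoc
}

// Possible usage (assumes pdf-lib is installed):
//   import * as pdfLib from 'pdf-lib'
//   const pdfDoc = await pdfLib.PDFDocument.create()
//   setDocumentLanguage(pdfDoc, pdfLib, 'fr')
```

Returning the document is only a convenience for chaining; a method on PDFDocument itself would more likely return nothing, like the existing setters.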
Hey! This issue was reported to me again, and a user was able to provide a PDF where the language displays correctly and is properly recognized: accessible-pdf-example.pdf
When I create a PDF using the following code:
import fs from 'node:fs'
import { PDFDocument, PDFString, rgb, PDFName } from 'pdf-lib'
// Create a new PDFDocument
const pdfDoc = await PDFDocument.create()
pdfDoc.catalog.set(PDFName.of('Lang'), PDFString.of('fr'))
// Serialize the PDFDocument to bytes (a Uint8Array)
const pdfBytes = await pdfDoc.save()
fs.writeFileSync('out.pdf', pdfBytes)
I cannot find the /Lang entry (in plain text) when I open out.pdf in a text editor, whereas in accessible-pdf-example.pdf I can see the following line:
482 0 obj
<</Lang(en)/MarkInfo<</Marked true/Suspects false>>/Metadata 6 0 R/Names 530 0 R/Outlines 12 0 R/PageLabels 230 0 R/Pages 232 0 R/StructTreeRoot 21 0 R/Type/Catalog/ViewerPreferences 531 0 R>>
endobj
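The same check can be done without a text editor. The following is a rough sketch, not a PDF parser: it scans the raw bytes for an uncompressed /Lang entry like the one shown above, in either literal-string or hex-string form. The function name findPlainTextLang is made up for illustration, and a null result only means no plain-text entry was found, not that the document has no language set.

```javascript
// Looks for an uncompressed /Lang entry in raw PDF bytes.
// Returns the language string, or null when no plain-text entry is found
// (for example because the catalog is stored inside a compressed stream).
function findPlainTextLang(pdfBytes) {
  // latin1 maps every byte to exactly one character, so binary data is kept.
  const text = Buffer.from(pdfBytes).toString('latin1')

  // Literal string form, e.g. /Lang(en) or /Lang (en-US)
  const literal = /\/Lang\s*\(([^)]*)\)/.exec(text)
  if (literal) return literal[1]

  // Hex string form, e.g. /Lang<656E>
  const hex = /\/Lang\s*<([0-9A-Fa-f\s]*)>/.exec(text)
  if (hex) {
    const digits = hex[1].replace(/\s+/g, '')
    let decoded = ''
    for (let i = 0; i + 1 < digits.length; i += 2) {
      decoded += String.fromCharCode(parseInt(digits.slice(i, i + 2), 16))
    }
    return decoded
  }
  return null
}

// Possible usage:
//   import fs from 'node:fs'
//   console.log(findPlainTextLang(fs.readFileSync('out.pdf')))
```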
Maybe the /Lang entry must not be encoded and must be written in plain text to maximize compatibility across PDF readers? Does that make sense?
@Hopding I can open a new issue if needed.
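One possible explanation for the missing plain-text entry, offered as a hypothesis rather than a confirmed diagnosis: by default, pdf-lib's save() packs objects into compressed object streams (its useObjectStreams save option defaults to true), so the catalog and its /Lang entry would not be readable in a text editor even though they are present. The sketch below turns that off. The function name saveWithoutObjectStreams is hypothetical, and pdfDoc is assumed to be a pdf-lib PDFDocument. Whether this changes what Acrobat Reader displays is not established in this thread.

```javascript
// Hypothetical helper: serializes a pdf-lib PDFDocument without object
// streams, so indirect objects such as the catalog are written uncompressed.
async function saveWithoutObjectStreams(pdfDoc) {
  // `useObjectStreams` is a pdf-lib save option; it defaults to true.
  return pdfDoc.save({ useObjectStreams: false })
}

// Possible usage, continuing from the snippet in this comment:
//   const pdfBytes = await saveWithoutObjectStreams(pdfDoc)
//   fs.writeFileSync('out.pdf', pdfBytes)
```

With this option the output file is larger, but the catalog should be inspectable as text, which makes it easier to compare out.pdf against accessible-pdf-example.pdf.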
According to the specification, it's possible to define the Lang entry in the PDF catalog: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
I'm using the following code:
When using exiftool, I can see that the language is present:
But when using Acrobat Reader, the value is empty:
Am I doing something wrong?