PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

[CTS 11] Not passing validation #83

Open PhilterPaper opened 6 years ago

PhilterPaper commented 6 years ago

Even simple PDF files produced by PDF::Builder fail validation tests such as the one at http://www.pdf-online.com. Unfortunately, the error messages are rather vague ("incorrect meta data", "not PDF 1.4 compliant", etc.) and not much help in determining what's missing or wrong. Does anyone know of a good validator that gives useful information? Then I could see what needs to be added or changed (such as inserting a default media size entry). Are there any flags or settings in Acrobat Reader that produce good diagnostics? Apparently, it tries to fix up a PDF file when it reads it in. Does the PDF 1.7 spec clearly spell out anywhere what's required? I seem to recall seeing some "mandatory" or "required" information here and there, but it's scattered.

pdf-online also claims to check PDF/A compliance (see #52 and #76), but we really need to fix basic PDF standards compliance first, before tackling PDF/A.

Note that pdf-online, and possibly other validators, upload your PDF file to their server. Make sure that any PDF you test doesn't contain sensitive information!

With regard to PDF validation, it might be used in testing both the PDF::Builder code and the example programs. At a minimum, we need to validate that what the library puts out is a clean PDF. The main burden of making sure nothing is missing (e.g., media size) may still fall upon the user of the PDF::Builder library, but at least the library could help out a bit more than it does now.
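As a minimal illustration of the kind of local self-check that could run in a test suite (and that never sends the file to a third-party server), the following Python sketch checks a few structural basics: the `%PDF-x.y` header at offset 0, a trailing `%%EOF`, and that the last `startxref` offset points at a cross-reference section. The function name `basic_structure_check` is hypothetical, and this is not a validator: it knows nothing about metadata, required dictionary entries, or version-specific rules, so it would not catch errors like the ones pdf-online reports.

```python
import re
from typing import List


def basic_structure_check(data: bytes) -> List[str]:
    """Return a list of structural problems; an empty list means these
    few checks passed (it does NOT mean the file is a valid PDF)."""
    problems: List[str] = []

    # The header "%PDF-x.y" is expected at the very start of the file.
    if not re.match(rb"%PDF-\d\.\d", data):
        problems.append("missing or malformed %PDF-x.y header at offset 0")

    # Only the end of the file matters for the trailer checks.
    tail = data[-1024:]

    # The file should end with %%EOF, possibly followed by an end-of-line.
    if not tail.rstrip().endswith(b"%%EOF"):
        problems.append("missing %%EOF marker at end of file")

    # startxref gives the byte offset of the last cross-reference section.
    matches = list(re.finditer(rb"startxref\s+(\d+)", tail))
    if not matches:
        problems.append("missing startxref")
        return problems
    offset = int(matches[-1].group(1))
    section = data[offset:offset + 32]

    # A classic xref table starts with the keyword "xref"; PDF 1.5+ files may
    # instead point at a cross-reference stream object ("N G obj").
    if not (section.startswith(b"xref")
            or re.match(rb"\d+\s+\d+\s+obj", section)):
        problems.append(
            "startxref offset %d does not point at an xref section" % offset)
    return problems
```

Checks of this sort are cheap enough to run on every test and example output, leaving the deeper semantic checks to a real validator.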

PhilterPaper commented 6 years ago

It looks like, among other things, MediaBox is required for a page (according to the 1.7 spec). We'll probably have to put in a default page size (e.g., Universal) and, if nothing has been explicitly given by the time output starts, use that default in the page output. We might also simply call MediaBox() at the beginning, and make sure that if it's later given explicitly, the new setting overrides the current one rather than adding another (conflicting) setting.
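The default-then-override idea can be sketched as follows. In the PDF 1.7 specification, MediaBox is listed as required and inheritable, so a page without its own entry may pick it up from an ancestor Pages node. This Python model (the class and method names are hypothetical, not PDF::Builder's API) resolves the effective box by walking up the page tree and falling back to a default; US Letter, 612 x 792 points, is used here only for illustration. Because setting the box assigns a single field, a later explicit call replaces the earlier value rather than adding a second, conflicting entry.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]

# US Letter in PDF user-space units (1/72 inch): 8.5 x 11 in -> 612 x 792.
# Chosen only for illustration as the fallback when nothing is set.
DEFAULT_MEDIABOX: Rect = (0, 0, 612, 792)


@dataclass
class PageNode:
    """Hypothetical stand-in for a Pages or Page dictionary in the page tree."""
    parent: Optional["PageNode"] = None
    mediabox: Optional[Rect] = None  # None means "not set on this node"

    def set_mediabox(self, rect: Rect) -> None:
        # Plain assignment: a later explicit call overrides any earlier value
        # instead of adding another (conflicting) entry.
        self.mediabox = rect

    def effective_mediabox(self) -> Rect:
        # MediaBox is inheritable: use the nearest value found walking up
        # the tree, and fall back to the default if no node sets one.
        node: Optional[PageNode] = self
        while node is not None:
            if node.mediabox is not None:
                return node.mediabox
            node = node.parent
        return DEFAULT_MEDIABOX
```

Keeping the inherited lookup separate from the setter means the default never has to be written into every page; it only matters at output time, when a page would otherwise have no MediaBox at all.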

Regarding other required things, it looks like I need to search the PDF spec for "required" and deal with each of those. I suspect that most Required things are already in, but there may also be some that were added in post-1.4 versions. If any are found, they will have to wait until input and output version control is implemented. Default settings versus explicitly given ones can be handled in the same manner as MediaBox.

PhilterPaper commented 5 years ago

Just a few notes:

  1. There is now a default MediaBox entry of US Letter size (as the PDF default is supposed to be Letter). A global (or page) override may be given, if desired, through the mediabox() call.
  2. In the 3.014 and 3.015 releases, some tolerance was added for a malformed PDF header (i.e., a run-on comment) and for some problems with the xref table. This should let PDF::Builder digest PDFs that most Readers silently fix up and carry on with. However, the resulting PDF may still trigger warnings or errors in a validator.
  3. Even if all Required items are implemented for PDF 1.4, there may be additional Required items for higher levels (1.5+). As the PDF level may be bumped up partway through the generation process, this might be difficult to deal with.
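For readers curious what header tolerance can look like, here is a Python sketch. It assumes "run-on comment" means the version is followed directly by more text (such as the usual binary-marker comment) with no intervening end-of-line. It also tolerates junk before the header within the first 1024 bytes, a leniency the PDF Reference's implementation notes attribute to Acrobat. The function name and window size are assumptions; this is not PDF::Builder's (Perl) implementation.

```python
import re
from typing import Optional

# "%PDF-" followed by a one-digit major and minor version, e.g. 1.4 or 2.0.
# Nothing is required after the version, so a comment that runs straight on
# (e.g. b"%PDF-1.4%\xe2\xe3\xcf\xd3") still matches.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def tolerant_pdf_version(data: bytes, window: int = 1024) -> Optional[str]:
    """Return the header's version string, or None if no header is found
    within the first `window` bytes (leading junk is tolerated)."""
    match = HEADER_RE.search(data[:window])
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

Being lenient on input like this is separate from being strict on output: a file read this way and written back out should still get a clean, properly terminated header.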

PhilterPaper commented 11 months ago

See also #199.