PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

PDFs failing on some readers #149

Open PhilterPaper opened 3 years ago

PhilterPaper commented 3 years ago

This is split off from #141, as that issue should be restricted to the black/white color inversion on bilevel TIFFs. I think I have that one fixed now, although I'm not fully comfortable with when to invert the colors.

Per the previous issue, some TIFFs produce PDFs that cause some Readers to choke. For example, @carygravel supplied a G4 bilevel TIFF from which PDF::Builder creates a PDF. evince (Linux), Firefox (Windows and maybe Linux), and XpdfReader (Windows) all read this PDF just fine, but Adobe Acrobat Reader DC (Windows) fails to display the image part, giving a message about insufficient image data. So far I have not been able to track this down. I see that a PDF that worked (PDF::Builder not using the Graphics::TIFF library, I think) had raster data that was actually 8 bytes shorter than the failing one, plus the last 4 bytes (in common) were different. Cary swears that libtiff should not be doing anything to the raster data, but did raise the question of whether CRLF line-ends on Windows (what I'm using) could produce a different result than NL line-ends on Linux. I'm wondering whether Adobe is expecting an EOFB marker and failing to find it (thus the "short" raster data, but why only on this image?), while other Readers either ignore the missing marker or silently work around it. Anyway, AR is the only one that seems to fail. The other Readers are happy to properly display the page, and don't report any errors.
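Both hypotheses above (a missing EOFB marker, and CRLF translation of binary data) can be checked mechanically once the raster bytes have been extracted from the two PDFs. The following is only a sketch of such a check, not anything PDF::Builder or Graphics::TIFF does. The function names are made up for illustration. It assumes the G4 data is packed most-significant-bit first, and it relies on the T.6 definition of EOFB as two consecutive 12-bit EOL codes (000000000001), possibly followed by zero pad bits.

```python
EOL = "000000000001"   # T.6 end-of-line code word (11 zeros and a one)
EOFB = EOL + EOL       # end-of-facsimile-block: two EOL codes in a row


def ends_with_eofb(raster: bytes) -> bool:
    """True if the data ends with EOFB, ignoring trailing zero pad bits.

    Assumes MSB-first bit packing (hypothetical helper, for diagnosis only).
    """
    tail = raster[-8:]                               # EOFB plus padding fits easily
    bits = "".join(f"{byte:08b}" for byte in tail)   # bytes -> bit string
    return bits.rstrip("0").endswith(EOFB)


def looks_newline_translated(original: bytes, suspect: bytes) -> bool:
    """True if `suspect` is `original` with every LF expanded to CR LF,

    which is what writing binary data through a text-mode handle on
    Windows would produce.
    """
    return suspect != original and suspect == original.replace(b"\n", b"\r\n")
```

If `ends_with_eofb` differs between the working and failing raster data, the EOFB theory gains weight. If `looks_newline_translated` is true for the pair, a filehandle somewhere is probably not in binary mode.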

Now this problem has its own issue, and hopefully I'll be able to fix it at some point.

PhilterPaper commented 3 years ago

I've also tried my test suite on the non-Graphics::TIFF (old) code, with odd results. Some files, such as G4.tiff, aren't supported anyway, but others give strange errors such as 'package "1" does not support "val()" call' when converted in one order, while in another order of conversion they give inverted images. I need to look into this some more to find the rhyme or reason behind what's failing. Perhaps some minor fixes can be made to the non-GT code to at least better support some cases (e.g., fix the color inversion).

PhilterPaper commented 3 years ago

I think I've got the non-GT issue straightened out. I've also fixed some inverted color bilevels on non-GT (just pushed to GitHub). Here's how it now stands with my collection of test files:

Note that a number of test suite TIFF files with unsupported formats (alpha layer, G4 compressed bilevel) are omitted from the non-Graphics::TIFF test. No promises on the non-GT problems... if the fix looks fairly easy, I'll go ahead and do it, but otherwise it's just better to use Graphics::TIFF.

PhilterPaper commented 3 years ago

There's been mention of "JBIG2" here and there. It appears to be another bilevel compression method (along with "JBIG") that is incompatible with other methods (and PDF), so it would cause trouble if it snuck in somewhere. It would be good to find out its signature, to check whether G4 and perhaps some other troublemakers are in fact JBIG2-compressed. Maybe non-Adobe readers are able to handle it?

By the way, JBIG2, although it offers great compression, sounds somewhat dangerous. According to the Wikipedia article, it can substitute similar looking graphics blocks (such as "6" for "8"), resulting in a radically incorrect image!
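One way to settle what compression a troublesome TIFF really uses is to read the Compression tag (tag 259) straight from the file's first IFD, rather than trusting the file name. The sketch below does that for classic (non-BigTIFF) files. It is an illustration only: the function name is hypothetical, it looks at the first IFD only, and the name table lists just the codes I'm sure of (34661 is the code libtiff uses for JBIG). Any other value is simply reported as a number.

```python
import struct

# Well-known Compression (tag 259) values; not an exhaustive list.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 (T.4)",
    4: "CCITT Group 4 (T.6)",
    5: "LZW",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
    34661: "JBIG",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])   # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic number is not 42)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i              # each IFD entry is 12 bytes
        tag, _type, _n = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT is stored left-justified in the 4-byte value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return value
    return 1  # TIFF 6.0 default when the tag is absent: no compression
```

Usage would be along the lines of `COMPRESSION_NAMES.get(tiff_compression(open(path, "rb").read()), "unknown")`, where `path` is whatever test file is being examined.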

carygravel commented 3 years ago

TIFF does not support JBIG2, as far as I know.

PDF, however, does support JBIG2 (though not JBIG, I think). As you say, there can be issues with the compression, and thus some authorities, particularly in the EU, do not allow PDFs with JBIG2 compression for archives.

PhilterPaper commented 6 months ago

Some image problems may simply be a case of a given filter (compression method) not being supported (or being incorrectly implemented) in a given PDF Reader. In that case, nothing can be done in the code; all we can do is document which sorts of things appear to fail in which Readers.
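To help with that kind of documentation, it is useful to know which filters a failing PDF actually uses. Here is a rough sketch that counts the standard filter names appearing in a PDF's raw bytes. The function name is made up, and the approach is deliberately crude: it does not parse the file, so it will miss names hidden inside compressed object streams and could in principle count a name that happens to occur inside binary stream data.

```python
import re
from collections import Counter

# Standard stream filter names ending in "Decode" (the Crypt filter is left out).
STANDARD_FILTERS = {
    "ASCIIHexDecode", "ASCII85Decode", "LZWDecode", "FlateDecode",
    "RunLengthDecode", "CCITTFaxDecode", "JBIG2Decode", "DCTDecode",
    "JPXDecode",
}


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of standard filter names in the raw PDF bytes."""
    found = re.findall(rb"/(\w+Decode)\b", pdf_bytes)
    names = (name.decode("ascii") for name in found)
    return Counter(name for name in names if name in STANDARD_FILTERS)
```

Running this over each file in a test collection and noting which Readers fail on which files gives a first rough table of filter versus Reader.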

I have also recently fixed a few cases that turned out to be errors in PDF::Builder, not the Reader (e.g., putting 'save' and 'restore' codes (q and Q) in a text stream; mishandling of a saved dashed line structure).
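The first of those errors (q and Q operators inside a BT ... ET text object) is easy to look for in an uncompressed content stream. Below is a small sketch of such a check. It is not PDF::Builder code, and the function name is invented for illustration. It blanks out simple literal strings first so that text like "(a q b)" is not mistaken for an operator, but it does not handle nested parentheses, comments, or inline image data, so treat any hit as a hint rather than proof.

```python
import re

# Literal strings without nested parentheses; backslash escapes are allowed.
_LITERAL_STRING = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)


def q_inside_text_objects(content: bytes) -> list:
    """Return the q/Q operators found between BT and ET, in order."""
    stripped = _LITERAL_STRING.sub(b" ", content)   # drop string contents
    in_text = False
    hits = []
    for token in stripped.split():                  # split on ASCII whitespace
        if token == b"BT":
            in_text = True
        elif token == b"ET":
            in_text = False
        elif in_text and token in (b"q", b"Q"):
            hits.append(token.decode("ascii"))
    return hits
```

An empty result means no save/restore operator was seen inside a text object; a non-empty one points at a stream worth inspecting by hand.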