PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

CTS 33 - Don't know what to do with alpha layer in TIFF #130

Closed carygravel closed 3 years ago

carygravel commented 3 years ago

In the attached zip file are a TIFF with alpha layer, created by imagemagick with:

convert -depth 1 -gravity center -pointsize 78 -size 1000x568 caption:'Lorem ipsum etc etc' alpha.tif

and a PDF created by the following Perl code (also in the zip file as alpha.pl).

The PDF is corrupt, and the Perl code produces the warning: Don't know what to do with alpha layer in TIFF

This is tested with Graphics::TIFF.

#!/usr/bin/perl
use warnings;
use strict;
use PDF::Builder;

my $width = 1000;
my $height = 568;
my $pdf = PDF::Builder->new(-file => 'alpha.pdf');
my $page = $pdf->page();
$page->mediabox($width, $height);
my $gfx = $page->gfx();
my $img = $pdf->image_tiff('alpha.tif');
$gfx->image($img, 0, 0, $width, $height);
$pdf->save();
$pdf->end();

carygravel commented 3 years ago

Without Graphics::TIFF, the warning becomes "Your system does not have Graphics::TIFF installed, so some TIFF functions may not run correctly." and the PDF is blank.

PhilterPaper commented 3 years ago

Thank you for the report. I am able to replicate the problem and hope to have time to look at it very soon. I presume that the second PDF you sent was from alpha.pl without Graphics::TIFF. It is reported as corrupt when attempting to open it.

When I look at alpha.tif in GIMP, all I see is an empty transparent canvas, so it may not be handling it correctly. Windows Photo Viewer, Photos, Paint, and Paint 3D, as well as AutoDesk Sketchbook, report that it's damaged or an unsupported format. I'll have to find a TIFF viewer for Windows to make sure the original alpha.tif is uncorrupted.

I need to do some research on TIFF and alpha channels, but a quick look suggests that TIFF does support an alpha channel. Now to see if libtiff and Graphics::TIFF officially support alpha. By any chance can you quote chapter and verse that says it's supposed to support an alpha channel?

carygravel commented 3 years ago

http://www.libtiff.org/libtiff.html, section "TIFFRGBAImage Support". A stands for Alpha.

carygravel commented 3 years ago

ImageMagick displays alpha.tif without problem.

carygravel commented 3 years ago

However, tiff2pdf, which is a utility which is part of libtiff, produces the same corrupt image as in the original report.

PhilterPaper commented 3 years ago

It looks like I may have to install the Windows version of ImageMagick, in order to get a good TIFF viewer. Recall that tiff2pdf was mentioned in the original skip message for Test 9 -- I guess that matters haven't improved in many years. Perhaps TIFF + Alpha channel is so unusual that most viewers don't bother supporting it?

A quick look at the TIFF_GT module in PDF::Builder suggests that it's aware of Gray+Alpha and RGBA, but the author never went very far in writing for it. I need to find some documentation for libtiff/Graphics::TIFF that explains why bitsPerSample only has a value of 1 (just black & white image?) and how it relates to SamplesPerPixel. Hopefully it will be in the libtiff link above. Once I can get the Alpha layer separated out cleanly, I should be able to borrow some code from the PNG module to handle it properly for the PDF.

carygravel commented 3 years ago

I've had gscan2pdf users complain about TIFF+Alpha not working, but I agree that it's not very common.

PhilterPaper commented 3 years ago

I'm making progress on this thing. It now produces a valid PDF that almost shows correctly -- black and white are flipped, and the text characters are "stuttering" but recognizable. I'm separating out the alpha channel and currently just discarding it (image is fully opaque). I still need to figure out what to do with "associated" alpha data, where the image data is pre-multiplied (scaled). If you can, I'd appreciate it if you could try it with various TIFF files (with and without alpha), see if the output looks close, and pass along any suggestions on how to fix it. [obsolete TIFF_GT replacement removed]

PhilterPaper commented 3 years ago

Some more information: the raw data for alpha.tif was x1 bit pairs, where x was 0 over most of the image. Therefore, I'm assuming that it's GA format, with white=0 and transparent=0.

  1. I had to flip the alpha and buffer outputs from split_alpha(), which I really don't like to do (as I don't understand why I had to do that). I'm concerned that something like RGBA might be corrupted. Does this have something to do with Fill Order or some other flag?
  2. The TIFF flags BlackIsZero (1) and WhiteIsZero (0) are consistent, although I would think they're reversed if the background is supposed to be white, and is usually a 0 bit.
  3. The "stuttering" effect (almost two copies of the text overlapping and XOR'd together) is strange. Both the height and the width are even multiples of 8, so I don't think we're running into end-of-strip boundary alignment effects. This is a puzzler.
  4. Currently I'm just discarding the alpha channel, until the image display is correct. There will be a -notrans option to discard alpha channels.
  5. I don't know how (and if) the mask layer can be worked into this.
  6. I'm not sure what to do with pre-multiplied graphics data (associated alpha). It sounds like the graphics (image) has been already scaled down for quick adding to the existing underlayment image. If so, data has been lost. E.g., a fully transparent pixel would be stored as 0, and its original value is unrecoverable.
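
As a rough illustration of the "x1 bit pairs" described above, here is a minimal pure-Perl sketch that separates interleaved 1-bit gray and alpha samples into two packed buffers. The name split_ga_1bit is hypothetical (it is not the split_alpha() in TIFF_GT.pm), and the sketch assumes MSB-first bit order (fillOrder 1), gray before alpha in each pair, and input whose sample pairs fill whole bytes.

```perl
use strict;
use warnings;

# Hypothetical helper, not the split_alpha() in TIFF_GT.pm.
# Assumes: 1 bit per sample, 2 samples per pixel in G,A order,
# MSB-first bit order (fillOrder 1).
sub split_ga_1bit {
    my ($buffer) = @_;
    my $bits = unpack 'B*', $buffer;    # string of '0'/'1' chars, MSB first
    my ($gray, $alpha) = ('', '');
    while ($bits =~ /\G(.)(.)/gs) {     # walk the string one G,A pair at a time
        $gray  .= $1;
        $alpha .= $2;
    }
    # Repack each plane; pack 'B*' pads a final partial byte with 0 bits
    return (pack('B*', $gray), pack('B*', $alpha));
}

# Eight pixels: gray mostly 0, alpha always 1
my ($gray_buf, $alpha_buf) = split_ga_1bit("\x55\x57");
```

Going through a bit string keeps the logic easy to follow, at the cost of speed on large images, which is the performance concern that an XS splitter would address.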

PhilterPaper commented 3 years ago

Re item 6, I don't think you'll be able to recover the original pixel if fully transparent (i.e., the opacity value is 0 and the pre-multiplied pixel is now 0), but it could just be left as 0, and everything else divided by the opacity value (if > 0), after converting to floating point. Within the limits of rounding to integer value, this should produce an image somewhat close to the original pixel values (except for any fully transparent pixels, which will just be left as 0). Thus PDF will be happy with the unscaled pixel values.
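
The approach described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: 8-bit samples in R,G,B,A order, alpha in the range 0 to 255, and a hypothetical function name (unpremultiply_rgba8) that does not come from PDF::Builder.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes 8-bit samples in R,G,B,A order with
# associated (pre-multiplied) alpha.
sub unpremultiply_rgba8 {
    my ($buffer) = @_;
    my @s = unpack 'C*', $buffer;
    for (my $i = 0; $i + 3 < @s; $i += 4) {
        my $alpha = $s[$i + 3];
        # alpha 0: original color is unrecoverable, leave it as 0
        # alpha 255: color was multiplied by 1, nothing to undo
        next if $alpha == 0 || $alpha == 255;
        for my $j ($i .. $i + 2) {
            # divide by the opacity fraction (alpha/255), round to integer
            my $v = int($s[$j] * 255 / $alpha + 0.5);
            $s[$j] = $v > 255 ? 255 : $v;    # clamp in case of bad input
        }
    }
    return pack 'C*', @s;
}

# One half-transparent pixel followed by one fully transparent pixel
my $restored = unpremultiply_rgba8("\x40\x20\x00\x80" . "\x00\x00\x00\x00");
```

The alpha samples themselves are passed through unchanged, so the result can still be split into image and alpha planes afterwards.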

PhilterPaper commented 3 years ago

More progress. The sample image now displays correctly.

  1. fixed
  2. flipped the flags, so it now displays black on white like ImageMagick shows, but why is 'blackIsZero' == 1?
  3. fixed, although not verified on a .tif with height and/or width not a multiple of 8 (for bi-level)
  4. still discarding alpha until I confirm that all images are handled correctly (in the sample file all alpha values are 1 = opaque)
  5. no further work on this yet
  6. will need some real non-bi-level .tif images, with real alpha channel (not just full opaque/transparent) to check this
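
On item 2, the TIFF PhotometricInterpretation tag uses 0 for WhiteIsZero and 1 for BlackIsZero. A minimal sketch of normalizing gray or bi-level data so that 0 always means black might look like the following. The helper name is hypothetical, and it assumes the buffer holds only gray samples (alpha already split out); it is not necessarily how TIFF_GT.pm handles the flag.

```perl
use strict;
use warnings;

# TIFF PhotometricInterpretation values for gray/bi-level data
use constant {
    PHOTOMETRIC_MINISWHITE => 0,    # WhiteIsZero
    PHOTOMETRIC_MINISBLACK => 1,    # BlackIsZero
};

# Hypothetical helper: return gray sample data in which 0 means black.
# Assumes the buffer contains gray samples only (no interleaved alpha).
sub normalize_to_black_is_zero {
    my ($buffer, $photometric) = @_;
    return $buffer if $photometric == PHOTOMETRIC_MINISBLACK;
    # XOR every byte with 0xFF, which inverts every sample
    return $buffer ^ ("\xFF" x length($buffer));
}

my $fixed = normalize_to_black_is_zero("\x0F\xF0", PHOTOMETRIC_MINISWHITE);
```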

'fillOrder' on this .tif is 1... I'll have to look and see if fillOrder and planarConfiguration have any effect. What I'm worried about is either a sample order that's not GA or RGBA, or bit orders within samples are reversed. Also, if a strip does not fill a whole number of pixels (i.e., there's some bits left over in the strip's last byte), I fear that it will get out of sync with the data by not discarding that last bit of junk -- or does Graphics::TIFF hide this from me somehow?
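
To make the leftover-bits worry concrete, here is a small sketch that computes how many bytes one row occupies and how many padding bits trail it. It assumes every decoded row starts on a byte boundary and that all samples have the same size; the function names are hypothetical and are not part of Graphics::TIFF or PDF::Builder.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes occupied by one row of packed samples,
# assuming each row starts on a byte boundary.
sub row_bytes {
    my ($width, $samples_per_pixel, $bits_per_sample) = @_;
    my $row_bits = $width * $samples_per_pixel * $bits_per_sample;
    return int(($row_bits + 7) / 8);    # round up to whole bytes
}

# Padding bits at the end of each row that a splitter would have to skip
sub row_padding_bits {
    my ($width, $samples_per_pixel, $bits_per_sample) = @_;
    my $row_bits = $width * $samples_per_pixel * $bits_per_sample;
    return row_bytes($width, $samples_per_pixel, $bits_per_sample) * 8
         - $row_bits;
}

# alpha.tif: 1000 pixels wide, 2 samples (G,A) of 1 bit each
my $bytes   = row_bytes(1000, 2, 1);
my $padding = row_padding_bits(1000, 2, 1);
```

For the sample image the padding comes out to zero, which would explain why it displays correctly even if the splitter never skips trailing bits. A width of 1001 under the same assumptions would leave 6 padding bits per row.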

If I read the TIFF spec correctly, it is possible to have different sized (bits per sample) samples, and not necessarily 2^N size, e.g. 5 bit Red, 8 bit Green, 2 bit Blue, and 4 bit Alpha. However, Graphics::TIFF seems to return all samples the same size -- is this an error, or are they resampled to all be the same size and possibly all 2^N size?

At some point, with some real-world experience under my belt with this, it may turn out that the pure Perl code to split out the image and alpha from the buffer (as well as un-pre-multiplying the image) is too slow, and we'll have to approach the Graphics::TIFF author about adding an XS code (C) splitter, as was done in Image::PNG::Libpng for PNG support.

Cary, I can't go much further without someone with a selection of real .tif files of various types seeing if they work with the revised TIFF_GT.pm. As your request for TIFF+alpha support is driving this work, I'd like to ask for your help on this. TIFF_GT.pm.txt

carygravel commented 3 years ago

Congratulations on your progress.

Here's an example I picked up some time ago of a 24bit + alpha TIFF

The current release of PDF::Builder produces a PDF with an unrecognisable image.

Graphics::TIFF passes on exactly what it gets from libtiff.

If you'd like some extra code in Graphics::TIFF to do some heavy lifting, I can certainly look at it.

24bitalpha.zip

My suggestion would be, assuming that the output for this image is reasonable, that you cut a release with it and see what feedback you get. The important thing is to have fixed the initial problem, and to have some tests in place covering the fixes so that if any bugs come in you don't get any regressions.

It occurred to me that another way of regression testing stuff like this without having masses of PDFs hanging around against which to do bitwise comparisons might be to keep the images tiny, force an ASCII filter (or similar) for the output and do the comparisons as strings.
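
A minimal sketch of that idea, assuming the image stream bytes have already been extracted somehow: render them as hex text, the way a PDF ASCIIHexDecode stream would carry them, and compare against an expected string in an ordinary Test::More test. The helper name ascii_hex and the four-byte stand-in data are hypothetical.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical helper: render binary stream data as hex digits followed
# by the '>' end-of-data marker used by ASCIIHexDecode.
sub ascii_hex {
    my ($bytes) = @_;
    return uc(unpack 'H*', $bytes) . '>';
}

# Stand-in for the stream of a tiny test image
my $image_data = "\x00\xFF\x80\x01";

is(ascii_hex($image_data), '00FF8001>', 'image stream matches expected text');
```

With images only a few pixels in size, the expected strings stay short enough to keep inline in the test file instead of shipping reference PDFs.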

PhilterPaper commented 3 years ago

I got your 24bit "pag1.tif" working (it's a scanned page of French text, right?). I went ahead and pushed the changes to GitHub, as that seems to be a good milestone. The TIFF_GT.pm on GitHub supersedes the previous TIFF_GT.pm.txt attachment. I'm going to try to get alpha working with TIFF before I put out another CPAN release.

I've been looking all over the place for other examples of TIFF+alpha, for testing, but I haven't found anything yet. There were several from epictor.com, but they turned out to be using some other method for transparency (possibly the clipping mask), not the associated or unassociated alpha data.

Still to be done:

  1. make use of the alpha transparency in building a PDF
  2. un-pre-multiplying associated data (I have nothing to test with yet)
  3. consider supporting transparency mask feature
  4. check whether libtiff ever delivers data in other orders (than GA or RGBA) or in reverse bit order (again, need samples) or with different-length samples
  5. consider whether adding t-tests is worthwhile
  6. see how performance is and whether an XS assist to split_alpha() is needed

PhilterPaper commented 3 years ago

  1. done
  2. done (but have nothing to test against)

I will release this as currently stands, in 3.021. This ticket will be closed, and any further requests should get a new ticket.