PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full
Other
6 stars 7 forks source link

Problems with LZW and TIFF #151

Closed PhilterPaper closed 3 years ago

PhilterPaper commented 3 years ago

@carygravel Looks like I spoke too soon. Although your latest changes for not converting LZW to Flate (#148) passes all the t-tests, I ran my TIFF test suite and it fails for all LZW-compressed bilevel and RGB images (Acrobat Reader: "insufficient data for an image", black on Firefox and XpdfReader). It appears to be OK for grayscale images.

Did you test your new LZW code on a selection of LZW-compressed images, and they displayed OK? If so, I might have messed up something while pulling in your changes. Can you compare your local code versus what's on GitHub for the various TIFF-related files?

By the way, I compared the GitHub copy of TIFF_GT.pm with your PR copy and my running local copy, and (except for $VERSION and one or two minor cosmetic changes), they appear to be identical. Any chance you changed something in File_GT.pm and didn't pull it into the PR? That's assuming that your PR copy is based on the older version and not the current GitHub version. Finally, I replaced the current TIFF_GT.pm with the version from 3 or so days ago (before the LZW/Flate changes) and my test suite worked fine. Note that with both the old and new code, all 13 t-tests passed! Should the LZW (not cconvert to Flate) have failed?

PhilterPaper commented 3 years ago

I see the following two new lines in TIFF_GT.pm:

$decode->{'EndOfLine'} = PDFBool(1);
$decode->{'EncodedByteAlign'} = PDFBool(1);

Could they be at fault? I haven't yet tried running without them. Both sound like they could cause the Reader to run out of data early.

Update: I commented out those two lines and re-ran the test suite. No difference.

carygravel commented 3 years ago

Can you post an image that doesn't work for you?

What do you mean by

reran the test suite

?

PhilterPaper commented 3 years ago

For example, the file LZW.tiff that you sent me (bilevel, LZW compression, lsb, black 0). I have a collection of 19 TIFF files that I can run with or without Graphics::TIFF (have to exclude 7 for non-GT). That is my test suite. It worked fine up until the latest round of changes (to not convert LZW to Flate). I will zip it and try to send it to you, along with all my test files (you have most of them already).

Add: by any chance did you change File.pm and File_GT.pm and forget to put them in the PR?

carygravel commented 3 years ago

Is there some specific code you run to test all the images, or do you do it by hand?

carygravel commented 3 years ago

I have created a test for this we can run on the fly. The problem seems to occur when the TIFF has more than one strip. Now I'll work on fixing the problem.

PhilterPaper commented 3 years ago

Is there some specific code you run to test all the images, or do you do it by hand?

I sent you "tiff.pl" which runs my whole test suite in one PDF. It's not an automated test (CI or t-test), so I guess the answer is "by hand" (whenever TIFF-related work is done).

I'm glad to hear that you think you may have found the problem (multiple-strip LZW-compressed TIFFs).

PhilterPaper commented 3 years ago

@carygravel I was hoping to get PDF::Builder 3.022 out by the end of this month (March). I've given this date to several people awaiting bug fixes. If you think you might have the TIFF LZW issue straightened out soon, I can delay that into early April. On the other hand, if the end is not yet in sight, I could back out the latest LZW changes and just go with the rest of the code (including the other TIFF changes). Then there would be no pressure on you for quickly fixing the LZW issue. Please let me know your thoughts on this.

carygravel commented 3 years ago

The problem is caused by the fact TIFF compresses on a per-strip basis, and PDF has no concept of strips.

Therefore, the solution is to allow libtiff to decrypt each strip, and then to recompress the whole image. The fun part is that TIFF and PDF have chosen a similar but incompatible format to that used by the standard tools, including the only Perl implementation I have found, Compress::LZW.

Therefore, I have spent the last couple of evenings with the PDF reference working up a LZW outfilt routine to go in PDF/Builder/Basic/PDF/Filter/LZWDecode.pm

I've made good progress and have basic tests working, but am still in the process of ironing out some bugs. If you want to release immediately, back out the recent LZW commits. I'll do my best to have a fix soon, but obviously can't promise anything.

Another workaround might be to force libtiff to recompress to an intermediate image with a high enough rows per strip count force only one strip.

PhilterPaper commented 3 years ago

Wow, sounds like a mess! Why don't I check in around 3/29 and see where you are. If it looks like no more than a few days to a week until all is well, I'll release a few days after you're done. If at that point you don't know how long it will take, I'll release 3.022 without the LZW code. Of course, your changes will be welcome in 3.023 in June or July. Just please don't rush and cut corners to get your new code out in the next few days. When it's ready, it's ready.

PhilterPaper commented 3 years ago

Just a note to see PR #152 regarding this work. Also, I am going to revert the recent LZW changes so I can get 3.022 out the door.

PhilterPaper commented 3 years ago

Just a note that the 3.022 release went out with the t/tiff.t changes for the LZW tests (probably should have backed them out too, in hindsight, but they didn't give any error). One of the CPANTS testers is reporting a failure (across several Linux versions) where the image is dumped out pixel-by-pixel. I'll have to look for the time I was seeing that error -- it might be a backlevel libtiff and/or Graphics::TIFF? I'm not sure how to screen for that at this point (possibly run Graphics::TIFF::getVersion or whatever it was, that gives the libtiff version, and either just skip the test, or state that GT isn't installed). Maybe wait until the latest LZW drama is settled one way or the other, but at least keep this error in the back of your mind.

carygravel commented 3 years ago

I saw the three smoker failures and had a look. They are running an ancient Debian release, and I am guessing the failures are due to the version of imagemagick or ghostscript.

In order to fix them, we'd first have to find a way of reproducing them, i.e. find an appropriate VM or docker image.

I think we probably both have better things to do than fix failures in repos that were EOL at the end of 2019.

PhilterPaper commented 3 years ago

If we can test levels of ImageMagick and ghostScript at the top of t/tiff.t (after establishing that they are installed), that could be helpful in not running those tests on backlevel systems. Something to think about.

https://gist.github.com/zulhfreelancer/e104e4b330c8dbc6622d92b8cd525bf6: ImageMagick convert -version says 7.0.10-49 Q16 x64 2020-12-14 on my installation.

https://tex.stackexchange.com/questions/49538/ps2pdf-checking-version-on-a-windows-machine: rungs -v says 9.25 (2018-09-13) on my installation.

Do you think that would be workable if we can decide upon minimum versions? There's no great harm in requiring 7.* and 9.25 respectively, if it prevents ugly install errors. So what if a few back-level installations can't run the full TIFF suite? I should be able to parse the version checks in Perl.

carygravel commented 3 years ago

I have no problem with minimum versions, but the imagemagick version ought to be quite a bit older than that, especially as you seem to want to support Perl 5.20. I suggest at least what the CI in github is using by default - Ubuntu Bionic, I think, which means imagemagick 6.9.7.4 and ghostscript 9.26. The latter is newer than what you have.

PhilterPaper commented 3 years ago

OK, how about 6.9.7.* for ImageMagick? Is that late enough that it should work properly, but early enough that Perl 5.20 systems should have it? I'm open to going with an earlier version, if you're sure it will still work. BTW, I'll be going up to minimum Perl 5.22 this summer (PDF::Builder 3.023 or 3.024).

One thing that concerns me is that both your new tests for LZW not being converted to Flate is that they both pass in t/tiff.t. If the new LZW code isn't in PDF::Builder, would you expect them to pass anyway? Under what circumstances would you expect them to fail?

carygravel commented 3 years ago

OK, how about 6.9.7.* for ImageMagick? Is that late enough that it should work properly, but early enough that Perl 5.20 systems should have it? I'm open to going with an earlier version, if you're sure it will still work. BTW, I'll be going up to minimum Perl 5.22 this summer (PDF::Builder 3.023 or 3.024).

6.9.7 sounds good.

One thing that concerns me is that both your new tests for LZW not being converted to Flate is that they both pass in t/tiff.t. If the new LZW code isn't in PDF::Builder, would you expect them to pass anyway? Under what circumstances would you expect them to fail?

The tests don't check for LZW in the PDF, just that the PDF renders to the same image as the input. I would therefore only expect them to fail if Builder couldn't decompress, transcode or find the correct parameters to render the image properly.

PhilterPaper commented 3 years ago

Here's my hack to check the version (convert 6.9.7+ and gs 9.25+). If a high-enough version is not found, it says that "convert not found" (SKIP message). Do you think it needs a more specific message? Please try it out on your Linux system, and anything else you have at hand. It works fine with Windows, but I can't test on other platforms. In particular, do all systems have "rungs"? tiff.t.txt (rename to t/tiff.t)

If convert and ghostscript have different version formats on other platforms, I'll have to revise it. If it can't figure out the format and extract a version number, it should fail gracefully and just skip those tests with "convert not found". Let me know if you think the error message (in the SKIP) should be more specific (i.e., say "convert version too old" or "can't parse convert version"). Is it safe to have different skip messages, or are systems trying to match specific strings?

You mentioned gs 9.26 as a minimum -- I can't use that, as what I have is 9.25 (was it just updated?). Do you think that gs 9.25 will have any problems with older converts (6.9.7)?

carygravel commented 3 years ago

Whatever gs exe you have should also take a -v argument. I don't think you need rungs. There is a core Perl module for comparing versions. Try this:

diff --git a/t/tiff.t b/t/tiff.t
index f4fe06b..3adce0c 100644
--- a/t/tiff.t
+++ b/t/tiff.t
@@ -6,6 +6,7 @@ use IPC::Cmd qw(can_run run);
 use File::Spec;
 use File::Temp;
 use Test::More tests => 13;
+use version;

 use PDF::Builder;
 # 0: allow use of Graphics::TIFF, 1: force non-GT usage
@@ -94,6 +95,18 @@ my $pngout = File::Spec->catfile($directory, 'out.png');
 # Note that GS installation MAY not permanently add GS to %Path% -- you
 #   may have to do this manually

+sub check_version {
+    my ($cmd, $arg, $regex, $min) = @_;
+    if (defined $cmd) {
+        if (`$cmd $arg` =~ m/$regex/) {
+            if (version->parse($1) >= version->parse($min)) {
+                return $cmd
+            }
+        }
+    }
+    return;
+}
+
 my ($convert, $gs);
 # Linux-like systems usually have a pre-installed "convert" utility, while
 # Windows must use ImageMagick. In addition, be careful NOT to run "convert"
@@ -103,6 +116,8 @@ if      (can_run("magick")) {
 } elsif ($OSNAME ne 'MSWin32' and can_run("convert")) {
     $convert = "convert";
 }
+$convert = check_version($convert, '-version', 'ImageMagick ([0-9.]+)', '6.9.7');
+
 # on Windows, ImageMagick can be 64-bit or 32-bit version, so try both. it's
 #   needed for some magick convert operations, and also standalone, and
 #   usually must be installed.
@@ -114,6 +129,7 @@ if      (can_run("gswin64c")) {
 } elsif (can_run("gs")) {
     $gs = "gs";
 }
+$gs = check_version($gs, '-v', 'Ghostscript ([0-9.]+)', '9.25.0');

 # alpha layer handling ------------------
 # convert and Graphics::TIFF needed
PhilterPaper commented 3 years ago

Thanks for the suggestions on streamlining the version checks using 'version' package. It's now in the repository.

PhilterPaper commented 3 years ago

PR 156 seems to have fixed the problems here, so closing it.