PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

Don't convert LZW to Flate #148

Closed by carygravel 3 years ago

carygravel commented 3 years ago

No need to convert LZW to Flate, as PDF can handle it, and there are cases where it is better.

PhilterPaper commented 3 years ago

It appears that "convert" failed to build "test.tif", so one or more of the tests in t/tiff.t failed. If you happen to be in there changing code, it would probably be a good idea to change $tiff (for the temporary test.tif file) to $temp_tiff, so as to avoid linters etc. complaining that the $tiff object is being reused.

PhilterPaper commented 3 years ago

I presume you need to sync up your modified t/tiff.t with my GitHub version, and then everyone/everything will be happy.

If I understand your changes, it's to stop decoding LZW compression and re-encoding as Flate, as PDF can handle LZW on its own? Any idea why the code would have originally been written that way? LZW decode seems to have been around from (or near) The Beginning. All I can think of is that someone was concerned about the patent on LZW. Is that patent still in effect anywhere? I would have thought that it had long ago expired worldwide (July, 2004, by what I can see).

I already took care of distinguishing between the two $tiff variables ($tiff_t for the second) so that linters, etc. won't be so unhappy about the reuse.

carygravel commented 3 years ago

> I presume you need to sync up your modified t/tiff.t with my GitHub version, and then everyone/everything will be happy.

Yup, but no time today.

> If I understand your changes, it's to stop decoding LZW compression and re-encoding as Flate, as PDF can handle LZW on its own? Any idea why the code would have originally been written that way? LZW decode seems to have been around from (or near) The Beginning. All I can think of is that someone was concerned about the patent on LZW. Is that patent still in effect anywhere? I would have thought that it had long ago expired worldwide (July, 2004, by what I can see).

I've no idea why Alfred recoded LZW to Flate. Flate is better in many situations. Perhaps he thought it was always better. Or maybe you are right about the licence - but as you say it expired years ago, and is definitely no problem any more.

PhilterPaper commented 3 years ago

> Flate is better in many situations. Perhaps he thought it was always better.

I wonder if it would be better to leave the LZW-to-Flate code in place, with an option to use it or native LZW (default to one or the other). That way, if performance is poor with LZW, the user could choose to switch to Flate (or vice-versa). Do you think there could be enough difference between the two to warrant this? The code for Flate is already written and tested; if it might be useful, perhaps we should keep it around.

carygravel commented 3 years ago

> I wonder if it would be better to leave the LZW-to-Flate code in place, with an option to use it or native LZW (default to one or the other). That way, if performance is poor with LZW, the user could choose to switch to Flate (or vice-versa). Do you think there could be enough difference between the two to warrant this? The code for Flate is already written and tested; if it might be useful, perhaps we should keep it around.

For me, the only reason to use TIFF is to get access to those image compression algorithms not available via PNG/JPEG, etc.

I start with everything as PNG. If I need Group4 or LZW, I convert to TIFF with the appropriate compression.

You don't know beforehand whether Flate or LZW will be better. You've simply got to try. If I want Flate, I'll attach the image as PNG. If I want LZW, then as TIFF.
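The "you've simply got to try" point can be illustrated with a small Python sketch that compresses the same bytes both ways and reports the sizes. This is not code from PDF::Builder or Graphics::TIFF. `lzw_size_bytes` and `compare_sizes` are hypothetical helpers, and the LZW part is a simplified bit-counting estimate (9- to 12-bit codes, table reset when full), so its numbers are approximate rather than byte-exact for TIFF or PDF.

```python
import zlib


def lzw_size_bytes(data: bytes) -> int:
    """Estimate the LZW-compressed size of data, in bytes.

    Simplified model: codes grow from 9 to 12 bits and the table is reset
    when it fills up. Only bits are counted; no real stream is produced.
    """
    table = {bytes([i]): i for i in range(256)}
    next_code = 258  # 256 = clear code, 257 = end-of-data code
    width = 9
    bits = width     # leading clear code
    prefix = b""
    for byte in data:
        candidate = prefix + bytes([byte])
        if candidate in table:
            prefix = candidate
            continue
        bits += width                 # emit the code for the current prefix
        table[candidate] = next_code
        next_code += 1
        prefix = bytes([byte])
        if next_code > 4094:          # table full: emit clear code and reset
            bits += width
            table = {bytes([i]): i for i in range(256)}
            next_code = 258
        width = max(9, next_code.bit_length())
    if prefix:
        bits += width                 # code for the final pending prefix
    bits += width                     # end-of-data code
    return (bits + 7) // 8


def compare_sizes(data: bytes) -> dict:
    """Return raw, Flate (zlib) and estimated LZW sizes for the same data."""
    return {
        "raw": len(data),
        "flate": len(zlib.compress(data, 9)),
        "lzw": lzw_size_bytes(data),
    }


if __name__ == "__main__":
    # A blank scan line repeated many times, as a stand-in for real image data.
    print(compare_sizes(bytes(4096)))
```

Running `compare_sizes` on the actual decoded image data, rather than on synthetic input, is the only way to see which filter wins for a given image.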

You say that the LZW-to-Flate code is written and tested, but one of my motivations for writing GT was that the deLZW code didn't work reliably. Now, for those corner cases where the pure Perl code corrupted the image, the result should be fine, because the image data is no longer changed.

PhilterPaper commented 3 years ago

Ah, if LZW-to-Flate is buggy, that would be a different matter. I take it that it's not an easy fix? Is it bad enough to justify losing the ability to deLZW?

carygravel commented 3 years ago

> Ah, if LZW-to-Flate is buggy, that would be a different matter. I take it that it's not an easy fix? Is it bad enough to justify losing the ability to deLZW?

Before GT existed, I spent a considerable amount of time trying to fix the bugs. I did succeed in a couple of places, but was unable to find solutions for all the problems.

My suggestion would be to bring the functionality back at the point at which the encoding functionality is introduced.

PhilterPaper commented 3 years ago

I tried your changes, and ran into a problem. First, the TIFF_GT (using Graphics::TIFF) seems to work OK -- all 12 tests claim to run correctly. But TIFF (NOT using Graphics::TIFF) failed test 11 (NOT convert LZW to Flate). ImageMagick dumps each pixel of the image (quite lengthy). Did you test with either -nouseGT=>1 or with Graphics::TIFF removed? Both fail the same way for me.

I see in your change to TIFF.pm that you basically pulled out all the code that converted the LZW to Flate. Do you need to add or change the "filter" setting now to tell it it's LZW and not Flate?
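To make the "filter" question concrete, here is a minimal Python sketch of how a TIFF Compression tag value could map to the `/Filter` entry of a PDF image stream when the compressed bytes are passed through unchanged. It is not the code in TIFF.pm. `pdf_filter_entries` and its parameters are hypothetical, and it assumes the compressed data can be copied as one stream (for example, a single-strip TIFF). The tag values (5 for LZW, 8 and 32946 for Deflate) and the filter names `LZWDecode` and `FlateDecode` come from the TIFF and PDF specifications.

```python
# TIFF Compression tag value -> PDF stream filter name (pass-through cases only).
TIFF_COMPRESSION_TO_PDF_FILTER = {
    5: "LZWDecode",        # TIFF LZW
    8: "FlateDecode",      # TIFF "Adobe Deflate"
    32946: "FlateDecode",  # older Deflate tag value
}


def pdf_filter_entries(compression, predictor=1, columns=1, colors=1, bits=8):
    """Build the filter-related entries of a PDF image stream dictionary.

    Hypothetical helper for illustration: it declares which filter the
    viewer must apply, instead of decoding and re-encoding the data.
    """
    try:
        name = TIFF_COMPRESSION_TO_PDF_FILTER[compression]
    except KeyError:
        raise ValueError(
            f"no pass-through filter for TIFF compression {compression}"
        )
    entries = f"/Filter /{name}"
    if predictor == 2:
        # TIFF horizontal differencing corresponds to PDF /Predictor 2.
        entries += (
            f" /DecodeParms << /Predictor 2 /Columns {columns}"
            f" /Colors {colors} /BitsPerComponent {bits} >>"
        )
    return entries


if __name__ == "__main__":
    print(pdf_filter_entries(5))
    print(pdf_filter_entries(5, predictor=2, columns=640, colors=3))
```

If the stream still declares `FlateDecode` while the bytes are LZW-compressed, a viewer will fail to decode the image, which would be consistent with the failing non-GT test described above.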

carygravel commented 3 years ago

Good catch. Thanks. I've pushed an extra test without GT and fixed the code to make it run.