eugmes / fntsample

PDF and PostScript font samples generator (migrating from https://sourceforge.net/projects/fntsample/)
GNU General Public License v3.0

pdfoutline: document & workaround a PDF::API2 outline corruption bug. #16

Closed biergaizi closed 3 years ago

biergaizi commented 3 years ago

This is a PDF::API2 bug rather than a pdfoutline bug. However, because the issue is so obscure, documentation and a code workaround are highly desirable so that others will not waste an hour debugging it like I did. Hence this pull request.

Due to a bug in PDF::API2 [0], if the UTF-16 byte stream of a Unicode character happens to contain the byte 0x0A (ASCII newline), as with the CJK character U+4E0A (上), the subsequent Unicode text is garbled because the byte is escaped incorrectly. This creates a hard-to-detect PDF outline corruption problem that may catch users by surprise, since the vast majority of outline strings appear normal; only "lucky" strings get corrupted.
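To make the byte-level picture concrete, here is a minimal Python sketch. It does not call PDF::API2 and does not reproduce its escaping code; it only models the symptom. The helper names are hypothetical, and two things are assumptions: that the net effect of the faulty escaping is that the 0x0A byte goes missing when the string is read back, and that a dangling final byte is padded with 0x00. Under those assumptions, the result appears to match the garbled text reported in the reproduction steps.

```python
# Illustrative model of the symptom only; this is not PDF::API2 code.
ORIGINAL = "网上冲浪"


def utf16be_hex(text):
    """Return the UTF-16BE bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-16-be"))


def has_newline_byte(text):
    """True if the UTF-16BE byte stream contains the byte 0x0A."""
    return b"\x0a" in text.encode("utf-16-be")


def simulate_lost_newline_byte(text):
    """Drop the first 0x0A byte and decode what is left.

    Assumption: the faulty escaping makes a reader lose that byte, and an
    odd-length remainder is padded with 0x00 before decoding.
    """
    raw = text.encode("utf-16-be").replace(b"\x0a", b"", 1)
    if len(raw) % 2:
        raw += b"\x00"
    return raw.decode("utf-16-be")


if __name__ == "__main__":
    print(utf16be_hex("上"))              # low byte of U+4E0A is 0x0A
    print(has_newline_byte(ORIGINAL))
    garbled = simulate_lost_newline_byte(ORIGINAL)
    print([f"U+{ord(c):04X}" for c in garbled])
```

Because every byte after the lost one shifts by one position, all following characters are re-paired into different code units, which is why a single "unlucky" character corrupts the rest of the outline title.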

All PDF::API2 versions before v2.040 are affected by the bug.

Meanwhile, since PDF::API2 v2.034 [1], Unicode characters in outlines are handled automatically, with no need for manual encoding, and this codepath does not contain the aforementioned bug.

Thus, we prefer the new codepath if PDF::API2 v2.034+ is detected. If an earlier version is detected, we also issue a warning to the user. Finally, this commit documents the problem in the man page for user reference.
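The version gate described above can be sketched roughly as follows. This is a Python illustration of the decision logic only, not the Perl code in this pull request; the function name, return values, and warning text are hypothetical, and it assumes the version is a plain decimal string such as "2.034".

```python
from decimal import Decimal

# Threshold taken from the description: automatic Unicode handling since v2.034.
AUTO_UNICODE_SINCE = Decimal("2.034")


def plan_outline_encoding(version):
    """Return (codepath, warning) for a PDF::API2 version string.

    Hypothetical helper: "automatic" means the title is handed over as-is,
    "manual" means the caller encodes it itself and the user is warned.
    """
    if Decimal(version) >= AUTO_UNICODE_SINCE:
        return ("automatic", None)
    warning = (
        f"PDF::API2 {version} is older than {AUTO_UNICODE_SINCE}; "
        "outline titles whose UTF-16 bytes contain 0x0A may be corrupted"
    )
    return ("manual", warning)


if __name__ == "__main__":
    for v in ("2.033", "2.034", "2.040"):
        print(v, plan_outline_encoding(v))
```

Comparing with `Decimal` rather than `float` keeps the comparison exact for three-digit minor versions.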

[0] https://rt.cpan.org/Public/Bug/Display.html?id=134957

[1] https://rt.cpan.org/Public/Bug/Display.html?id=33497

Steps to Reproduce

  1. Use PDF-API2 v2.039 or an older version.
  2. Create a test file, test.pdf.
  3. Create an outline.txt that contains the characters 网上冲浪, for example the single line: 0 1 网上冲浪
  4. Run pdfoutline test.pdf outline.txt output.pdf
  5. Open output.pdf in a reader. The garbled characters 网乑뉭樀 are shown instead of 网上冲浪.

eugmes commented 3 years ago

@biergaizi: Merged, thanks for the contribution.