This is a PDF::API2 bug rather than a pdfoutline bug, but the issue is obscure enough that documentation and a code workaround are highly desirable, so that others do not waste an hour debugging it like I did. Hence this Pull Request.
Due to a bug in PDF::API2 [0], if the UTF-16 byte stream of a Unicode character happens to contain the byte 0x0A (ASCII newline), as with the CJK character U+4E0A (上), the Unicode text that follows is garbled because the byte is escaped incorrectly. This creates a hard-to-detect PDF outline corruption problem that may catch users by surprise, since the vast majority of outline strings appear normal; only "lucky" strings get corrupted.
All PDF::API2 versions before v2.040 are affected by the bug.
Meanwhile, since PDF::API2 v2.034 [1], Unicode characters in outlines are handled automatically, without the need for manual encoding, and this codepath does not contain the aforementioned bug.
Thus, we prefer the new codepath if PDF::API2 v2.034+ is detected. If an earlier version is detected, we also issue a warning to the user. Finally, this commit documents the problem in the man page for user reference.
[0] https://rt.cpan.org/Public/Bug/Display.html?id=134957
[1] https://rt.cpan.org/Public/Bug/Display.html?id=33497
Steps to Reproduce

1. Prepare test.pdf.
2. Create a file outline.txt that contains the characters 网上冲浪:

   0 1 网上冲浪

3. Run: pdfoutline test.pdf outline.txt output.pdf
4. Open output.pdf in a reader. The garbled characters 网乑뉭樀 are seen instead.
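
For reviewers who want to see why this particular title is affected, here is a small Python sketch that inspects the UTF-16BE bytes of 网上冲浪. It does not call PDF::API2. The helper names are made up for illustration, and the corruption model (every 0x0A byte is simply lost) is an assumption inferred from the garbled output above, not a description of PDF::API2 internals.

```python
# Illustration only: byte-level view of the outline title from the
# reproduction steps. Nothing here touches PDF::API2 or pdfoutline.

def utf16be_bytes(text: str) -> bytes:
    """Return the UTF-16BE encoding of text (no byte order mark)."""
    return text.encode("utf-16-be")


def drop_newline_bytes(raw: bytes) -> str:
    """ASSUMPTION: model the corruption as every 0x0A byte being lost.

    The remaining bytes shift by one, so every later UTF-16 code unit is
    read from the wrong byte pair. A dangling final byte is padded with
    0x00 so the result can still be decoded.
    """
    damaged = raw.replace(b"\x0a", b"")
    if len(damaged) % 2:
        damaged += b"\x00"
    return damaged.decode("utf-16-be")


title = "网上冲浪"
raw = utf16be_bytes(title)

print(raw.hex(" "))   # the second character, U+4E0A, contributes 4e 0a
print(0x0A in raw)    # True: the encoded title contains an ASCII newline byte

garbled = drop_newline_bytes(raw)
print(garbled)
print([f"U+{ord(c):04X}" for c in garbled])
```

Under that assumption the result is U+7F51 U+4E51 U+B26D U+6A00, which appears to match the garbled string seen in the reader. Only the first character survives, because it precedes the 0x0A byte.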