RunasSudo / gfx2gfx-pdftext

A fork of SWFTools' gfx2gfx which preserves text, rather than converting to shapes.
GNU General Public License v2.0

Miscellaneous issues with fonts #4

Open RunasSudo opened 7 years ago

RunasSudo commented 7 years ago

e.g. U+239f, which is erroneously placed at the left of the bounding box rather than the right; italic f, which is placed too close to the next letter.

e.g. U+239b, U+23aa

RunasSudo commented 7 years ago

There seems to be some kind of crazy per-document limit on the number of available glyphs??

Edit 1: wtf is going on

Edit 2: *table flip*

Edit 3: Well, all I can work out is that something weird is going on. Preventing one glyph from being PDF_encoding_set_char'd (even in different fonts!), or adding a superfluous PDF_encoding_set_char call in some locations but not others, pointing to some Unicode values but not others, allows missing glyphs to mysteriously appear.

Replacing the font with another shows that PDFlib is attempting to output an unrelated character (like U+2022, U+201E (in two different contexts!), U+2026 or U+0192).

RunasSudo commented 7 years ago

There seems to be some sort of problem with large Unicode values.

RunasSudo commented 7 years ago

Hmm. Glyph 132 maps to Unicode 9115 (U+239B), the offending character. 132 corresponds to U+0084, which, for some reason, displays in my browser as U+201E.
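
A possible explanation for the browser behaviour, offered as a guess rather than a confirmed diagnosis: in Windows-1252, byte 0x84 is U+201E, and browsers commonly render the 0x80-0x9F range using that mapping. The other stray characters seen earlier (U+2022, U+2026, U+0192) also sit in that byte range. The sketch below only checks this arithmetic with Python's built-in `cp1252` codec. The helper names are made up for illustration, and nothing here shows what PDFlib actually does internally.

```python
# Values quoted in the comment above.
GLYPH_INDEX = 132   # glyph number reported for the offending character
CODE_POINT = 9115   # Unicode value the glyph is supposed to map to

# Stray characters seen when swapping the font.
STRAY_CODE_POINTS = [0x2022, 0x201E, 0x2026, 0x0192]


def cp1252_char(byte_value):
    """Decode a single byte as Windows-1252 (hypothetical helper)."""
    return bytes([byte_value]).decode("cp1252")


def cp1252_byte(code_point):
    """Return the Windows-1252 byte for a code point (hypothetical helper)."""
    return chr(code_point).encode("cp1252")[0]


# 9115 decimal is U+239B, one of the characters listed at the top of the issue.
print(f"{CODE_POINT} = U+{CODE_POINT:04X}")

# Byte 132 (0x84) read as Windows-1252.
shown = cp1252_char(GLYPH_INDEX)
print(f"0x{GLYPH_INDEX:02X} -> U+{ord(shown):04X}")

# Each stray character maps back to a byte in the 0x80-0x9F range.
for cp in STRAY_CODE_POINTS:
    print(f"U+{cp:04X} <- 0x{cp1252_byte(cp):02X}")
```

If this guess is right, the glyph index is leaking through as a single-byte character code somewhere, which would fit the observation that only some glyphs are affected.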

RunasSudo commented 7 years ago

It seems the problem is only with certain large Unicode values. The PUA seems to be unaffected.

vzhd1701 commented 7 years ago

This is related, I believe. I tried to convert a page with Unicode (Russian) text and failed to produce a proper searchable PDF. Parts written in Latin script and some punctuation marks are fine, but everything else becomes gibberish when copied to the clipboard. Test page attached: test_page.zip

RunasSudo commented 7 years ago

@Jason1122 Yes, that seems to be a consequence of this issue with Unicode support. Ideally, I'd really like to get this fixed, but I've had great difficulty working out how, or even if, it can be fixed.