mockdeep closed this issue 1 month ago.
I tried to create a reproducible example:
require 'hexapdf'
HexaPDF::Composer.create('gh332.pdf') do |c|
  c.text("06\u00AD-220-\u00AD3010-\u00AD0-\u00AD1110-\u00AD1000-\u00AD2111")
end
This results in a PDF that looks correct in Okular. Copying and pasting that string from Okular works as expected, and hexapdf inspect also shows this (see the text> line):
$ hexapdf ins gh332.pdf psd 1
save_graphics_state
set_font_and_size /F1 10
begin_text
set_text_matrix 1 0 0 1 36 799.059764
text> 06-220-3010-0-1110-1000-2111
end_text
restore_graphics_state
Could you provide an example that shows the wrong behaviour?
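Because U+00AD is normally invisible, it is hard to tell from the output above which hyphens are soft and which are hard. The following is a small plain-Ruby sketch (no HexaPDF involved; the constant names are just for illustration) that makes the soft hyphens in the example string visible and counts both kinds:

```ruby
# Plain Ruby, no gems needed: make the invisible soft hyphens in the
# example string visible and count both kinds of hyphen.
SOFT_HYPHEN  = "\u00AD" # U+00AD SOFT HYPHEN
HYPHEN_MINUS = "-"      # U+002D HYPHEN-MINUS (the "hard" hyphen)

TEXT = "06\u00AD-220-\u00AD3010-\u00AD0-\u00AD1110-\u00AD1000-\u00AD2111"

# Count occurrences of a single character without relying on String#count,
# whose tr-style character-set syntax gives "-" a special meaning.
def count_char(text, char)
  text.each_char.count { |ch| ch == char }
end

# Substitute a visible marker for display only; TEXT itself is unchanged.
puts TEXT.gsub(SOFT_HYPHEN, "[SHY]")
# prints 06[SHY]-220-[SHY]3010-[SHY]0-[SHY]1110-[SHY]1000-[SHY]2111
puts "soft hyphens: #{count_char(TEXT, SOFT_HYPHEN)}" # prints soft hyphens: 6
puts "hard hyphens: #{count_char(TEXT, HYPHEN_MINUS)}" # prints hard hyphens: 6
```

The output shows that every hard hyphen in this string sits directly next to a soft hyphen.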
@gettalong hmm, it looks like it may be a font issue. When we render it with the default font it does show the dashes, but when we render it with Arimo they disappear. I'm guessing that means this has nothing to do with HexaPDF. I'll leave it to you to close if you agree.
@mockdeep I will have a look at the font and will report back.
@mockdeep I have looked at the font and what HexaPDF does with it.
I will have to see how to handle this because it affects various parts of the font handling and font embedding code.
Holy cow, this rabbit hole just keeps getting deeper. When I started looking into it, I figured it was a missing glyph, then I circled around to a string encoding issue, and now it's a sort-of-missing glyph.
@mockdeep I have wrapped my head around this and have a potential solution. It works correctly for the case you brought up here, but I still need to test the case of two codepoints mapped to the same glyph where both codepoints are normal characters (i.e. not a soft hyphen, line break, ...).
@gettalong thanks so much! I really appreciate how responsive you've been on this stuff.
@mockdeep Could you please try out the devel branch which should fix your problem?
@gettalong sorry for the delay. I just got a chance to try out your branch and it works! The hyphens are appearing as expected.
@mockdeep Perfect! You can expect a release with the fix this weekend.
We have a user who copy/pasted some text from somewhere, and it has a mixture of soft hyphens and hard hyphens.
In the browser, it displays with just the normal hyphens.
But when we write it to a PDF with HexaPDF, all of the hyphens are stripped out.
My understanding of soft hyphens is very limited, but based on my reading it doesn't seem like they should cause adjacent hard hyphens to be removed as well. Am I missing something?
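For HexaPDF versions that predate the fix, one possible workaround, assuming the soft hyphens are not needed as line-break hints, is to remove them from user-supplied text before layout. A minimal sketch; strip_soft_hyphens is a hypothetical helper, not a HexaPDF API:

```ruby
# Hypothetical helper, not part of HexaPDF: remove every U+00AD SOFT HYPHEN
# from user-supplied text, leaving hard hyphens (U+002D) untouched.
SOFT_HYPHEN = "\u00AD"

def strip_soft_hyphens(text)
  text.delete(SOFT_HYPHEN)
end

pasted = "06\u00AD-220-\u00AD3010"
puts strip_soft_hyphens(pasted) # prints 06-220-3010
```

The cleaned string can then be passed to c.text as usual; the trade-off is losing any line-break opportunities the soft hyphens provided.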