PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

Underline thickness and position ignoring font metrics #215

Closed PhilterPaper closed 5 months ago

PhilterPaper commented 5 months ago

In ssimms/pdfapi2/issues/79, @sciurius reported:

In PDF/API2/Resource/CIDFont/TrueType/FontFile.pm, font metrics are scaled to 1000/upem. However, underlineposition and underlinethickness are not scaled.

PDF/API2/Content.pm, lines 2043-2044 seem to assume 1000, not upem:

 my $underlineposition = (-$self->{' font'}->underlineposition() * $self->{' fontsize'} / 1000 || 1);
 my $underlinethickness = ($self->{' font'}->underlinethickness() * $self->{' fontsize'} / 1000 || 1);

This affects all fonts with a design size not equal to 1000, which includes all modern TTF/OTF fonts that have a design size 2048.
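
To make the size of the error concrete, here is a minimal sketch of scaling by upem instead of a hard-coded 1000. It assumes the raw underline values are in font design units and that the font's upem is known. `scaled_underline` and its arguments are hypothetical names, not part of PDF::API2 or PDF::Builder, and the metrics are made-up numbers rather than values read from DejaVuSerif.

```perl
use strict;
use warnings;

# Hypothetical helper: convert raw underline metrics (font design units)
# into text-space units for a given font size.
sub scaled_underline {
    my ($raw_position, $raw_thickness, $upem, $fontsize) = @_;
    $upem ||= 1000;    # assumption: fall back to 1000 if upem is unknown
    my $position  = $raw_position  * $fontsize / $upem;
    my $thickness = $raw_thickness * $fontsize / $upem;
    return ($position, $thickness);
}

# Made-up metrics for a font with 2048 units per em, set at size 16.
my ($pos, $thick) = scaled_underline(-256, 128, 2048, 16);
printf "scaled by upem: position %.3f, thickness %.3f\n", $pos, $thick;

# The same raw values divided by a hard-coded 1000, as in the quoted lines.
printf "scaled by 1000: position %.3f, thickness %.3f\n",
    -256 * 16 / 1000, 128 * 16 / 1000;
```

With these numbers the hard-coded 1000 gives an underline about twice as thick and twice as far from the baseline as the upem-based result. Note that the helper returns a signed position, whereas the quoted Content.pm lines negate it.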

The result can be seen in the attached image: left is LibreOffice, right is PDF::API2. The font is DejaVuSerif.

[Attached image: issue79]

It is probably too late to change but it should at least be documented somewhere.

PhilterPaper commented 5 months ago

This is probably in PDF::Builder, too. Also see #192 for more discussion on underlining.

If thickness and position are explicitly given in "underline" settings or CSS (for column()), those should override anything given in the font metrics. However, for default use, the font's specified underline properties should be used, if present.

sciurius commented 5 months ago

Interesting that you label a fix to something that is wrong as enhancement.

PhilterPaper commented 5 months ago

I regard it as a minor glitch and not something actually broken. The results are perfectly serviceable -- to me it would be an enhancement (rather than a fix) to tune the underlining to what the font designer wanted.

PhilterPaper commented 5 months ago

I'm pretty busy right now, but if someone wants to issue a PR to make the fix/enhancement, I'll be happy to consider it. I should think that API2's fix should be very similar. You might even be able to do one for API2, and I can use the changes to put into Builder (or vice-versa).

PhilterPaper commented 5 months ago

Just a note that per https://www.w3schools.com/cssref/pr_text_text-decoration.php, the CSS property text-decoration is shorthand for the properties text-decoration-line, text-decoration-color, text-decoration-style, and text-decoration-thickness.

Interestingly, this does not seem to specify the position of the underline (relative to the baseline). That is left to the text-underline-offset property. There are many other CSS properties for modifying text, that might be implemented.

Note that the text() method includes settings for thickness, position, and color of underline(s), and should therefore override any underlining metrics given in the font. column() text should obey any CSS given first, and use font metrics as the fallback.
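
The precedence described above could be sketched as follows. This is only an illustration under assumptions: `resolve_underline` and the `explicit`, `css`, and `font` keys are hypothetical names, and the real text() and column() code paths may resolve these values differently. Each property is resolved independently, so an explicit thickness can be combined with a font-supplied position.

```perl
use strict;
use warnings;

# Hypothetical helper: for each underline property, take the first defined
# value in priority order: explicit settings, then CSS, then font metrics.
sub resolve_underline {
    my (%src) = @_;
    my %result;
    for my $prop (qw(position thickness)) {
        for my $name (qw(explicit css font)) {
            my $value = $src{$name} ? $src{$name}{$prop} : undef;
            next unless defined $value;
            $result{$prop} = $value;
            last;
        }
    }
    return \%result;
}

# Explicit thickness wins; position falls back to the (already scaled)
# font metric because nothing else supplies it.
my $u = resolve_underline(
    explicit => { thickness => 0.5 },
    font     => { position => -2, thickness => 1 },
);
printf "position %s, thickness %s\n", $u->{position}, $u->{thickness};
```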

PhilterPaper commented 5 months ago

OK, I have text() auto underlining and strike-through fixed now. I'm still looking at what column() underline/strike-through needs, as well as the analogous HarfBuzz::Shaper-using routines. I see that TTF kerning (at least for H-S) appears to use 1000 instead of 2048... does that have the same scaling problem?

Any other place you think you may have seen 1000 used instead of upem? Should I be suspicious of any font-related use of a '1000' scaling value, possibly to be replaced by upem?

PhilterPaper commented 5 months ago

I've been looking through the code in PDF::Builder, and there are a number of places where 1000 is used in text-related code without explanation. If the glyph coordinate system is UPEM x UPEM (N.B. glyphs may extend outside this box, so long as they do not exceed +/- 16383), one would think that everything coordinate-related, including underlines (addressed above), kerning values, character widths, ascenders and descenders, and possibly other measurements would be divided by UPEM to get 1em, rather than a hard-coded 1000. Yet, other than the underline problem that Johan noted, I haven't seen any text-rendering issues that cry out, "use UPEM/UPM instead of 1000!" If someone used a TTF/OTF font with UPEM=2048, something should appear glaringly wrong -- text twice the size it should be, etc.? Maybe I've seen (and overlooked) very minor problems with UPEM=1024, but have never encountered a UPEM=2048 (or larger) font "in the wild"?

There are a few places where both 1000 and UPEM appear in the code, e.g., $width * 1000/ $upem. I'm not sure what to make of that.

@sciurius and @terefang, do you have any thoughts on why we're not seeing obvious problems dividing by 1000 in character widths, kerning, etc., but only in underline thickness and placement? Anyone else with knowledge in this area is welcome to chime in.

sciurius commented 5 months ago

See https://github.com/ssimms/pdfapi2/issues/79 . Most metrics are scaled to upem. Some (like underline thickness) are not.

PhilterPaper commented 5 months ago

Well, if you're comfortable that the 'underline' code was the only one in error, I'm good with that. There are a lot of places with the term * 1000 / UPEM, which possibly means they are already scaling to UPEM and then multiplying by another 1000 (to be divided out later). If it works, it works. This (dividing by 1000 and multiplying by font size) includes glyph width, ascenders and descenders, and kerning values.

By the way, I have updated the column() underline code in the same manner as text(). It doesn't look like I have to do anything with the HarfBuzz::Shaper-support code in Content.pm, which uses /1000 * font size in a few places, but doesn't do underlining.

sciurius commented 5 months ago

I haven't seen any problems so far with HarfBuzz::Shaper.

terefang commented 5 months ago

> @sciurius and @terefang, do you have any thoughts on why we're not seeing obvious problems dividing by 1000 in character widths, kerning, etc., but only in underline thickness and placement? Anyone else with knowledge in this area is welcome to chime in.

at the time i wrote the code, i wasn't sure if the values given were postscript spec'd (i.e., scale agnostic) or not.

i also never use underlining personally ... IMO ... it is a relict of the past ... a holdover from the time of physical typewriters.

terefang commented 5 months ago

> If someone used a TTF/OTF font with UPEM=2048, something should appear glaringly wrong -- text twice the size it should be, etc.? Maybe I've seen (and overlooked) very minor problems with UPEM=1024, but have never encountered a UPEM=2048 (or larger) font "in the wild"?
>
> There are a few places where both 1000 and UPEM appear in the code, e.g., $width * 1000/ $upem. I'm not sure what to make of that.

back in the day (90s) there were a bunch of broken font tools and even more borked fonts because of this.

i remember one particularly bad windows tool that allowed not only converting postscript to truetype, but also creating artificial styles out of baseline fonts, much like multi-master fonts today.

while it did this job reasonably well for the glyph-paths, it butchered the data-structures, using an amalgamation of defaults, copies from the source fonts, and calculated values.

i did a check on my "font museum" for what could be possible out there:

i have seen the following unitsPerEm: 18, 750, 780, 800, 830, 870, 1000, 1024, 1040, 1124, 1240, 1250, 1440, 2000, 2048, 2218, 2816, 4096, 8160

many can be examined for debugging from google-fonts.

$width * 1000 / $upem: that is correct. the widths of the glyphs are given in design units, so they need to be scaled to postscript units as expected by pdf.
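
A small sketch of the two-step scaling described above may help show why widths look right regardless of upem. It assumes widths are first normalized from design units to the 1000-units-per-em glyph space and later turned into text-space units by dividing by 1000 and multiplying by the font size. `to_glyph_space` and `width_at_size` are hypothetical names, not functions from PDF::API2 or PDF::Builder.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize a design-unit value to 1000 units per em.
sub to_glyph_space {
    my ($value, $upem) = @_;
    return $value * 1000 / $upem;
}

# Hypothetical helper: convert a normalized value to text-space units.
sub width_at_size {
    my ($glyph_width, $fontsize) = @_;
    return $glyph_width * $fontsize / 1000;
}

# A glyph that is half an em wide, expressed in fonts with different upem.
for my $upem (1000, 1024, 2048, 4096) {
    my $raw   = $upem / 2;                      # width in design units
    my $glyph = to_glyph_space($raw, $upem);    # same value for every upem
    printf "upem %4d: raw %4d -> %g/1000 em -> %g at size 12\n",
        $upem, $raw, $glyph, width_at_size($glyph, 12);
}
```

Because the first step already divides by upem, the later division by 1000 is consistent for widths. A metric that skips the first step, as the underline values did, is the one that comes out wrong.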

PhilterPaper commented 5 months ago

> i also never use underlining personally ... IMO ... it is a relict of the past ... a holdover from the time of physical typewriters.

Hey, you never know what will come back into style! People today spend good money on vinyl LPs and tube (valve) amplifiers. I would hope that physical typewriters don't make a comeback, but am prepared to be astounded. I could imagine a computer application that makes a "clacking" sound of keys hitting the platen with each keystroke -- if you think that's absurd, that's exactly what IBM did with their early model 3270 green-screen displays. It was done to reassure secretaries used to Selectric typewriters that they were doing something when they pressed a key. At least, they included a key to turn off the noise.

I wouldn't be surprised if some publishers and other entities (e.g., courts?) require underlining in their style guides.

Anyway, thanks for the historical notes. I figured the * 1000 / UPEM probably had something to do with converting to a "standard" UPEM=1000. I'll go ahead and close this ticket, as apparently only underlining was affected, and I think I have that fixed now.