PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

CTS 27 - Problem with font metrics (regression?) #117

Closed (sciurius closed this issue 4 years ago)

sciurius commented 4 years ago

I'm getting reports that PDF::Builder tests of Text::Layout started to fail. See e.g. http://www.cpantesters.org/cpan/report/5c484370-8fa4-11ea-92a5-4f3975b42a2f.

Looking at the output of prove -lv t/150-pdfbuilder.t it seems that all metrics have been rounded to integer values.

PhilterPaper commented 4 years ago

Is this the same 0.018.1 that's been out for a couple of months now, or is this some test version I don't see on CPAN or GitHub? Is it suddenly failing on Perl 5.31, which is pretty bleeding edge? Is this only on one OS (e.g., Windows) or is it happening on multiple OSes? Can you tell if the problem started with PDF::Builder 3.018, or could it have been happening with 3.017?

I did have to adjust one of my t-tests because of a very slight change in corefont metrics (I updated the character widths tables to match the font files, and added a whole bunch of new glyphs that were missing). However, I have not seen any place where metrics have been rounded down (or truncated) to integer. It's possible that Windows is doing something like that in some versions/updates, but I haven't seen it myself. I certainly did not knowingly change any code to make metrics integer.

I do recall a discussion on the wkhtmltopdf project (on GitHub) about differences between Linux and Windows rendering, where one was rounding to integers and the other was leaving glyph stroke coordinates as reals.

PhilterPaper commented 4 years ago

OK, I think I've narrowed it down. 150-pdfbuilder.t works fine with PDF::Builder 3.018, but 152-pdfbuilder.t gives lots of errors:

1..27
ok 1 - Font desc
ok 2 - baseline -13.66
ok 3 - baseline -13.66
not ok 4 - pixel_size width 166
#   Failed test 'pixel_size width 166'
#   at t\152-pdfbuilder.t line 52.
ok 5 - pixel_size height 18
not ok 6 - pixel_size width 166
#   Failed test 'pixel_size width 166'
#   at t\152-pdfbuilder.t line 55.
ok 7 - pixel_size height 18
not ok 8 - size width
#   Failed test 'size width'
#   at t\152-pdfbuilder.t line 60.
ok 9 - size height
not ok 10 - size width
#   Failed test 'size width'
#   at t\152-pdfbuilder.t line 63.
ok 11 - size height
ok 12 - pixel_extents ink x 0
ok 13 - pixel_extents ink y 0
not ok 14 - pixel_extents ink width 166
#   Failed test 'pixel_extents ink width 166'
#   at t\152-pdfbuilder.t line 75.
ok 15 - pixel_extents ink height 18
ok 16 - pixel_extents layout x 0
ok 17 - pixel_extents layout y 0
not ok 18 - pixel_extents layout width 166
#   Failed test 'pixel_extents layout width 166'
#   at t\152-pdfbuilder.t line 75.
ok 19 - pixel_extents layout height 18
ok 20 - extents ink x 0
ok 21 - extents ink y 0
not ok 22 - extents ink width 166
#   Failed test 'extents ink width 166'
#   at t\152-pdfbuilder.t line 88.
ok 23 - extents ink height 18
ok 24 - extents layout x 0
ok 25 - extents layout y 0
not ok 26 - extents layout width 166
#   Failed test 'extents layout width 166'
#   at t\152-pdfbuilder.t line 88.
ok 27 - extents layout height 18
# Looks like you failed 8 tests of 27.

I replaced the four times*.pm corefont metrics files with 3.017's, and all tests passed. Looking at the metrics files, a few widths have changed (e.g., 'C' from 667 to 666). This will slightly change the results you test against, but I can't see why it would be going to integer somewhere (I printed out the values it was working with, such as 166.08 which now is just 166).

PhilterPaper commented 4 years ago

I think it may be a bizarre coincidence that the new numbers are coming out integers. I took the first string, "The quick brows fox" and looked up the character widths in the Times-Roman metrics table for both 3.017 and 3.018. They add up to 8304 and 8300, respectively. Convert from glyph units to points by dividing by 1000 and multiplying by the font size (20), and you get 166.08pt and 166pt, which is what you're seeing. Maybe play with some other strings and see what you get.
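The arithmetic above can be checked with a short sketch. The width table is assumed to match the standard Adobe Times-Roman AFM values (they do sum to the 8304 quoted for 3.017), and `string_width_units` and `units_to_points` are hypothetical helper names, not part of PDF::Builder or Text::Layout. The 3.018 total of 8300 is taken from the comment as-is, since the thread does not list which individual glyphs changed.

```python
# Glyph widths in glyph units (1/1000 em).
# Assumption: these match the Adobe Times-Roman AFM widths used by 3.017.
TIMES_ROMAN_WX = {
    "T": 611, "h": 500, "e": 444, " ": 250, "q": 500, "u": 500,
    "i": 278, "c": 444, "k": 500, "b": 500, "r": 333, "o": 500,
    "w": 722, "s": 389, "f": 333, "x": 500,
}


def string_width_units(text, widths):
    """Sum the per-glyph advance widths of a string, in glyph units."""
    return sum(widths[ch] for ch in text)


def units_to_points(units, font_size):
    """Convert glyph units (1/1000 em) to points at the given font size."""
    return units * font_size / 1000


text = "The quick brows fox"
old_units = string_width_units(text, TIMES_ROMAN_WX)

print(old_units)                       # 8304
print(units_to_points(old_units, 20))  # 166.08
print(units_to_points(8300, 20))       # 166.0 (total reported for 3.018)
```

A difference of only 4 glyph units over the whole string is enough to turn 166.08 into exactly 166, which supports the reading that the integer result is a coincidence rather than rounding.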

As this is a separate test from PDF::API2, you could simply update the t-test to use the revised widths. Is there any overriding reason to go back to the old character widths? Is any Reader going to use the widths from the PDF file (and thus from the metrics file), rather than the actual font file's widths? The widths are in the PDF file, but I don't know which is being used by the Reader (I presume that the font file has to be read anyway to get the glyph strokes and curves -- I don't think a corefont gets embedded). And finally, are the widths supplied in Windows TTF font files slightly different from the "standard" metrics? In other words, are the metrics a little different between Windows, Linux, and Mac?

sciurius commented 4 years ago

> I updated the character widths tables to match the font files

As you clearly indicate in the second paragraph of your last comment, there is no 'gold' standard core font associated with these metrics, and it doesn't matter anyway, since no one who is serious about typesetting will use the built-in metrics. So why change them in the first place?

I'd suggest reverting this change.

PhilterPaper commented 4 years ago

I did have a good reason to change in the first place. I needed to pick up the widths of a bunch of new glyphs, in order to fix the "missing Cyrillic characters" bug (RT 57248). So, I wrote a utility to grab the widths from a TTF file, and used it to update the metrics files. Unfortunately, a handful of existing widths changed by a tiny amount (e.g., 667 -> 666). To preserve those existing widths, I would have to go through the 24 changed metrics files, compare them by eye, and manually update them. That's a lot of work that I'm hoping to avoid.

Now the corefonts should match what you would get for glyph widths used in ttfonts. I don't know about the rest of the metrics, such as bounding boxes. Whether the Windows fonts match the fonts found in Linux and Mac, I also don't know. If there is indeed no "gold standard" for these things, an arbitrary decision will have to be made. At this point, lacking further information on what fonts have what metrics, it appears to come down to either 1) reverting PDF::Builder's font metrics (old glyphs that changed) to what they were before (even though they would not match TTF behavior), or 2) updating a handful of empirically-determined string widths in one t-test of Text::Layout. Are there any other data points to consider?

sciurius commented 4 years ago

The official Adobe font files do not have anything beyond Latin-1 (except for euro and zcaron). The general advice for PDF::API2 is to use real font files for anything else. So there is no need to add Cyrillic characters.

I don't follow the 'need' to match the metrics with ttfonts. When using ttfonts you use the ttfonts metrics, not the corefonts builtin stub. Also: what ttfonts?

I have no intention of changing the Text::Layout test, since the test is not wrong. It just reveals that PDF::Builder made an unfortunate and erroneous mistake.

PhilterPaper commented 4 years ago

If it's possible to specify a single-byte encoding that permits non-Latin-1 characters, I don't find it unreasonable to support this. Just because you don't use such characters doesn't mean that others don't have a legitimate use for them. I agree that it would probably make more sense for most to use TTF rather than Core Fonts, but that's water under the bridge.

I have been reviewing the situation, to see what my options are here. Many characters seem to differ by only 1 glyph unit (1/1000 em), which is not visually significant, but is measurably different for string widths (and thus the 152-pdfbuilder.t failure). I was hoping to be able to pass a flag of some sort to update selected widths (wx) on the fly, or else support an old and a new set of widths, but both appear to be quite messy (I'm open to suggestions). Of more concern is that a few characters differ in width by as much as 100 glyph units, which works out to 2pt on a 20pt font size! That's visually quite significant, and I need to examine them on a case-by-case basis.
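To put the size of these differences in perspective, here is a minimal sketch of the unit conversion, assuming the usual 1000 glyph units per em. `width_delta_pt` is a hypothetical helper name used only for illustration.

```python
def width_delta_pt(delta_units, font_size):
    """Convert a width difference in glyph units (1/1000 em) to points."""
    return delta_units * font_size / 1000


# 1 unit: a typical per-glyph change; 4 units: the total change seen for the
# test string; 100 units: the worst per-glyph case mentioned above.
for delta in (1, 4, 100):
    print(delta, width_delta_pt(delta, 20))
```

At a 20pt font size, a 1-unit change moves a glyph by 0.02pt, while a 100-unit change moves it by a full 2pt.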

As for the majority of characters (glyphs) which differ by only one unit, since there's no telling which width is the "gold standard" and should be used here (and whether core fonts on different platforms are even the same), I am considering rolling back the changed ones to the old wx values, per your demand (while keeping the new glyphs, and deciding what to do about the way-off ones). Is there any PDF documentation officially stating what the expected width values should be, so I can compare them against the CoreFont/ metrics files? Are the .afm files (14 base fonts) pointed to by https://www.adobe.com/devnet/font.html considered canonical? If so, are they usable in a practical sense for any Reader on any platform?

> PDF::Builder made an unfortunate and erroneous mistake.

I'll pretend that I didn't see that.

sciurius commented 4 years ago

> Are the .afm files (14 base fonts) pointed to by https://www.adobe.com/devnet/font.html considered canonical?

I think this is as official as you can get.

Now, assuming these metrics match the actual corefonts in some PDF viewers, adding metrics for extra glyphs (e.g. Cyrillic) may result in PDF documents produced by PDF::Builder that cannot be viewed correctly by these viewers, since they use the real corefonts. So I still think it is better to avoid confusion (by sticking to the core fonts and glyphs, and using real fonts for everything else) than to hope no one will notice.

I have yet to see a real PDF document that uses corefonts and glyphs other than the ones in the official metrics files.

PhilterPaper commented 4 years ago

I see your point about the possibility of incompatibility, but still, the original PDF::API2 and these official font metrics included many non-Latin-1 glyphs. I would have to document which characters (glyphs) are available (those in the canonical metrics files). What might be better is to keep the extended metrics, but include a warning (in the POD) that there is no guarantee that a document recipient on the other end will have a local font file on their platform that includes all the glyphs used. If they wish to absolutely guarantee that the recipient can display all the glyphs, they should consider using TTF instead (and embed the font).

PhilterPaper commented 4 years ago

I have gone through the 14 "core" font metrics files and updated them to use AFM "canonical" character widths if they're within +/- 1 unit of the TTF widths released with 3.018. There are a handful of glyphs more than 1 off, and I have chosen what appears to be an appropriate width (AFM, TTF, or something else) for those glyphs.

Text::Layout t/152-pdfbuilder.t now works correctly, and PDF::Builder t/text.t has been returned to its original width values.
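The reconciliation rule described above can be sketched roughly as follows. This is not the procedure actually used on the metrics files (which was done by hand), only an illustration of the rule: prefer the AFM width when it is within 1 unit of the TTF width, and set aside larger differences for case-by-case review. The function name, the `example` glyph, and the Cyrillic glyph name and width are hypothetical; only the 'C' widths (667 vs. 666) come from this thread. Keeping TTF widths for glyphs absent from the AFM is likewise an assumption.

```python
def reconcile_widths(afm, ttf, tolerance=1):
    """Merge AFM and TTF width tables (glyph name -> width in glyph units).

    Returns (merged, needs_review):
      merged       - widths chosen automatically
      needs_review - (glyph, afm_width, ttf_width) tuples differing by more
                     than `tolerance`, left for a manual decision
    """
    merged = {}
    needs_review = []
    for glyph, ttf_wx in ttf.items():
        afm_wx = afm.get(glyph)
        if afm_wx is None:
            # Glyph not in the AFM (e.g. a newly added one): keep the TTF width.
            merged[glyph] = ttf_wx
        elif abs(afm_wx - ttf_wx) <= tolerance:
            # Close enough: prefer the AFM ("canonical") width.
            merged[glyph] = afm_wx
        else:
            needs_review.append((glyph, afm_wx, ttf_wx))
    return merged, needs_review


# 'C' widths are from the thread; the other entries are hypothetical.
afm = {"C": 667, "example": 500}
ttf = {"C": 666, "example": 600, "afii10017": 722}

merged, needs_review = reconcile_widths(afm, ttf)
print(merged)        # {'C': 667, 'afii10017': 722}
print(needs_review)  # [('example', 500, 600)]
```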