PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

t/font-type1.t uses URW Gothic Book Type1 font in PFB+PFM formats, while the font supplier provides them in T1+AFM formats #194

Closed ppisar closed 1 year ago

ppisar commented 1 year ago

PDF-Builder optionally tests handling of Type1 fonts in t/font-type1.t. The test relies on the PFB+PFM format of the URW Gothic Book font. While packaging PDF-Builder for EPEL 9 (RHEL 9) I experienced a backlash against relying on fonts in PFB format: https://bugzilla.redhat.com/show_bug.cgi?id=2158422. The reason is that the fonts are now provided in T1+AFM format instead. A pretty good explanation is located at the beginning of https://src.fedoraproject.org/rpms/urw-base35-fonts/raw/rawhide/f/urw-base35-fonts.spec. I don't understand the formats. This is what the "file" tool reports for them:

$ file /usr/share/X11/fonts/urw-fonts/a010013l.pfb 
/usr/share/X11/fonts/urw-fonts/a010013l.pfb: PostScript Type 1 font program data
$ file /usr/share/fonts/urw-base35/URWGothic-Book.t1 
/usr/share/fonts/urw-base35/URWGothic-Book.t1: PostScript Type 1 font text (URWGothic-Book 1.00)

I tried changing the test to use T1+AFM files ($pdf->psfont($t1_file, 'afmfile' => $afm_file)) and this is the result:

$ perl -Ilib t/font-type1.t 
1..2
Use of uninitialized value $body in concatenation (.) or string at lib/PDF/Builder/Resource/Font/Postscript.pm line 153, <$inf> line 386.
Use of uninitialized value $tail in concatenation (.) or string at lib/PDF/Builder/Resource/Font/Postscript.pm line 153, <$inf> line 386.
ok 1 - Able to read a count of glyphs (>0) from a Type1 font
ok 2 - Font has the expected name

The warnings come from PDF::Builder::Resource::Font::Postscript::readPFAPFB().

My questions are: Is PDF::Builder::Resource::Font::Postscript capable of handling the T1 format? Would you be willing to improve T1 support and enhance the test to use T1 instead of, or in addition to, PFB?

apodtele commented 1 year ago

I did not mean it as a backlash. I just said that Ghostscript has moved on to an updated version of the font. Perhaps it is a good idea.

PhilterPaper commented 1 year ago

Hmm. This is getting confusing. In Ye Olden Tymes, a Type1 (Postscript) font expected either an ASCII font file (.pfa) or a binary font file (.pfb). It could be matched with either an ASCII font metrics file (.afm) or a binary font metrics file (.pfm). Then you had to give the full path to both. If I understand you correctly, there is now a ".t1" font file? I've never worked with that... is it identical to either .pfa or .pfb? If not, it's unlikely that psfont() will work with it, although if you can point me to some documentation and best practices, I'll take a swing at it.

t/font-type1.t is expecting a .pfb + .pfm combination at one of two Linux paths. If I understand you, it's now a .t1 + .afm for Red Hat and/or Ghostscript? If simply using those two, as you reported you did, gives a couple of warning messages, it sounds like .t1 is not the same as .pfa or .pfb. Also, it sounds like a new default path is needed for T1 font testing. I could think about using the new $pdf->font_path() call (Builder.pm) to consolidate possible T1 paths -- that shouldn't be a big problem, and it will take care of one of the issues (new path). However, I have no idea about this new .t1 font file. I'll be happy to work on it, if there is some clear documentation. Note that a .t1 file will need to work on Windows, as that is the platform I develop on! Otherwise, someone on your end will need to help with the testing.

While we're talking, does a .afm or .pfm file always have the same basename as the font (.pfa, .pfb, .t1) file? Is it common practice to put it in the same directory as the font file, or is it required? Currently, it isn't a big deal, as the full paths need to be explicitly given for both, but if we start using font paths, that could be a potential problem.

apodtele commented 1 year ago

I am almost certain that t1 is pfa (FreeType supports them). Yes, they are matched with afm as before, see here. Indeed, the pair is usually found in the same path as before. These are just newer, expanded versions of the old fonts, created by the same URW++ folks.
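
The same-basename, same-directory convention mentioned here could be turned into a lookup helper. The following is only a minimal Python sketch and not code from PDF::Builder: `find_metrics` is a hypothetical name, and trying .afm before .pfm is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_metrics(font_path: str,
                 suffixes: Sequence[str] = (".afm", ".pfm")) -> Optional[Path]:
    """Look for a metrics file next to a Type 1 font file.

    Assumes the metrics file shares the font's basename and directory,
    which is the usual layout but not a guaranteed one.
    """
    font = Path(font_path)
    for suffix in suffixes:
        # Try lower- and upper-case extensions, e.g. .afm and .AFM.
        for candidate in (font.with_suffix(suffix),
                          font.with_suffix(suffix.upper())):
            if candidate.is_file():
                return candidate
    # No sibling metrics file: the caller must supply an explicit path.
    return None
```

A caller would fall back to an explicitly given metrics path whenever this returns None, so the current behaviour (both paths given in full) keeps working.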

That said and reassured, Type 1 support is officially stopped by Adobe. Those fonts will, of course, continue as OTF(CFF) and TTF fonts.

PhilterPaper commented 1 year ago

It appears that Postscript.pm was treating the .t1 file as PFA, but failed to find the $body and $tail portions of a line. It reads the internals of the file rather than just looking at the file type. If this is merely a renamed .pfa file, I wouldn't have expected that. So, .t1 may not be the same format as .pfa. Why would they have simply renamed .pfa to .t1, if there were no internal changes?

ppisar commented 1 year ago

A T1 font example:

%!PS-AdobeFont-1.0: URWGothic-Book 1.00
%%CreationDate: Wed May 17 2017
% Copyright (URW)++,Copyright 2014 by (URW)++ Design & Development
% (URW)++,Copyright 2014 by (URW)++ Design & Development
10 dict begin
/FontInfo 12 dict dup begin
/version (1.00) readonly def
/Notice ((URW)++,Copyright 2014 by (URW)++ Design & Development) readonly def
/Copyright (Copyright (URW)++,Copyright 2014 by (URW)++ Design & Development) readonly def
/FullName (URW Gothic Book) readonly def
/FamilyName (URW Gothic) readonly def
/Weight (Book) readonly def
/ItalicAngle 0.0 def
/isFixedPitch false def
/UnderlinePosition -106 def
/UnderlineThickness 58 def
end readonly def
/FontName /URWGothic-Book def
/PaintType 0 def
/FontBBox {-144 -260 1151 1019} readonly def
/FontType 1 def
/FontMatrix [0.001 0.0 0.0 0.001 0.0 0.0] readonly def
/Encoding StandardEncoding def
currentdict end
currentfile eexec^M<E9><8D>     <D7>`<A3><C2>,<F1>^Y<F9><DC>i<9A>"

[...]

<89>^M^X<DC>7<D8>P(^A!<86><CF>U5)<F2><9D><FE>a<BF>cD^F<BC><E8>+0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^M0000000000000000000000000000000000000000000000000000000000000000^Mcleartomark

As you can see, there is no binary header as in PFB (the first byte is not 0x80), but the body (after eexec) is binary, in contrast to the hexadecimal string in a PFA. So you need to parse the header as PFA, but the body as PFB.
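
The distinction described above can be expressed as a small content-based check. This is a hedged Python sketch, not the logic in Postscript.pm: the function name `classify_type1` and the label "t1-binary" are made up for illustration, and inspecting the first four bytes after eexec is a heuristic.

```python
import re


def classify_type1(data: bytes) -> str:
    """Guess the container format of a Type 1 font from its raw bytes."""
    # PFB files start with a binary segment header whose first byte is 0x80.
    if data[:1] == b"\x80":
        return "pfb"
    # PFA and .t1 files both start with a plain-text PostScript header.
    if not data.startswith(b"%!"):
        return "unknown"
    marker = data.find(b"eexec")
    if marker < 0:
        return "unknown"
    # Skip the whitespace (CR, LF, space, tab) that follows "eexec".
    body = data[marker + len(b"eexec"):].lstrip(b" \t\r\n")
    sample = body[:4]
    if not sample:
        return "unknown"
    # Heuristic: a PFA body begins with ASCII hex digits,
    # while a .t1 body (as in the sample above) begins with raw bytes.
    if len(sample) == 4 and re.fullmatch(rb"[0-9A-Fa-f]{4}", sample):
        return "pfa"
    return "t1-binary"
```

With a check like this, the header of a .t1 file would go through the PFA text path and only the section after eexec would be handled as raw bytes.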

PhilterPaper commented 1 year ago

OK, from this one sample, it looks like .t1 is the same as .pfa, except that the currentfile eexec section is binary instead of ASCII hex digits. My .pfa also has cleartomark at the end, and your .t1 has a ^M after currentfile eexec. Is that a typo from cleaning the line-ends to append here, or is it significant?

If the binary v. ASCII data is the only real difference, I can try modifying Postscript.pm to handle that. If I don't run into problems, I might even have it done tonight. Also, I'll add the new path(s) to the t/font-type1.t file, so it should work out of the box for you.

Are you holding up PDF::Builder in a Red Hat release for this one issue? I can give it priority if you can hold on for a day or two. Note that it will only be on GitHub once you report that it works, and the next CPAN release is probably at least 3 or 4 months off.

What should I be doing about Adobe dropping T1 support? I would think that there would be lots of PDFs out there using T1 fonts that will need to be displayed. In my documentation, should I encourage PDF::Builder users to move away from $pdf->psfont() (to TTF or core), but not drop support any time soon?

apodtele commented 1 year ago

Sooner or later you will have to follow Adobe... "As announced in January 2020, support for all Type 1 fonts in Adobe products will stop by January 2023. Users will no longer have the ability to author content using Type 1 fonts after that time."

Warn users to stop using it now and error on T1 in a year or two.

PhilterPaper commented 1 year ago

OK, I will add a warning to the POD that Type1 support (psfont) is going away Real Soon Now. Is there any point in continuing to add .t1 support if the biggest Reader is going to ignore it, or will Readers (possibly just non-Adobe) keep supporting it for a while? Their wording is "...no longer have the ability to author content...", [my emphasis] which seems to contradict "...support for all Type 1 fonts in Adobe products will stop..." which implies Readers won't handle it, either. Of course, who knows what non-Adobe products will do.

Regarding my previous comment, I didn't scroll right far enough to see the cleartomark. I do see ^M's in the binary stream -- are these actually part of the data, or noise? .pfb seems to have one long binary stream... what does .t1 do (are they broken up into lines)?

apodtele commented 1 year ago

Adobe explains: "Type 1 data embedded in file types such as EPS and PDF will be unaffected by this change, as long as they are placed for display or printing as graphic elements. If those files are opened for editing in applications such as Illustrator or Photoshop, they will trigger a “Missing fonts” error."

This sounds like read-only for embedded fonts going forward. It is your call if you want to support the t1 fonts.

The original Red Hat problem is the dependency on the system pfb fonts for optional testing. You could just drop that in light of Type 1's future.

PhilterPaper commented 1 year ago

Service while you wait...

I can't test with .t1 files, so would you please try it? The t-test and examples don't blow up on Windows, anyway. You'll need to strip off the .txt suffix. I don't know if I need to trim off (chomp?) ^M's in the binary data section. I may need to add that.

Note that the code in Postscript.pm examines the inside of the file to distinguish pfb from pfa/t1, but now relies on the file name suffix (extension) being .t1 or .pfa (case-insensitive) to distinguish t1 from pfa.

Changes.txt Postscript.pm.txt Docs.pm.txt font-type1.t.txt

If you give me a path update for the example (tools/3_examples.pl, for examples/021_psfonts), I can update that.

ppisar commented 1 year ago

The ^M sequences mean 0x0d bytes (carriage returns). 0x0a bytes (new lines) are presented as new lines. I copied the output of the "less" tool run on the https://github.com/ArtifexSoftware/urw-base35-fonts/raw/master/fonts/URWGothic-Book.t1 file.

This issue does not block me from building PDF-Builder for RHEL 9. I resorted to depending on an unrelated package which happens to bundle the Type 1 font in PFB format. However, the dependency is undesired. Another option for me is to simply drop the dependency and keep skipping the test. So this is not an urgent problem for me. It is merely a notice for you that there is a problem which other people will meet in the future.

Whether you want to keep supporting Type 1 is up to you. I don't have any personal interest in it. I can enjoy cubic Bezier curves with OpenType fonts.

PhilterPaper commented 1 year ago

Your .t1 file almost works. I think I have to get rid of the ^M's in the binary section, without corrupting the binary data. Also, the t-test needs an update to use afmfile=> instead of pfmfile=>.

I'm in this deep with Type 1 support, so it doesn't hurt to keep it in for the indefinite future. Users are warned in the documentation (Docs.pm) that Reader support may be going away (it's not clear to me whether they mean only editors such as Acrobat, or Readers too will not handle Type 1).

apodtele commented 1 year ago

With T1 fonts being part of the PDF standards, Adobe can hardly refuse to display old PDF files that use them. Already, major browsers will happily pick up this slack, while Adobe will further lose Reader market share. Refusing to edit or create T1-based PDFs is not the same as breaking the standards.

It is totally up to you if you want to continue creating and editing T1-based files. This bug only shows that these fonts are old and poorly supported: old pfb's are replaced with t1's, but also with otf/cff's and ttf's. Do you want to drag your feet or move on?

P.S. I have bluebook.pdf circa 1994. You probably know it too. I bet it will always work with the Reader.

ppisar commented 1 year ago

I tried your new code and unfortunately it still prints the same warnings I originally reported:

$ perl -Ilib t/font-type1.t 
1..2
Use of uninitialized value $body in concatenation (.) or string at lib/PDF/Builder/Resource/Font/Postscript.pm line 170, <$inf> line 386.
Use of uninitialized value $tail in concatenation (.) or string at lib/PDF/Builder/Resource/Font/Postscript.pm line 170, <$inf> line 386.
ok 1 - Able to read a count of glyphs (>0) from a Type1 font
ok 2 - Font has the expected name

Maybe you forgot to comment out the other font paths and the test uses the first font it finds.

Regarding font paths in examples, I wouldn't touch them. The paths vary among systems and it would only complicate the examples.

PhilterPaper commented 1 year ago

Well, my point was that all this time and effort has already been sunk into supporting Type 1 fonts (.pfa, .pfb, .t1; .afm, .pfm), primarily by my predecessors. Ongoing support shouldn't cost me very much (extending to handle .t1 is the first major issue I recall having with it). I'm not dragging my feet or being obstinate to continue to support Type 1, but users are advised that the larger world is moving away from Type 1, and they should consider TTF/OTF instead (future support by PDF Readers is a legitimate issue). 'CJK' and 'BDF' are two other formats still supported by PDF::Builder, although largely unused. It costs very little to maintain them, and they might be useful for someone.

.t1 support isn't quite done yet -- I'm nearly there and I hope to complete it today.

PhilterPaper commented 1 year ago

I think I've got it working now. Please give it a try and let me know if it is satisfactory: Postscript.pm.txt (PDF::Builder::Resource::Font::Postscript), font-type1.t.txt (t/), 021_psfonts.txt (examples/). (Need to strip off the .txt suffix.)

ppisar commented 1 year ago

Amazing. It works out of the box now. All tests pass, and examples/021_psfonts produces a correct glyph set.

PhilterPaper commented 1 year ago

That's good to hear. If you regard it as now working correctly for your purposes, please close this ticket and I will push the new code to GitHub. Otherwise, are any updates needed to font paths, etc.? If you need a few days to play with it, no problem.

apodtele commented 1 year ago

Cool. I was looking at 021_psfonts.txt. You say that these fonts are not compatible with UTF8. I wonder why. The t1 fonts have pretty decent coverage with full sets of Cyrillic and Greek scripts.

ppisar commented 1 year ago

I tested it in RHEL 9 and Fedora 38. The font paths are sufficient.

PhilterPaper commented 1 year ago

OK, into GitHub it goes! The next CPAN release probably won't be for a few months.

According to the PDF spec, these fonts are single-byte encoding only. No multibyte such as UTF-8. Same for core fonts. There's no reason you can't use these fonts for non-PDF purposes (including converting them to TTF). I am kicking around some ideas for supporting UTF-8 in PDF via multiple subfonts on a page: initialize a subfont to ASCII, add up to 128 additional characters as encountered. Rinse and repeat, changing to the new subfont.
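
One possible reading of the subfont idea, as a rough Python sketch rather than anything that exists in PDF::Builder: bytes 0..127 stay ASCII in every subfont, and bytes 128..255 are handed out to non-ASCII characters in order of first appearance. The class and method names below are hypothetical.

```python
class SubfontAllocator:
    """Map Unicode text onto runs of (subfont index, single-byte string)."""

    def __init__(self):
        # One dict per subfont: non-ASCII character -> byte in 128..255.
        self.subfonts = [{}]

    def encode_char(self, ch):
        code = ord(ch)
        if code < 128:
            # ASCII exists in every subfont; None means "keep the current one".
            return (None, code)
        # Reuse a slot if this character was already assigned somewhere.
        for index, table in enumerate(self.subfonts):
            if ch in table:
                return (index, table[ch])
        # All 128 upper slots taken: start a new subfont.
        if len(self.subfonts[-1]) == 128:
            self.subfonts.append({})
        table = self.subfonts[-1]
        table[ch] = 128 + len(table)
        return (len(self.subfonts) - 1, table[ch])

    def encode(self, text):
        """Split text into runs that each use a single subfont."""
        runs = []
        current = 0
        for ch in text:
            index, byte = self.encode_char(ch)
            if index is None:
                index = current
            if runs and runs[-1][0] == index:
                runs[-1][1].append(byte)
            else:
                runs.append((index, bytearray([byte])))
            current = index
        return [(i, bytes(b)) for i, b in runs]
```

Each run would then be drawn with the font resource for its subfont index; how the per-subfont encoding is attached to the Type 1 font is left out of this sketch.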