dompdf / php-font-lib

A library to read, parse, export and make subsets of different types of font files.
GNU Lesser General Public License v2.1

Add platformID/platformSpecificID/languageID support to getFont* #139

Open lslqtz opened 4 months ago

lslqtz commented 4 months ago

Adding this functionality would make it possible to obtain data for different platforms or languages, and might also prevent conflicts for the same nameID.

This is a font for testing: 方正兰亭圆_GBK_准.ttf.zip

% font info 1.ttf
Mac (1,0,0,1) Font Family: FZLanTingYuan-M-GBK
Mac (1,0,0,2) Font Subfamily: Regular
Mac (1,0,0,3) Unique Identifier: Founder:FZLanTingYuan-M-GBK    Regular
Mac (1,0,0,4) Full Name: FZLanTingYuan-M-GBK
Mac (1,0,0,5) Version: 1.00
Mac (1,0,0,6) PostScript Name: FZLANTY_ZHUNK--GBK1-0
Mac (1,25,33,1) Font Family: ??????ͤԲ_GBK_׼
Mac (1,25,33,2) Font Subfamily: Regular
Mac (1,25,33,3) Unique Identifier: Founder:??????ͤԲ_GBK_׼   Regular
Mac (1,25,33,4) Full Name: ??????ͤԲ_GBK_׼
Mac (1,25,33,5) Version: 1.00
Mac (1,25,33,6) PostScript Name: FZLANTY_ZHUNK--GBK1-0
Microsoft (3,1,1033,1) Font Family: FZLanTingYuan-M-GBK
Microsoft (3,1,1033,2) Font Subfamily: Regular
Microsoft (3,1,1033,3) Unique Identifier: Founder:FZLanTingYuan-M-GBK   Regular
Microsoft (3,1,1033,4) Full Name: FZLanTingYuan-M-GBK
Microsoft (3,1,1033,5) Version: 1.00
Microsoft (3,1,1033,6) PostScript Name: FZLANTY_ZHUNK--GBK1-0
Microsoft (3,1,2052,1) Font Family: 方正兰亭圆_GBK_准
Microsoft (3,1,2052,2) Font Subfamily: Regular
Microsoft (3,1,2052,3) Unique Identifier: Founder:方正兰亭圆_GBK_准   Regular
Microsoft (3,1,2052,4) Full Name: 方正兰亭圆_GBK_准
Microsoft (3,1,2052,5) Version: 1.00
Microsoft (3,1,2052,6) PostScript Name: FZLANTY_ZHUNK--GBK1-0

For example, here is a simple test against the original version:

% cat 1.php
<?php
require_once('vendor/autoload.php');

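function GetMatchedFontInfo(string $fontfile): null|FontLib\TrueType\File|FontLib\TrueType\Collection {
    $fontInfo = FontLib\Font::load($fontfile);
    if ($fontInfo === null) {
        // Unsupported or unreadable font file.
        return null;
    }
    try {
        $fontInfo->parse();
        return $fontInfo;
    } catch (Throwable $e) {
        // Parsing failed; fall through and release the file handle.
    }
    try {
        $fontInfo->close();
    } catch (Throwable $e) {
        // Ignore errors while closing.
    }
    return null;
}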

function GetSubsetFont(?FontLib\TrueType\File $fontInfo, string $targetFile) {
    if ($fontInfo === null) {
        return;
    }
    $fontInfo->setSubset('test');
    if ($fontInfo->open($targetFile, FontLib\BinaryStream::modeReadWrite)) {
        $fontInfo->encode(array("OS/2"));
    }
}
$font = GetMatchedFontInfo('1.ttf');
$data = $font->getFontName(); // NAME_NAME (ttf): 1
var_dump($data);
% php 1.php
string(23) "方正兰亭圆_GBK_准"

Mac (1,0,0,1) Font Family: FZLanTingYuan-M-GBK
Microsoft (3,1,1033,1) Font Family: FZLanTingYuan-M-GBK

You'll find that the result may not be the one you expected for a given language or platform. The underlying cause is that the name class can parse every record but, while parsing, does not store enough information (platform ID, platform-specific ID, language ID) to tell apart records that share a nameID.

Therefore, this PR adds a way for users to obtain the information for any platform or language. The trade-off is that users who don't know which platform/language combination they need have to guess in order to get the font information, so this is where further optimization is needed. (For example, assume a default of 3,1,1033?)

Using the PR's version (duplicate code omitted):

$font = GetMatchedFontInfo('1.ttf');
$data = $font->getFontName(3, 1, 1033);
var_dump($data);
% php 1.php
string(19) "FZLanTingYuan-M-GBK"

Edit: reduce does not work yet, because I haven't modified it.
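The default-value question can be sketched in plain PHP. This is only an illustration under stated assumptions, not the PR's code: `findName` is a hypothetical helper, the records are a hand-copied subset of the `font info` dump keyed as `platformID,platformSpecificID,languageID,nameID`, and the fallback order (3,1,1033 first, then 1,0,0, then any record with that nameID) is one possible policy rather than an agreed one.

```php
<?php
// Hypothetical sketch, not php-font-lib code. Keys follow the dump above:
// "platformID,platformSpecificID,languageID,nameID".
$records = [
    '1,0,0,1'    => 'FZLanTingYuan-M-GBK',
    '3,1,1033,1' => 'FZLanTingYuan-M-GBK',
    '3,1,2052,1' => '方正兰亭圆_GBK_准',
    '3,1,2052,2' => 'Regular',
];

/**
 * Look up a name string. When all three IDs are given, only that exact record
 * is considered. Otherwise an assumed preference list is tried.
 */
function findName(array $records, int $nameID, ?int $platformID = null,
                  ?int $platformSpecificID = null, ?int $languageID = null): ?string
{
    if ($platformID !== null && $platformSpecificID !== null && $languageID !== null) {
        $key = "$platformID,$platformSpecificID,$languageID,$nameID";
        return $records[$key] ?? null;
    }
    // Assumed default order: Microsoft/Unicode/en-US, then Mac/Roman/English.
    foreach (['3,1,1033', '1,0,0'] as $prefix) {
        if (isset($records["$prefix,$nameID"])) {
            return $records["$prefix,$nameID"];
        }
    }
    // Last resort: the first record that carries the requested nameID.
    foreach ($records as $key => $value) {
        $parts = explode(',', $key);
        if ((int) end($parts) === $nameID) {
            return $value;
        }
    }
    return null;
}

echo findName($records, 1, 3, 1, 2052), PHP_EOL; // exact record requested
echo findName($records, 1), PHP_EOL;             // falls back to 3,1,1033
echo findName($records, 2), PHP_EOL;             // only 3,1,2052,2 exists here
```

Making the three IDs optional keeps existing one-argument callers working, at the cost of the library having to commit to some fallback order.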

lslqtz commented 4 months ago

EOT has not received the same modifications, which may cause problems. In addition, now that support for the Mac platform type has been added, we also try a simple method of Mac encoding conversion. (The Mac platform has so many platform-specific IDs that the encoding cannot be determined accurately in every case. https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html)
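A minimal sketch of what such a platform-based conversion could look like, assuming the `iconv` extension is available. `guessNameEncoding` and `decodeName` are hypothetical names, not functions from php-font-lib or this PR, and the Mac table covers only two script codes. The `EUC-CN` entry for Mac script 25 (Simplified Chinese) is an approximation chosen for illustration, and whether `MACINTOSH` is accepted depends on the local iconv implementation.

```php
<?php
// Hypothetical helper, not part of php-font-lib: pick a source encoding for a
// raw name string from its platformID / platformSpecificID pair.
function guessNameEncoding(int $platformID, int $platformSpecificID): ?string
{
    if ($platformID === 0) {
        return 'UTF-16BE'; // Unicode platform
    }
    if ($platformID === 3) {
        // Microsoft platform: encoding IDs 1 (Unicode BMP) and 10 (full Unicode).
        return in_array($platformSpecificID, [1, 10], true) ? 'UTF-16BE' : null;
    }
    if ($platformID === 1) {
        // Macintosh script codes; only two of the many are listed (assumption).
        $macScripts = [
            0  => 'MACINTOSH', // Roman
            25 => 'EUC-CN',    // Simplified Chinese, approximate mapping
        ];
        return $macScripts[$platformSpecificID] ?? null;
    }
    return null;
}

// Convert a raw name string to UTF-8, leaving the bytes untouched when the
// encoding is unknown or the conversion fails.
function decodeName(int $platformID, int $platformSpecificID, string $raw): string
{
    $encoding = guessNameEncoding($platformID, $platformSpecificID);
    if ($encoding === null) {
        return $raw;
    }
    $converted = @iconv($encoding, 'UTF-8', $raw);
    return $converted === false ? $raw : $converted;
}

// UTF-16BE bytes for U+65B9 U+6B63, as a Microsoft (3,1) record would store them.
echo decodeName(3, 1, "\x65\xB9\x6B\x63"), PHP_EOL;
```

Returning the raw bytes on failure keeps the lookup usable for fonts whose records do not match the assumed encoding, which is the situation described for platform-specific IDs that cannot be identified.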

lslqtz commented 4 months ago

I also added a setData function so that users can modify font information. In my case, I use it to change the font name and identifier of a subset.

Example:

$font = GetMatchedFontInfo('1.ttf');
$data = $font->getData("name", "records");
$data["3,1,2052,2"]->string = 'Regular?';
$font->setData("name", "records", $data);
GetSubsetFont($font, '2.ttf');
% php 1.php
% font info 2.ttf
Microsoft (3,1,2052,2) Font Subfamily: Regular?

lslqtz commented 4 months ago

Well, as I mentioned before, the definition changed but the existing usage and the unit tests did not. This is the issue to discuss, because it breaks backward compatibility. (Options: define a widely used default value, or select a record at random or by certain rules.)

bsweeney commented 4 months ago

I'll probably have some thoughts on what we can do here. I'll let you know after I've had a chance to review.

lslqtz commented 4 months ago

I discovered a problem: I tried to cache the loaded font so that it could be processed again on subsequent calls, but I got an unpack error. I then tried to copy the object with clone, but that doesn't work either. From what I observed, the object seems to modify its file: the subsequently opened file replaces the previously loaded one, which is a problem for repeated subsetting and setData calls. Loading the font file again each time may reduce performance.

Additionally, this may cause another problem: fclose is never called on the pointer to the originally loaded file.

I added a workaround for this problem: when the user calls open again, we keep the pointer to the previous file. The user can then close the new file and return to the previous one by calling the revert method, or close both files at once through the close method.
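The open/revert/close behaviour described above can be sketched with plain file handles. This is an illustration only: `StreamHolder` and its `read` method are hypothetical and are not php-font-lib's BinaryStream API, and closing the oldest handle on a third `open` is an assumption the comment does not specify.

```php
<?php
// Hypothetical sketch of the workaround; names are illustrative only.
class StreamHolder
{
    /** @var resource|null currently active file handle */
    private $current = null;
    /** @var resource|null handle that was active before the last open() */
    private $previous = null;

    public function open(string $path, string $mode): bool
    {
        $handle = fopen($path, $mode);
        if ($handle === false) {
            return false;
        }
        if ($this->current !== null) {
            // Keep the earlier handle instead of silently dropping it.
            if ($this->previous !== null) {
                fclose($this->previous); // assumption: only one level is kept
            }
            $this->previous = $this->current;
        }
        $this->current = $handle;
        return true;
    }

    /** Close the newest file and make the previous one active again. */
    public function revert(): bool
    {
        if ($this->previous === null) {
            return false;
        }
        fclose($this->current);
        $this->current = $this->previous;
        $this->previous = null;
        return true;
    }

    /** Close both handles at once. */
    public function close(): void
    {
        foreach ([$this->current, $this->previous] as $handle) {
            if (is_resource($handle)) {
                fclose($handle);
            }
        }
        $this->current = $this->previous = null;
    }

    public function read(int $length): string
    {
        return $this->current === null ? '' : (string) fread($this->current, $length);
    }
}

$source = tempnam(sys_get_temp_dir(), 'src');
$target = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($source, 'SOURCE');

$holder = new StreamHolder();
$holder->open($source, 'rb');   // the loaded font
$holder->open($target, 'wb+');  // e.g. the subset output file
$holder->revert();              // back to the loaded font
echo $holder->read(6), PHP_EOL; // SOURCE
$holder->close();
unlink($source);
unlink($target);
```

Because the previous handle keeps its read position, returning to it avoids reloading and reparsing the source file between subsets.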

After testing: as long as the font information is not modified, calling setSubset repeatedly does not affect the generated glyphs. I don't fully understand how this font library works internally.

Edit: The original method of assuming the encoding from the platform-specific ID produces garbled names for some fonts, at least in this PR. The solution is to use the detected encoding as the first priority instead of the assumed one. Crccgbqb_0.TTC.zip