dompdf / php-font-lib

A library to read, parse, export and make subsets of different types of font files.
GNU Lesser General Public License v2.1

Why use UTF16ToUTF8() ? #70

Closed git-host-admin closed 6 months ago

git-host-admin commented 6 years ago

Hi:

I'm from China, and there are many Chinese fonts. When I use getFontName() or similar functions, the returned value is not valid. If I remove the UTF16ToUTF8() call, I get the value we want.


Issue.zip

C4MS commented 4 years ago

The library assumes every name string is encoded as UTF-16 by default, without taking the provided PlatformID into consideration. For comparison, opentype.js chooses the encoding per platform:

https://github.com/opentypejs/opentype.js/blob/342fac9e81a34ef08b69c5f08a0ec71727e0b832/src/tables/name.js#L644
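To illustrate the problem described above, here is a minimal sketch (not code from the library). It assumes the mbstring extension is available and uses a made-up single-byte name record, as a Macintosh-platform entry might store it. Decoding those bytes as UTF-16BE pairs them up into unrelated characters:

```php
<?php
// Illustration only: a hypothetical name record stored as single-byte text.
$raw = "SimSun";

// Treating the bytes as UTF-16BE reads every two bytes as one code unit.
$misread = mb_convert_encoding($raw, 'UTF-8', 'UTF-16BE');

var_dump($misread === $raw);            // bool(false)
var_dump(mb_strlen($misread, 'UTF-8')); // int(3): six bytes became three characters
```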

I worked around this issue by subclassing the name table class and overriding its _parse() function:

$font = \FontLib\Font::load($path);

//Replace old table
$tables = $font->getTable();
$table = new \Additions\Table\Type\nameEncoding($tables['name']);
$table->parse();
$font->setTableObject('name', $table);

$font->getFontPostscriptName();

// The subclass, declared in the \Additions\Table\Type namespace;
// "name" is \FontLib\Table\Type\name.
class nameEncoding extends name {
    private static $header_format = array(
        "format"       => self::uint16,
        "count"        => self::uint16,
        "stringOffset" => self::uint16,
    );

    protected function _parse() {
        // override here
    }
}

bsweeney commented 7 months ago

The global conversion from UTF-16 was mostly in accordance with the spec.

Relating to platform ID 0 (Unicode):

All Unicode-based names must be in UTF-16BE (big-endian, two-byte encoding). UTF-8 and UTF-32 (one- and four-byte encodings) are not allowed.

Relating to platform ID 3 (Windows):

Encoding IDs for platform 3 'name' entries should match the encoding IDs used for platform 3 subtables in the 'cmap' table. When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1. When building a symbol font for Windows, the platform ID should be 3 and the encoding ID should be 0. All string data for platform 3 must be encoded in UTF-16BE.

However, it is also true that other encodings may be used (as seen in the supplied font). While I haven't completely addressed the underlying deficiency in how the library handles string encoding, the changes implemented for the next release should be sufficient for most cases. Expanded encoding support will be built out as needed based on user feedback.
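As a rough sketch of what platform-aware decoding could look like (this is not the change made in the library), the helper below picks a source encoding from the platform and encoding IDs. The function name `decodeNameString` and the Macintosh encoding map are assumptions for illustration; the mbstring names only approximate the Mac script encodings, and the sketch assumes the mbstring extension is available.

```php
<?php
/**
 * Hypothetical helper, not part of php-font-lib: convert a raw name-record
 * string to UTF-8 based on its platform and encoding IDs.
 */
function decodeNameString(string $raw, int $platformID, int $encodingID): string
{
    // Platform 0 (Unicode) and platform 3 (Windows): UTF-16BE, per the spec text quoted above.
    if ($platformID === 0 || $platformID === 3) {
        return mb_convert_encoding($raw, 'UTF-8', 'UTF-16BE');
    }

    // Platform 1 (Macintosh): legacy script encodings. Assumed, approximate mapping.
    if ($platformID === 1) {
        $macEncodings = [
            1  => 'SJIS',   // Japanese
            2  => 'BIG-5',  // Traditional Chinese
            25 => 'EUC-CN', // Simplified Chinese
        ];
        if (isset($macEncodings[$encodingID])) {
            return mb_convert_encoding($raw, 'UTF-8', $macEncodings[$encodingID]);
        }
    }

    // Unknown combination: return the bytes untouched rather than guess.
    return $raw;
}

echo decodeNameString("\x00A\x00B", 3, 1), "\n";        // AB
echo decodeNameString("\xD6\xD0\xCE\xC4", 1, 25), "\n"; // 中文
```

Fonts that store legacy-encoded strings under platform 3 would still be misread by this sketch, and Mac Roman strings (platform 1, encoding 0) pass through unchanged, which is only safe for ASCII names.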

bsweeney commented 7 months ago

I noticed that the sample font provided uses cmap subtable format 2, which isn't yet supported. I added support for that format and improved encoding support in other areas of the library so that the next release will correctly re-encode this font.

The re-encoded font now loads correctly in browsers that do not load the original font due to spec compliance issues.