Closed sohobloo closed 2 years ago
Interesting. Looks like `gb2312` is an alias for `gbk`, which is a superset of `gb2312`: https://en.wikipedia.org/wiki/GBK_(character_encoding). Feel free to open a PR to change it.
OK, I'll try all the encodings listed in this file and find the compatible encoding for `TextDecoder`.
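A minimal sketch of how such a check could be done, assuming a runtime with a global `TextDecoder` (a browser or a recent Node.js). The helper name `resolveLabel` and the short candidate list are illustrative only and are not part of fontkit; the full list of names lives in fontkit's `encodings.js`.

```javascript
// Hypothetical helper: ask TextDecoder how it resolves a label.
// Returns the canonical encoding name, or null when the label is rejected.
function resolveLabel(label) {
  try {
    return new TextDecoder(label).encoding;
  } catch (err) {
    // The constructor throws for unknown labels and for "replacement" labels;
    // this sketch treats any failure as "not usable".
    return null;
  }
}

// A few names mentioned in this thread, not the complete fontkit list.
const candidates = ['utf-16be', 'x-mac-roman', 'gb2312', 'hz-gb-2312', 'johab'];

for (const label of candidates) {
  console.log(label, '->', resolveLabel(label));
}
```

Which labels resolve depends on the runtime (for example, on how Node.js was built with ICU), so the output is best read as a report for one environment rather than a general answer.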
I did some research and corrected the encodings that I'm sure about.
https://github.com/foliojs/fontkit/pull/285
https://docs.microsoft.com/en-us/typography/opentype/spec/cmap
https://docs.microsoft.com/en-us/typography/opentype/spec/name
https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/encoding
https://www.w3.org/International/docs/encoding
https://encoding.spec.whatwg.org/
http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/
| Platform ID | Encoding ID | Description (Apple) | Fontkit 2.0.2 encoding | TextDecoder.encoding (W3) | Remarks |
|---|---|---|---|---|---|
| 0 (Unicode) | 0 | Unicode 1.0 semantics (deprecated) | utf16be | utf-16be | |
| | 1 | Unicode 1.1 semantics (deprecated) | utf16be | utf-16be | |
| | 2 | ISO/IEC 10646 semantics (deprecated) | utf16be | utf-16be | |
| | 3 | Unicode 2.0 and onwards semantics, Unicode BMP only | utf16be | utf-16be | |
| | 4 | Unicode 2.0 and onwards semantics, Unicode full repertoire | utf16be | utf-16be | |
| | 5 | Unicode Variation Sequences, for use with subtable format 14 | utf16be | utf-16be | |
| | 6 | Unicode full repertoire, for use with subtable format 13 | NOT FOUND | utf-16be | |
| 1 (Macintosh) | 0 | Roman | x-mac-roman | x-mac-roman / macintosh | |
| | 1 | Japanese | shift-jis | shift-jis / shift_jis | |
| | 2 | Chinese (Traditional) | big5 | big5 | |
| | 3 | Korean | euc-kr | euc-kr | |
| | 4 | Arabic | iso-8859-6 | iso-8859-6 | |
| | 5 | Hebrew | iso-8859-8 | iso-8859-8 | |
| | 6 | Greek | x-mac-greek | UNSURE | (mapping) Seems iso-8859-7 is for Greek |
| | 7 | Russian | x-mac-cyrillic | x-mac-cyrillic | |
| | 8 | RSymbol | x-mac-symbol | UNSURE | |
| | 9 | Devanagari | x-mac-devanagari | UNSURE | IS 13194:1991 (ISCII-91) x-iscii-de |
| | 10 | Gurmukhi | x-mac-gurmukhi | UNSURE | IS 13194:1991 (ISCII-91) |
| | 11 | Gujarati | x-mac-gujarati | UNSURE | x-iscii-gu IS 13194:1991 (ISCII-91) |
| | 12 | Oriya | Oriya | UNSURE | |
| | 13 | Bengali | Bengali | UNSURE | |
| | 14 | Tamil | Tamil | UNSURE | |
| | 15 | Telugu | Telugu | UNSURE | |
| | 16 | Kannada | Kannada | UNSURE | |
| | 17 | Malayalam | Malayalam | UNSURE | |
| | 18 | Sinhalese | Sinhalese | UNSURE | |
| | 19 | Burmese | Burmese | UNSURE | |
| | 20 | Khmer | Khmer | UNSURE | |
| | 21 | Thai | iso-8859-11 | iso-8859-11 / windows-874 | tis-620 |
| | 22 | Laotian | Laotian | UNSURE | |
| | 23 | Georgian | Georgian | UNSURE | |
| | 24 | Armenian | Armenian | UNSURE | |
| | 25 | Chinese (Simplified) | hz-gb-2312 | gbk / gb2312 | euc-cn (gb2312). The hz-gb-2312 label for TextDecoder is marked as replacement! |
| | 26 | Tibetan | Tibetan | UNSURE | Tibetan |
| | 27 | Mongolian | Mongolian | UNSURE | |
| | 28 | Geez | Geez | UNSURE | Inuit: there is an x-mac-inuit mapping in Fontkit. |
| | 29 | Slavic | x-mac-ce | UNSURE | (mapping) |
| | 30 | Vietnamese | Vietnamese | UNSURE | |
| | 31 | Sindhi | Sindhi | | |
| | 32 | Uninterpreted | NOT FOUND | UNSURE | |
| 2 (ISO) | 0 | 7-bit ASCII | ascii | ascii | |
| | 1 | ISO 10646 | NOT FOUND | UNSURE | |
| | 2 | ISO 8859-1 | NOT FOUND | iso-8859-1 / ascii / windows-1252 | |
| 3 (Windows) | 0 | Symbol | symbol | UNSURE | |
| | 1 | Unicode BMP | utf16be | utf-16be | |
| | 2 | ShiftJIS | shift-jis | shift-jis / shift_jis | |
| | 3 | PRC | gb18030 | gb18030 | |
| | 4 | Big5 | big5 | big5 | |
| | 5 | Wansung | x-cp20949 | euc-kr | KS X 1001 |
| | 6 | Johab | johab | UNSURE | Windows Code Page is 1361. Is euc-kr the only Korean encoding available in TextDecoder? |
| | 7 | Reserved | null | | |
| | 8 | Reserved | null | | |
| | 9 | Reserved | null | | |
| | 10 | Unicode full repertoire | utf16be | utf-16be | |
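As a rough illustration, the rows whose TextDecoder column is not UNSURE could be collected into a lookup keyed by platform ID and encoding ID. This is only a sketch: the names `LABELS` and `textDecoderLabel` are hypothetical, this is not how fontkit's `encodings.js` is organised, and where the table lists two labels one of them was picked arbitrarily.

```javascript
// Sketch only: (platformID, encodingID) -> TextDecoder label, limited to the
// table rows that list a TextDecoder label instead of UNSURE.
const LABELS = {
  // 1 = Macintosh
  1: {
    0: 'macintosh', 1: 'shift_jis', 2: 'big5', 3: 'euc-kr',
    4: 'iso-8859-6', 5: 'iso-8859-8', 7: 'x-mac-cyrillic',
    21: 'windows-874', 25: 'gbk',
  },
  // 2 = ISO
  2: { 0: 'ascii', 2: 'iso-8859-1' },
  // 3 = Windows
  3: {
    1: 'utf-16be', 2: 'shift_jis', 3: 'gb18030',
    4: 'big5', 5: 'euc-kr', 10: 'utf-16be',
  },
};

// Returns a label to try with TextDecoder, or null when the table has none.
function textDecoderLabel(platformID, encodingID) {
  // Platform 0 (Unicode): encoding IDs 0 through 6 are all listed as utf-16be.
  if (platformID === 0) {
    return encodingID >= 0 && encodingID <= 6 ? 'utf-16be' : null;
  }
  return LABELS[platformID]?.[encodingID] ?? null;
}

console.log(textDecoderLabel(3, 1));  // Windows, Unicode BMP
console.log(textDecoderLabel(1, 12)); // Macintosh, Oriya: no label in the table
```

A `null` result only means the table has no confident answer for that pair; a caller would still need its own fallback, such as a hand-written mapping table.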
Since fontkit dropped iconv-lite, the `hz-gb-2312` encoding no longer works. It is the last one on this line: https://github.com/foliojs/fontkit/blob/master/src/encodings.js#L93
It's strange, because the W3C standard does list this encoding and the web API has it too. But this code raises the error "The "hz-gb-2312" encoding is not supported" for me:
Based on my own testing, I changed `hz-gb-2312` to `gb2312` and it works as expected. I don't know whether other encodings have similar problems.
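The workaround described above could be sketched as a small override table applied before constructing the decoder. This is an assumption-laden illustration, not fontkit code: `LABEL_OVERRIDES`, `effectiveLabel`, and `makeDecoder` are hypothetical names, and the only override listed is the one reported in this thread.

```javascript
// Hypothetical override table: labels that TextDecoder rejects, mapped to the
// replacement that was reported to work in this thread.
const LABEL_OVERRIDES = { 'hz-gb-2312': 'gb2312' };

// Returns the label that should actually be passed to TextDecoder.
function effectiveLabel(label) {
  return LABEL_OVERRIDES[label] ?? label;
}

// Builds a decoder using the overridden label; still throws if the runtime
// does not support the resulting encoding.
function makeDecoder(label) {
  return new TextDecoder(effectiveLabel(label));
}

// Passing the original label straight to TextDecoder is rejected.
let rejected = false;
try {
  new TextDecoder('hz-gb-2312');
} catch (err) {
  rejected = err instanceof RangeError;
}
console.log('hz-gb-2312 rejected:', rejected);
console.log('label used instead:', effectiveLabel('hz-gb-2312'));
```

Whether `makeDecoder('hz-gb-2312')` then succeeds depends on the runtime supporting GBK, which `gb2312` is a label for.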