eisoch / irg

RS issues #17

Open eisoch opened 6 years ago

eisoch commented 6 years ago
| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Source |
|---|---|---|---|---|---|---|---|
| U+3B3A | ExtA | | K3-275E | 74.10 | 74.9 | | IRGN2239 |
| U+2575E | ExtB | 𥝞 | TF-2842 | 115.3 | 59.5 | | IRGN2269EisoReview,異體字字典A01271-004 |
| U+2A741 | ExtC | 𪝁 | K5-005E | 9.7 | 30.6 | | |
| U+2A80F | ExtC | 𪠏 | GZFY-00829 | 27.9 | 107.6 | 𥀬 | 广州话词典_P55 |
| U+2B0D7 | ExtC | 𫃗 | GZFY-01010 | 119.16 | 178'.18 | 𩏷 | |
| U+2B180 | ExtC | 𫆀 | TC-3558 | 128.2 | 26.6 | | IRGN1232P1 |
| U+2B385 | ExtC | 𫎅 | TC-3623 | 152.1 | 1.7,7.6 | 亟,焏 | IRGN1232P1,佛教難字字典_P7 |
| U+2C1AE | ExtE | 𬆮 | UTC-00068 | 79.11 | 196'.10 | | |
| U+2D495 | ExtF | 𭒕 | USAT-60078 | 38.11 | 30.11 | | |
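
For anyone checking these values by script, here is a minimal sketch of parsing the RS strings used in this thread. It assumes the notation follows the Unihan `kRSUnicode` convention: radical number, an optional apostrophe (assumed here to mark a simplified radical form, as in `178'.18`), a dot, then the residual stroke count. The names `RS`, `parse_rs`, and `parse_rs_list` are hypothetical helpers, not part of any IRG or Unihan tooling.

```python
import re
from typing import List, NamedTuple


class RS(NamedTuple):
    """One radical-stroke value (hypothetical helper type)."""
    radical: int      # Kangxi radical number, 1..214
    simplified: bool  # True when the radical is followed by an apostrophe
    residual: int     # stroke count excluding the radical


# Assumed shape: radical, optional apostrophe, dot, residual strokes.
_RS_PATTERN = re.compile(r"(\d{1,3})('?)\.(\d{1,2})")


def parse_rs(value: str) -> RS:
    """Parse a single RS string such as '74.9' or "178'.18"."""
    match = _RS_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an RS value: {value!r}")
    radical = int(match.group(1))
    if not 1 <= radical <= 214:
        raise ValueError(f"radical out of range: {radical}")
    return RS(radical, match.group(2) == "'", int(match.group(3)))


def parse_rs_list(cell: str) -> List[RS]:
    """Parse a table cell holding several values, e.g. '1.7,7.6'."""
    return [parse_rs(part) for part in re.split(r"[,;]", cell) if part.strip()]


print(parse_rs("178'.18"))       # RS(radical=178, simplified=True, residual=18)
print(parse_rs_list("1.7,7.6"))  # two values, radicals 1 and 7
```

Splitting on both `,` and `;` is only there because cells in this thread use both separators for multiple proposed values.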

(To be continued....)

hfhchan commented 6 years ago
| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Source |
|---|---|---|---|---|---|---|---|
| U+2A741 | ExtC | 𪝁 | K5-005E | 9.7 | 30.6 | | |
| U+2B180 | ExtC | 𫆀 | TC-3558 | 128.2 | 26.6 | | IRGN1232P1 |
| U+2B385 | ExtC | 𫎅 | TC-3623 | 152.1 | 1.7,7.6 | 亟,焏 | IRGN1232P1,佛教難字字典_P7 |

hfhchan commented 6 years ago

To investigate: [two images, not preserved]

eisoch commented 6 years ago

3977 and 22283 are classified under their radicals according to Kangxi.

hfhchan commented 6 years ago

Bump...

Mastameta commented 3 years ago
| UCS | Block | Char | Ref. | Current RS | Correct RS | Variant | Comment |
|---|---|---|---|---|---|---|---|
| U+5954 | BMP | | ALL | 37.6 | 37.5 | FA7F | Current RS refers to the FA7F Kangxi glyph, not the reference glyphs. |
| U+5ED9 | BMP | | ALL | 53.12 | 53.11 | FA83 | Current RS refers to the FA83 Kangxi glyph, not the reference glyphs. |
| U+6452 | BMP | | J1-405C, K2-365A | 64.9 | 64.9; 64.11 | FA8F | J1-405C and K2-365A represent the 'normative' Kangxi glyph, which is 11 strokes. However, the G, H, T source glyphs are 9 strokes. |

hfhchan commented 3 years ago

Note: the stroke count for old characters follows the Kangxi count, while new characters follow the IRG stroke count rules. The treatment of existing characters is currently undefined (but a solution should be sought).

Mastameta commented 3 years ago

By 'old' characters, do you mean the Unicode 1.1 BMP set? What do the 'IRG stroke count rules' refer to? (Sorry, I am a newcomer to this documentation.)

As far as I know, 5954 and 5ED9 have never had earlier reference glyphs (in any region) that correspond to the Kangxi stroke count.

Note that 摒 6452 is listed as 9 strokes (which is correct for G, H, T), but the Kangxi stroke count is actually 11 (J, K). So, if RS is supposed to list the Kangxi stroke count for old characters, then Unihan is inconsistent on this issue.

hfhchan commented 3 years ago

New Extensions to the CJK Unified Ideographs are handled by IRG, a subgroup of ISO/IEC JTC1/SC2/WG2. You may refer to the IRG PnP which specifies the documents to refer to for stroke count.

For old characters (URO to Extension F), they supposedly follow the rule in The Unicode Standard, which gives priority to the stroke count in the Kangxi Dictionary, then the Morohashi dictionary, then Hanyu Dazidian, then a specific Korean dictionary, even if none of the representative glyphs use the Kangxi shape.

Unfortunately Kangxi is not consistent in its stroke count methodology.
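
The priority order described above can be sketched roughly as a first-match lookup. This is only an illustration of the fallback idea as stated in this comment, not the actual Unihan procedure; the source keys and the function name `pick_stroke_count` are hypothetical, and the sample counts in the second call are made up.

```python
from typing import Mapping, Optional, Tuple

# Assumed priority order, taken from the comment above: Kangxi first,
# then Morohashi, then Hanyu Dazidian, then a Korean dictionary.
PRIORITY = ("kangxi", "morohashi", "hanyu_dazidian", "korean_dictionary")


def pick_stroke_count(counts: Mapping[str, int]) -> Optional[Tuple[str, int]]:
    """Return (source, residual strokes) from the highest-priority source present."""
    for source in PRIORITY:
        if source in counts:
            return source, counts[source]
    return None  # no listed dictionary covers this character


# U+6452: the thread gives 11 as the Kangxi count, so Kangxi wins even
# though the G, H, T glyphs are drawn with 9 residual strokes.
print(pick_stroke_count({"kangxi": 11}))  # ('kangxi', 11)

# Made-up values for a character absent from Kangxi and Morohashi.
print(pick_stroke_count({"hanyu_dazidian": 9, "korean_dictionary": 10}))  # ('hanyu_dazidian', 9)
```

Note that the glyph shapes never enter the lookup, which is exactly why the listed RS can disagree with every representative glyph.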

hfhchan commented 3 years ago

I think Unihan has some work in progress that intends to change the field to reflect the actual stroke count, but I'm not sure of its status.

Mastameta commented 3 years ago

Thank you for the pointers. The only IRG document I've looked at in detail is IRG N2107R2, which is about the UK glyphs for ExtG.