Open Transfusion opened 3 years ago
The unsimplified canonical Japanese variants are mostly available here: https://github.com/cjkvi/cjkvi-variants/blob/e4f1da248c9737a243f9930b5dc497cef5d5ae16/jp-old-style.txt#L64-L69
Korean variants of the same nature are taken from the 1800 Hanja for Everyday Use.
I treat variants of this nature (along with simplified / traditional Chinese, the numerals, shinjitai in the joyo kanji, radicals, etc.) as orthographic variants, so that they are grouped together: https://github.com/Transfusion/cjk-radical-search/blob/19d0d1b672d7a652bfcd6cc784dcd43ce7c669e1/etl/variants-fetcher.ts#L109
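As a rough illustration of what "grouped together" could mean, here is a minimal union-find sketch that merges pairwise variant relations into groups. The names `VariantPair` and `groupVariants` are hypothetical and are not taken from `variants-fetcher.ts`; the sample pairs are illustrative only and use the plain code points, without variation selectors.

```typescript
// Hypothetical sketch: merge pairwise variant relations into groups (union-find).
type VariantPair = [string, string];

function groupVariants(pairs: VariantPair[]): string[][] {
  const parent = new Map<string, string>();

  // Find the representative of a character's group, compressing the path on the way.
  const find = (c: string): string => {
    if (!parent.has(c)) parent.set(c, c);
    let root = c;
    while (parent.get(root) !== root) root = parent.get(root)!;
    let cur = c;
    while (cur !== root) {
      const next = parent.get(cur)!;
      parent.set(cur, root);
      cur = next;
    }
    return root;
  };

  // Union the two sides of every pair.
  for (const [a, b] of pairs) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  }

  // Collect the members of each group under its representative.
  const groups = new Map<string, string[]>();
  for (const c of Array.from(parent.keys())) {
    const root = find(c);
    const members = groups.get(root) ?? [];
    members.push(c);
    groups.set(root, members);
  }
  return Array.from(groups.values());
}

// Illustrative input only.
const variantGroups = groupVariants([
  ["卫", "衛"],
  ["衛", "衞"],
  ["真", "眞"],
]);
console.log(variantGroups);
```

Because the relation is treated as transitive here, 卫 and 衞 end up in the same group even though no pair links them directly. Whether that transitivity is always desirable is a separate design question.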
TODO: investigate the 1800 Korean hanja list and check whether any of them are not in the commonly used traditional Chinese set, as I do not include them when computing orthographic variants, but only in the expandVariantIslands function (TBD: discussion of what this does and the design issues faced).
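The check in the TODO amounts to a set difference. A minimal sketch, assuming both lists are available as plain strings or sets of characters; `missingFrom` is a hypothetical helper, and the sample data below is toy data that does not reflect the real lists.

```typescript
// Hypothetical helper: return the characters of `hanja` that are absent from `traditional`.
function missingFrom(hanja: string, traditional: Set<string>): string[] {
  // Array.from iterates by code point, so supplementary-plane ideographs stay intact.
  return Array.from(hanja).filter((ch) => !traditional.has(ch));
}

// Toy data only; the real lists would be loaded from their respective sources.
const traditionalSample = new Set(Array.from("衛既"));
const hanjaSample = "衛衞既";

const notInTraditional = missingFrom(hanjaSample, traditionalSample);
console.log(notInTraditional);
```

Note that a character followed by a variation selector would be split into two code points by this approach, so any selectors would need to be handled before the comparison.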
卫、衛、衞󠄀
Note that in Japan, 衞󠄀 is the 旧字 (old character form) of 衛 (!!): https://www.kanjipedia.jp/kanji/0000403800
One cannot go hunting in the Unihan database directly to tell these apart, since they are preexisting variants in G sources too: https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=U%2B885E
「說文解字」 has 眞 (https://dict.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QTAyNzY4), and furthermore goes on to say: 僊人變形而登天也。从匕从目从乚。 Korea and Japan consider this variant to be canonically traditional.
The case of 既 and 即 is strange in Japanese: they are 既 and 卽 respectively.